The dataset originates from real SCADA telemetry recorded across multiple offshore oil wells operated by a major production company. Data was captured at one-minute intervals across 24 sensor channels, spanning normal operations and nine confirmed fault categories. Well identifiers are fully anonymised.
The dataset contains real, simulated, and drawn instances. Only real instances were used for primary model validation, ensuring no synthetic contamination of evaluation metrics.
Composition

Most faults accumulate silently without triggering any existing alarm. Undetected faults are not the exception; they are the operational baseline of an unmonitored production system.
Distribution by observation count

Class imbalance is extreme: up to 1:500 between the rarest and the most frequent fault class. This is addressed with balanced evaluation sets at validation and test. The unsupervised detection layer was trained on normal data only.
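One way to build a balanced evaluation set is to draw the same number of observations from every class. The sketch below assumes that approach; the text does not say how the balanced sets were actually constructed, and the function name `balanced_sample` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(labels, per_class=None, seed=0):
    """Return sorted row indices with an equal number of rows per class.

    Assumption: equal-count downsampling. If per_class is None, every class
    is cut down to the size of the rarest class.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    n = per_class if per_class is not None else min(len(v) for v in by_class.values())
    rng = random.Random(seed)  # fixed seed keeps the split reproducible

    picked = []
    for label in sorted(by_class):
        rows = by_class[label]
        picked.extend(rng.sample(rows, min(n, len(rows))))
    return sorted(picked)


# Toy example with a 500:1 imbalance between two class ids.
labels = [7] * 500 + [6] * 1
eval_idx = balanced_sample(labels)
eval_labels = [labels[i] for i in eval_idx]
```

Downsampling to the rarest class discards most of the frequent classes, so in practice `per_class` would be chosen to trade balance against evaluation set size.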
An operating regime is the unique combination of valve states and choke settings at a given moment — defining which flowlines are active, whether gas lift is running, and how the choke is set.
Physical sensor ranges differ significantly across regimes. A pressure reading normal under gas-lift production may be anomalous during crossover. All normalisation is regime-conditional — ensuring anomaly detection is relative to current operating mode.
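To illustrate regime-conditional normalisation, the sketch below fits a separate mean and standard deviation per regime and scores each reading against the statistics of its own regime. A z-score is assumed here purely for illustration, since the text does not name the normalisation method; the regime labels and pressure values are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_regime_stats(values, regimes):
    """Compute (mean, std) of one sensor channel separately for each regime."""
    grouped = defaultdict(list)
    for value, regime in zip(values, regimes):
        grouped[regime].append(value)
    return {r: (mean(vs), pstdev(vs)) for r, vs in grouped.items()}


def normalise(value, regime, stats, eps=1e-9):
    """Z-score a reading against the statistics of its current regime."""
    mu, sigma = stats[regime]
    return (value - mu) / max(sigma, eps)  # eps guards against zero variance


# Hypothetical pressure readings taken under two regimes.
values = [100, 102, 98, 40, 42, 38]
regimes = ["gas_lift"] * 3 + ["crossover"] * 3
stats = fit_regime_stats(values, regimes)

z_gas_lift = normalise(100, "gas_lift", stats)    # typical for this regime
z_crossover = normalise(100, "crossover", stats)  # far outside this regime
```

The same reading of 100 scores near zero under the gas-lift statistics and very high under the crossover statistics, which is the behaviour the paragraph above describes.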
ESTADO-* variables map to integers (0 = closed, 1 = open). ABER-* apertures are discretised to binary or ternary. The concatenated string uniquely identifies the regime. Sparse regimes (<50k samples) are consolidated into a General Regime bucket, yielding approximately 11 distinct regimes.
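The regime key construction can be sketched as follows. The column names, the aperture cut points, and the `GENERAL` label are assumptions for illustration; only the 0/1 state mapping, the binary or ternary discretisation, the string concatenation, and the 50k threshold come from the text. The sketch also assumes no missing values.

```python
from collections import Counter

MIN_SAMPLES = 50_000  # sparsity threshold stated in the text
GENERAL = "GENERAL"   # hypothetical label for the consolidated bucket


def discretise_aperture(pct, cuts=(5.0, 95.0)):
    """Map an aperture percentage to 0, 1 or 2. The cut points are assumed."""
    low, high = cuts
    if pct < low:
        return 0  # effectively closed
    if pct < high:
        return 1  # partially open
    return 2      # effectively fully open


def regime_key(row, state_cols, aperture_cols):
    """Concatenate valve states and discretised apertures into one string."""
    parts = [str(int(row[c])) for c in state_cols]
    parts += [str(discretise_aperture(row[c])) for c in aperture_cols]
    return "".join(parts)


def consolidate(keys, min_samples=MIN_SAMPLES):
    """Replace keys seen fewer than min_samples times with the general bucket."""
    counts = Counter(keys)
    return [k if counts[k] >= min_samples else GENERAL for k in keys]


# Illustrative column names following the ESTADO-* / ABER-* pattern.
row = {"ESTADO-M1": 1, "ESTADO-W1": 0, "ABER-CKP": 50.0}
key = regime_key(row, ["ESTADO-M1", "ESTADO-W1"], ["ABER-CKP"])
```

Keeping a fixed column order matters here, because the same states concatenated in a different order would produce a different key.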
| # | Fault Type | Rows | % Time | Sev. |
|---|---|---|---|---|
| 0 | Normal Operation | 12,158,183 | 37.0% | — |
| 7 | Scaling in PCK | 7,864,945 | 23.9% | High |
| 8 | Hydrate — Production Line | 4,809,035 | 14.6% | Crit |
| 4 | Flow Instability | 3,689,683 | 11.2% | Med |
| 9 | Hydrate — Service Line | 2,635,372 | 8.0% | High |
| 3 | Severe Slugging | 684,352 | 2.1% | Crit |
| 5 | Rapid Productivity Loss | 439,408 | 1.3% | High |
| 2 | Spurious DHSV Closure | 277,001 | 0.8% | Crit |
| 1 | Abrupt BSW Increase | 236,794 | 0.7% | Med |
| 6 | Quick PCK Restriction | 77,477 | 0.2% | Med |
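The percentage column follows directly from the row counts. As a quick check, the sketch below recomputes each class's share of all observations from the counts in the table above; the variable and function names are illustrative.

```python
# Row counts per class id, copied from the table above (0 = Normal Operation).
ROWS = {
    0: 12_158_183, 7: 7_864_945, 8: 4_809_035, 4: 3_689_683, 9: 2_635_372,
    3: 684_352, 5: 439_408, 2: 277_001, 1: 236_794, 6: 77_477,
}


def time_shares(rows):
    """Return each class's share of all observations, in percent."""
    total = sum(rows.values())
    return {cls: 100.0 * n / total for cls, n in rows.items()}


shares = time_shares(ROWS)
fault_share = 100.0 - shares[0]  # share of rows in any fault class

for cls, pct in shares.items():
    print(f"class {cls}: {pct:.2f}%")
print(f"all fault classes: {fault_share:.1f}%")
```

With these counts, normal operation accounts for about 37% of rows and the nine fault classes together for about 63%.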
The system separates anomaly detection from fault annotation by design. Detection (Layers 1–2) requires no labelled fault data and can be deployed immediately with any operator. Annotation (Layer 3) is an optional enrichment layer that improves progressively as labelled fault history accumulates.
| Fault Type | % Time | Primary Loss Mechanism | Est. Value | CH₄ | Sev. |
|---|---|---|---|---|---|
| Scaling in PCK | 23.93% | Gradual throughput loss via progressive choke narrowing | $4.2M | Low | High |
| Hydrate — Production Line | 14.63% | Full blockage; production loss + CH₄ blowdown required | $2.1M + 82t | Crit | Crit |
| Flow Instability | 11.22% | Efficiency degradation and accelerated equipment wear | $0.9M | Low | Med |
| Hydrate — Service Line | 8.02% | Indirect impact: injection and support operations disrupted | $0.4M + 38t | High | High |
| Spurious DHSV Closure | 0.84% | Unplanned shut-in; controlled depressurisation required | $0.3M + 26t | Crit | Crit |
Estimates are based on observed fault proportions and published offshore production loss rates, valued at $75/bbl. Figures are order-of-magnitude illustrations, not contractual projections.