energyOS Technical Whitepaper
An overview of the infrastructure that powers cross-commodity correlation insight and market transparency. energyOS is a quantitative machine learning intelligence system designed to generate effective insight across energy relationships, data sources, and market conditions — producing actionable output for natural gas, heating oil, electricity, and adjacent commodity markets.
Model infrastructure: Neurosymbolic and ensemble architecture for cross-commodity correlation. Built to organize and empower energy markets and to increase their efficiency, accuracy, and transparency.
Abstract
energyOS combines expansive multi-domain datasets with internal deep learning methods, including neural networks and autoencoders, alongside supervised machine learning and a large language model (LLM) front end. Together these layers interpret noise, surface signals, and clean analytical output into a form that is immediately useful for market practitioners.
Introduction
Energy markets are fundamentally unpredictable when viewed in isolation. Market structure is shaped by economics, physics, and national and global events, a mix that is difficult to understand and harder still to predict. Each energy type carries specialized characteristics, and interconnected relationships run across natural gas, refined products, electricity, and more. Infrastructure differences, subcategory dynamics, and market microstructure multiply the complexity further.
The challenge is not just the quantity of these factors; it is the pace of change. Relationships in energy markets are not static; they evolve. The transition of ERCOT from Real-Time Dispatch (RTD) to Real-Time Co-Optimization (RTC) in December 2025 is a concrete example: a single ISO infrastructure change fundamentally altered how scarcity pricing, battery dispatch, and Qualified Scheduling Entity (QSE) economics interact. Any system without adaptive capabilities would have degraded immediately.
An isolated system without adaptive capabilities does not last long in the energy domain. The sheer volume of domain-specific changes makes consistent model performance under a static architecture nearly impossible. energyOS is designed around this reality from the ground up.
The energyOS architecture addresses this through adaptive machine learning, continuous retraining, regime detection, and a multi-source data infrastructure that spans physical, financial, satellite, and news domains — with the explicit goal of building a system that adapts to market evolution rather than assuming market stationarity.
Data infrastructure
A cross-commodity system that can understand, capitalize on, and predict market dynamics requires large quantities of high-quality data that is updated consistently, integrated into the model infrastructure, and delivered as systematic output. The data pipeline is built around three principles: breadth, autonomy, and differentiation.
Data domains
The platform's satellite integration tracks radiation changes at specific oil refinery coordinate locations. This provides upstream insight into production changes — and by extension, refined product supply — before these shifts appear in conventional market reporting. Radiation quantities correlate with production volumes of heating oil, gasoline, and other refined products.
Autonomous integration
Data ingestion is batched at intervals that correspond to each source's publication frequency. The autonomy of this pipeline is one of the most valuable pieces of infrastructure in the system. AWS functions handle the full pipeline (ingestion, transformation, and storage), enabling model training, learning, adaptation, and prediction without manual data operations.
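The idea of batching by publication frequency can be sketched as a small scheduling check. This is a minimal illustration under stated assumptions, not the platform's actual pipeline: the `Source` type, the source names, the cadences, and the `due_sources` helper are all hypothetical, and no AWS service is modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Source:
    name: str
    publish_interval: timedelta  # how often the source publishes new data


# Hypothetical sources and cadences, for illustration only.
SOURCES = [
    Source("storage_report", timedelta(days=7)),
    Source("power_prices", timedelta(minutes=5)),
    Source("news_feed", timedelta(hours=1)),
]


def due_sources(sources, last_run, now):
    """Return the names of sources whose publication interval has elapsed
    since their last ingest, or that have never been ingested."""
    due = []
    for src in sources:
        previous = last_run.get(src.name)
        if previous is None or now - previous >= src.publish_interval:
            due.append(src.name)
    return due


now = datetime(2026, 4, 8, 13, 0, tzinfo=timezone.utc)
last_run = {
    "storage_report": now - timedelta(days=2),    # weekly source, not yet due
    "power_prices": now - timedelta(minutes=5),   # interval has just elapsed
}                                                 # news_feed never ingested
print(due_sources(SOURCES, last_run, now))  # ['power_prices', 'news_feed']
```

A scheduled job could call a check like this on a short fixed tick and trigger ingestion only for the sources it returns, so slow-publishing sources are not polled more often than they update.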
Machine learning architecture
Interpreting data whilst understanding gradual or rapid pattern shifts requires several complementary machine learning methods. The architecture runs three parallel model layers, each designed to capture a different dimension of market behavior.
Model layers
Model components
Training standardisation
Each model bundle is saved with its corresponding StandardScaler, feature list, performance metadata, and input dimensions, so that preprocessing is identical between training and inference. This avoids feature alignment errors and limits preprocessing drift. The StandardScaler prevents raw numerical magnitude from dominating what the model learns: without it, a temperature value of 70 would numerically outweigh a spread value of 0.4, even though the spread may carry greater analytical significance.
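The effect of standardisation, and the reason for storing the scaler with the feature list, can be shown with a small standard-library sketch. It mimics the z-score transform that a StandardScaler applies (subtract the mean, divide by the population standard deviation) rather than calling scikit-learn itself. The feature names, training rows, and JSON bundle layout are assumptions made for illustration.

```python
import json
import statistics


def fit_scaler(rows, feature_names):
    """Compute per-feature mean and population standard deviation."""
    params = {}
    for i, name in enumerate(feature_names):
        column = [row[i] for row in rows]
        std = statistics.pstdev(column)
        # Guard against constant columns to avoid division by zero.
        params[name] = {"mean": statistics.fmean(column), "std": std or 1.0}
    return params


def transform(row, feature_names, params):
    """Apply the stored z-score transform in the stored feature order."""
    return [
        (value - params[name]["mean"]) / params[name]["std"]
        for value, name in zip(row, feature_names)
    ]


# Hypothetical features and training rows.
FEATURES = ["temperature_f", "spread"]
train = [[60.0, 0.2], [70.0, 0.4], [80.0, 0.6]]

# Persist scaler parameters together with the feature list and input size,
# so inference reuses exactly what training used.
bundle = json.dumps({
    "features": FEATURES,
    "input_dim": len(FEATURES),
    "scaler": fit_scaler(train, FEATURES),
})

restored = json.loads(bundle)
scaled = transform([80.0, 0.6], restored["features"], restored["scaler"])
print(scaled)  # both features land on a comparable scale
```

After scaling, a temperature of 80 and a spread of 0.6 map to roughly the same value, because each sits the same number of standard deviations above its own training mean.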
All major deep learning models are retrained or recalibrated on a weekly basis to account for data drift and fast market adaptation — what the system terms "data currents." This cadence enables adaptive pattern recognition as market regimes and structural relationships shift.
LLM intelligence layer
The large language model (LLM) layer provides transparency into data interpretation and synthesizes outputs from the quantitative model stack. Each user query triggers a parallel injection of machine learning output data into the LLM context window, including trend context, market prices, storage data, satellite imagery signals, news overviews, electricity anomalies, and all model outputs.
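One way to picture the parallel injection step is a concurrent fan-out over the signal sources, collected into a single context object before the LLM is called. The sketch below uses Python's `asyncio.gather` for this. The `fetch_signal` and `gather_context` functions, the simulated delays, and the placeholder payloads are hypothetical stand-ins, since the actual retrieval mechanism is not described here.

```python
import asyncio

# Signal names follow the list in the text; delays and payloads are placeholders.
SIGNAL_DELAYS = {
    "trend_context": 0.03,
    "market_prices": 0.01,
    "storage_data": 0.02,
    "satellite_signals": 0.03,
    "news_overview": 0.02,
    "electricity_anomalies": 0.01,
    "model_outputs": 0.02,
}


async def fetch_signal(name, delay_s):
    """Stand-in for a real lookup; sleeps to mimic I/O latency."""
    await asyncio.sleep(delay_s)
    return name, {"status": "placeholder"}


async def gather_context(delays):
    """Fetch all signals concurrently; gather returns results in input order."""
    pairs = await asyncio.gather(
        *(fetch_signal(name, delay) for name, delay in delays.items())
    )
    return dict(pairs)


context = asyncio.run(gather_context(SIGNAL_DELAYS))
print(list(context))
```

Because the lookups run concurrently, total wait time is close to the slowest single lookup, not the sum of all of them.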
Primary functions
| Function | Description |
|---|---|
| Noise reduction | Filters conflicting or low-conviction signals before surfacing them to the analyst. The LLM layer handles the contexts that pure machine learning methods cannot define — including geopolitical interpretation and qualitative market narrative. |
| Output cleaning | Translates raw quantitative model outputs — scores, z-scores, reconstruction errors, uncertainty bands — into plain-language market insight without sacrificing the underlying precision. |
| Signal conflict resolution | When data values overlap or contradict each other, the LLM layer aggregates and interprets them rather than returning conflicting outputs. High-context interpretation of ambiguous signal combinations is a core function. |
| Grounded reasoning | All LLM responses are grounded in the current signal context. The system explicitly refuses to speculate beyond available data, stating when a signal is unavailable or stale rather than generating low-confidence answers. |
Internal prompt design is a critical component of the LLM layer. The prompt architecture organises all model output data and provides sufficient data context so the model can interpret values correctly — including the relationship between different model outputs and the conditions under which specific signals should take precedence.
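A simplified sketch of such a prompt builder is shown below. It labels each signal as fresh, stale, or unavailable and instructs the model not to estimate missing values, in line with the grounded-reasoning behaviour described in the table. The function names, the six-hour freshness window, the prompt wording, and the timestamps are assumptions for illustration. The two signal values are taken from figures quoted elsewhere in this document.

```python
from datetime import datetime, timedelta, timezone


def render_signal(name, signal, now, max_age):
    """Format one signal line with an explicit freshness label."""
    if signal is None:
        return f"- {name}: UNAVAILABLE (do not infer a value)"
    age = now - signal["as_of"]
    status = "STALE" if age > max_age else "FRESH"
    return f"- {name}: {signal['value']} [{status}, as of {signal['as_of'].isoformat()}]"


def build_prompt(question, signals, now, max_age=timedelta(hours=6)):
    """Assemble instructions, labeled signals, and the user question."""
    lines = [
        "Answer only from the signals below.",
        "If a signal is STALE or UNAVAILABLE, say so instead of estimating.",
        "",
        "Signals:",
    ]
    lines += [render_signal(n, s, now, max_age) for n, s in signals.items()]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


now = datetime(2026, 4, 8, 13, 0, tzinfo=timezone.utc)
signals = {
    "ng_storage": {"value": "3,117 Bcf", "as_of": now - timedelta(days=3)},
    "ercot_stress_probability": {"value": "0%", "as_of": now - timedelta(minutes=30)},
    "satellite_refinery_score": None,  # feed did not return a value
}
prompt = build_prompt("Is ERCOT scarcity pricing likely this week?", signals, now)
print(prompt)
```

Keeping the freshness label next to each value means the model sees staleness in the same place it reads the number, instead of relying on a separate rule it might not connect to the signal.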
Market efficiency applications
The combination of multi-domain machine learning methods and an autonomous intelligence pipeline enables a broad set of efficiency improvements across the energy market stack — from grid operations to trading desk analytics.
Risk considerations
As with any intelligence system operating in a high-stakes domain, energyOS carries a defined set of risk factors that practitioners should understand and account for in their workflows.
| Risk category | Description | Mitigation |
|---|---|---|
| LLM hallucination | The LLM layer may generate plausible but unsupported inferences when signal context is incomplete or ambiguous. | The system is designed to state signal unavailability explicitly rather than speculate. All responses cite the signals used. |
| Prompt poisoning | Adversarial inputs designed to manipulate LLM output or override system prompt constraints. | Internal prompt architecture is hardened. System should not be sole source for execution decisions. |
| Adversarial noise | Deliberately misleading signals in data sources — e.g. coordinated misinformation in news feeds. | Multi-source data architecture limits the impact of any single compromised feed on model output. |
| Data latency | Lag between real-world events and data availability within the pipeline can create signal staleness. | Freshness timestamps are visible on all signals. The system explicitly flags stale data rather than presenting it as current. |
| Liquidity risk | Model outputs are based on fundamental relationships; execution-level liquidity in target instruments is not assessed. | energyOS is a primary resource for validation and verification — not the sole basis for trade execution. |
This system should not be the sole source regarding execution of trades or any operations. It is designed to act as a primary resource for the validation and verification of interpreted results — not a replacement for practitioner judgment, risk management infrastructure, or regulatory compliance processes.
Benchmark comparisons
The following comparisons were conducted on April 8, 2026 at 13:00 UTC against Claude Sonnet 4.6 and ChatGPT 5.3, using standardised prompts targeting different analytical dimensions. The results illustrate the differentiated strengths of a domain-specific intelligence system versus general-purpose LLMs.
All prompts were tested simultaneously. Responses were limited to 2–5 sentences per the prompt specification. URL citations were removed from displayed outputs for formatting. No post-processing was applied.
Prompt 1 — General market state
Tell me what is happening in the energy market (3–5 sentences)
Prompt 2 — Historical divergence analysis
Review the historical 24-month price divergence between ERCOT and PJM (2–3 sentences)
Prompt 3 — ERCOT scarcity pricing forecast
From ERCOT ISO management perspective, are they likely to implement scarcity pricing and use natural gas peakers within the next week? (2–3 sentences)
Prompt 4 — Non-linear WTI / Henry Hub correlation
What is the non-linear correlation between WTI crude and Henry Hub natural gas prices? (3–5 sentences)
General-purpose language models are trained to synthesize the world's published knowledge — they excel at historical context, academic framing, and broad market narrative. What they cannot do is tell you that ERCOT's P(stress) is 0% across a 7-day horizon, that the HO/NG BTU ratio is sitting at 10.92x against a 1.1x fuel-switching threshold, or that NG storage at 3,117 Bcf is absorbing crude shock transmission in real time. That gap — between contextual awareness and real quantitative precision — is exactly where energyOS operates.
The platform is built around the physics of energy markets, not just their price history. Fair value is estimated from storage deviation, satellite-scored refinery utilization, heating degree day composites, and curve structure — not from pattern-matching on published research. Regime detection is driven by autoencoder reconstruction error against a baseline of normal market conditions, not by keyword recognition in news headlines. When energyOS flags a regime shift or returns a conviction score, it is doing so from a grounded physical and statistical model of the market — not from a language model's probabilistic interpolation of what sounds right.
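Regime detection from reconstruction error can be illustrated with a simple threshold rule: compare the current error against the distribution of errors observed under normal conditions. The sketch below assumes the reconstruction errors have already been produced by an autoencoder. The `regime_flag` function, the z-score threshold of 3, and the baseline values are hypothetical and are not the platform's actual parameters.

```python
import statistics


def regime_flag(baseline_errors, current_error, z_threshold=3.0):
    """Return (z_score, is_shift) for a reconstruction error measured
    against a baseline of errors from normal market conditions."""
    mean = statistics.fmean(baseline_errors)
    std = statistics.pstdev(baseline_errors)
    if std == 0:
        raise ValueError("baseline errors have zero variance")
    z = (current_error - mean) / std
    return z, z > z_threshold


# Hypothetical reconstruction errors observed during normal conditions.
baseline = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.11]

for error in (0.11, 0.20):
    z, shifted = regime_flag(baseline, error)
    print(f"error={error:.2f} z={z:.2f} regime_shift={shifted}")
```

An error close to the baseline mean yields a small z-score and no flag, while an error several standard deviations above the baseline is flagged as a possible regime shift. In practice the threshold would need tuning against historical regime changes.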
For trading desks and grid operators, the distinction is not academic. A generalist LLM will tell you that April is a shoulder season and scarcity is unlikely. energyOS will return a stress probability of 0%, a burn score of 1.1/100, and a forward HDD composite that declines from 18.8 to 14 over the next five days — and flag precisely which physical preconditions are absent. One gives you context. The other gives you a defensible, quantified position.