Opponent State Inference Under Partial Observability: An HMM-POMDP Framework for 2026 Formula 1 Energy Strategy

From arXiv, 17 pages. Pre-registered theoretical framework; empirical calibration on 2026 race telemetry begins at the Australian Grand Prix, 8 March 2026. Paper 1 of 3. ResearchGate preprint: DOI 10.13140/RG.2.2.16034.08644

The 2026 Formula 1 technical regulations fundamentally change energy strategy. With a 50/50 power split between the internal combustion engine and the battery, unlimited regeneration, and a driver-controlled Override Mode (abbreviated MOM throughout), the optimal energy deployment policy depends not only on a driver's own state but also on the hidden state of rival cars. The result is a Partially Observable Stochastic Game that single-agent optimisation methods cannot solve. We present a tractable two-layer framework for inference and decision-making. The first layer is a 30-state Hidden Markov Model (HMM) that infers, from five publicly observable telemetry signals, a probability distribution over each rival's ERS charge level, Override Mode status, and tyre degradation state. The second layer is a Deep Q-Network (DQN) policy that takes the HMM belief state as input and selects among energy deployment strategies. We formally characterise the counter-harvest trap, a deceptive strategy in which a car deliberately suppresses its observable deployment signals to induce a rival into a failed attack, and show that detecting it requires belief-state inference rather than reactive threshold rules. On synthetic races generated from the model's own assumptions, the HMM achieves 92.3% ERS inference accuracy (random baseline: 33.3%) and detects counter-harvest trap conditions with 95.7% recall. This paper is a pre-registration: empirical validation begins at the Australian Grand Prix, 8 March 2026.
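
The belief-state inference performed by the first layer can be illustrated with a single HMM forward-filtering step. The following is a minimal sketch, not the paper's implementation. It assumes the 30 hidden states factor as 3 ERS levels x 2 Override Mode flags x 5 tyre degradation levels (the abstract gives only the total of 30 and a 33.3% random baseline for ERS), and the transition matrix and observation likelihoods are made-up toy numbers. All names (`forward_step`, `ers_marginal`, `STATES`) are hypothetical.

```python
from itertools import product

# ASSUMPTION: factorisation of the 30 hidden states. The abstract does not
# state it; 3 x 2 x 5 is one decomposition consistent with the 30-state total.
ERS = ("low", "mid", "high")
OVERRIDE = (False, True)
TYRE = (0, 1, 2, 3, 4)
STATES = list(product(ERS, OVERRIDE, TYRE))  # 30 joint hidden states


def forward_step(belief, transition, likelihood):
    """One HMM filtering step.

    belief[i]        : prior probability of hidden state i
    transition[i][j] : P(next state = j | current state = i)
    likelihood[j]    : P(current observation | state = j)
    Returns the normalised posterior belief over states.
    """
    n = len(belief)
    # Predict: push the prior belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: reweight by how well each state explains the observation.
    unnorm = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every state")
    return [u / total for u in unnorm]


def ers_marginal(belief):
    """Marginalise the joint belief over Override Mode and tyre state."""
    out = {level: 0.0 for level in ERS}
    for p, (ers, _override, _tyre) in zip(belief, STATES):
        out[ers] += p
    return out


# Toy inputs (illustrative values only, not calibrated to any telemetry).
n = len(STATES)
uniform = [1.0 / n] * n
stay = 0.9  # hypothetical probability of remaining in the same joint state
transition = [[stay if i == j else (1.0 - stay) / (n - 1) for j in range(n)]
              for i in range(n)]
# Hypothetical observation that is 8x more likely when ERS charge is low.
likelihood = [0.8 if s[0] == "low" else 0.1 for s in STATES]

posterior = forward_step(uniform, transition, likelihood)
marginal = ers_marginal(posterior)
```

With these toy inputs the marginal belief in a low ERS charge rises from one third to about 0.8. In the framework described above, the full 30-element posterior, rather than a single thresholded signal, is what would be passed to the DQN policy.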
