Geosteering requires navigating a well trajectory through an unknown geological configuration while sequentially updating decisions based on indirect measurements acquired during drilling. This work presents an uncertainty-aware geosteering framework that tightly integrates particle filtering for probabilistic subsurface interpretation with value-based reinforcement learning for sequential decision-making. Geological uncertainty ahead of the drill bit is represented explicitly by a particle filter (PF), enabling belief-informed control rather than deterministic trajectory correction. The framework couples PF belief updates with belief-informed decision policies and evaluates three decision-making methods that operate on identical uncertainty representations: an interpretable Approximate Dynamic Programming (ADP) scheme, a Deep Q-learning baseline, and a Dual Deep Reinforcement Learning (Dual DRL) architecture that parameterizes Q-values with a dueling (value/advantage) decomposition and is trained with a target Q-network for stability. Beyond final placement performance, we assess policy behavior using stability-oriented metrics that quantify steering smoothness over time, giving additional operational insight into how decision policies respond as uncertainty evolves. The framework is connected through an API to an industrial geosteering simulator and validated there under realistic measurement noise and drilling constraints. Because all methods share identical geological realizations, operational limits, and reward definitions, the experiments provide a controlled, high-fidelity evaluation of how alternative decision policies behave throughout the drilling process, rather than judging performance solely from the final well trajectory.
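As an illustration of the two building blocks named above, the following is a minimal sketch, not the implementation used in this work. It assumes a one-dimensional belief over a boundary depth, a Gaussian measurement likelihood, and resampling when the effective sample size falls below half the particle count; the names `pf_update`, `dueling_q`, `predict_fn`, and `noise_std` are hypothetical. The dueling aggregation shown is the common mean-subtracted form, Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a), which may differ from the variant used here.

```python
import numpy as np


def pf_update(particles, weights, measurement, predict_fn, noise_std, rng):
    """One particle-filter belief update (assumed Gaussian likelihood).

    particles : (N,) array of hypothesised states (e.g. boundary depths)
    weights   : (N,) array of strictly positive weights summing to 1
    predict_fn: maps particles to the measurement each one would produce
    """
    predicted = predict_fn(particles)
    # Log-likelihood of the observed measurement under each particle.
    log_lik = -0.5 * ((measurement - predicted) / noise_std) ** 2
    log_w = np.log(weights) + log_lik
    log_w -= log_w.max()  # subtract the max for numerical stability
    w = np.exp(log_w)
    w /= w.sum()

    # Resample only when the effective sample size drops below N / 2.
    n = len(w)
    ess = 1.0 / np.sum(w ** 2)
    if ess < 0.5 * n:
        idx = rng.choice(n, size=n, p=w)
        particles = particles[idx]
        w = np.full(n, 1.0 / n)
    return particles, w


def dueling_q(value, advantages):
    """Combine a state value and per-action advantages into Q-values."""
    advantages = np.asarray(advantages, dtype=float)
    return value + advantages - advantages.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Prior belief: boundary depth around 2000 m with 5 m spread (made-up numbers).
    particles = rng.normal(2000.0, 5.0, size=500)
    weights = np.full(500, 1.0 / 500)
    particles, weights = pf_update(
        particles, weights, measurement=2003.0,
        predict_fn=lambda p: p, noise_std=1.0, rng=rng,
    )
    print("posterior mean depth:", np.average(particles, weights=weights))
    print("Q-values:", dueling_q(1.0, [0.0, 1.0, 2.0]))
```

In a belief-informed policy of the kind described, summary statistics of the weighted particles (for example their mean and spread) would form part of the state passed to the Q-function, so every decision method sees the same uncertainty representation.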