Many decision-making tasks unfold on graphs with spatio-temporal dynamics. Black-box reinforcement learning often overlooks how local changes propagate through the network structure, limiting sample efficiency and interpretability. We present GTL-CIRL, a closed-loop framework that simultaneously learns policies and mines Causal Graph Temporal Logic (Causal GTL) specifications. The method shapes rewards with robustness values, collects counterexamples when predicted effects fail to hold, and uses Gaussian Process (GP) driven Bayesian optimization to refine parameterized cause templates. The GP models capture spatial and temporal correlations in the system dynamics, enabling efficient exploration of complex parameter spaces. Case studies on gene and power networks show faster learning and clearer, verifiable behavior compared to standard RL baselines.
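To make the refinement step concrete, the following is a minimal sketch of GP-driven Bayesian optimization over cause-template parameters. It assumes a hypothetical black-box `evaluate_robustness(theta)` that stands in for rolling out the current policy and scoring a Causal GTL cause template instantiated with parameters `theta` against collected counterexample traces; the Matérn kernel and expected-improvement acquisition are illustrative choices, not necessarily the paper's exact configuration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def evaluate_robustness(theta: np.ndarray) -> float:
    # Hypothetical stand-in for the expensive step: evaluate the cause
    # template with parameters `theta` on counterexample traces and return
    # a robustness score (here, a toy surrogate objective).
    return -np.sum((theta - 0.3) ** 2)

def expected_improvement(mu, sigma, best):
    # Standard EI acquisition for maximization; larger = more promising.
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
dim = 2                                              # free template parameters
candidates = rng.uniform(0.0, 1.0, size=(2000, dim)) # candidate search pool

# Seed the GP with a few random template instantiations.
X = rng.uniform(0.0, 1.0, size=(5, dim))
y = np.array([evaluate_robustness(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(20):                                  # BO refinement iterations
    gp.fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    theta_next = candidates[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, theta_next])
    y = np.append(y, evaluate_robustness(theta_next))

print("best template parameters:", X[np.argmax(y)], "robustness:", y.max())
```

In the closed loop described above, the best-scoring parameters would re-instantiate the cause template, which in turn reshapes the robustness-based reward for the next round of policy learning.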