Automated matching engines execute millions of orders per session, yet systematic asymmetries in latency, order size, and market access compound into persistent execution disparities that erode participant trust. We formulate provably fair order matching as a Constrained Markov Decision Process (CMDP) and propose CPO-FOAM (Constrained Policy Optimization with Feedback-Optimized Adaptive Margins). An inner loop computes an analytic trust-region step on the Fisher information manifold; a PID-controlled outer loop dynamically tightens safety margins, suppressing the sawtooth oscillations endemic to Lagrangian methods under non-stationary dynamics. Group fairness (demographic parity, equalized odds) enters the CMDP cost vector, while individual Lipschitz fairness is enforced deterministically via spectral normalization. We prove BIBO stability and show that the integral term drives steady-state violations to zero. On LOBSTER NASDAQ data across six market regimes, CPO-FOAM recovers 95.9% of unconstrained throughput at a constraint violation frequency (CVF) of 2.5%; on crypto-asset LOB data under MEV injection, it captures 98.4% of the reward envelope at a CVF of 3.2%. The method scales with sub-linear complexity up to M=8 constraints, settles on-chain within one Ethereum block, and yields a 2.1× reward improvement on Safety-Gymnasium, suggesting domain-agnostic generalization.
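To make the PID-controlled outer loop concrete, the following is a minimal sketch of how a feedback controller might convert observed constraint costs into a safety margin that tightens the effective cost limit. It is an illustration under stated assumptions, not the CPO-FOAM update itself: the class name `PIDMargin`, the helper `tightened_limit`, the gain values, and the clamping of the integral and the margin at zero are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class PIDMargin:
    """Hypothetical PID controller that maps constraint error to a safety margin."""
    kp: float = 0.5   # proportional gain (assumed value)
    ki: float = 0.1   # integral gain (assumed value)
    kd: float = 0.05  # derivative gain (assumed value)
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, cost: float, limit: float) -> float:
        """Return a non-negative margin given the latest observed cost."""
        error = cost - limit  # positive when the constraint is violated
        # Accumulate the error, clamped at zero so that long stretches of
        # satisfied constraints do not build up a negative integral.
        self.integral = max(0.0, self.integral + error)
        derivative = error - self.prev_error
        self.prev_error = error
        margin = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(0.0, margin)  # a margin only ever tightens the limit


def tightened_limit(limit: float, margin: float) -> float:
    """Effective cost limit handed to the inner policy-optimization step."""
    return limit - margin


controller = PIDMargin()
m_violation = controller.update(cost=1.2, limit=1.0)  # constraint violated
m_satisfied = controller.update(cost=0.5, limit=1.0)  # constraint satisfied
print(tightened_limit(1.0, m_violation), tightened_limit(1.0, m_satisfied))
```

In this sketch a violation raises the margin, so the inner loop would optimize against a stricter limit on the next iteration, while a comfortably satisfied constraint lets the margin relax back to zero. The integral term is what accumulates persistent violations, which is the mechanism the abstract credits with driving steady-state violations to zero.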