Margin Play: A Multi-Agent System For Public Policy Analysis In The Brazilian Equatorial Margin

Antonio de Sousa Leitão Filho, Fabrício Saul Lima, Selby Mykael Lima dos Santos, Rejani Bandeira Vieira Sousa, Luís Jorge Mesquita de Jesus, Dennys Correia da Silva, Allan Kardec Duailibe Barros Filho

The Brazilian Equatorial Margin (BEM) is Brazil's next offshore oil frontier, with operations expected to begin in 2026 in the Foz do Amazonas basin. Its assets are fiscally and territorially linked primarily to Maranhão, the state with the lowest HDI in the Federation (0.676, IBGE 2022). This raises the central policy question: under what conditions does BEM exploration generate net positive externalities for Maranhão? The problem is intrinsically multi-agent: the Federal Government seeks revenue and energy security; the state seeks regional welfare under constitutional royalty earmarking; the operator maximizes profit under risk; ANP and IBAMA hold conflicting mandates; and Amazonian communities prioritize territorial and environmental concerns over monetary income. We present Margin Play, a Multi-Agent Reinforcement Learning (MARL) system that simulates these tensions, calibrated on Brazilian empirical data and the classical economic literature. It implements six agents under the centralized training with decentralized execution (CTDE) paradigm, trained with BRO-MARL. Results from 60,000 episodes across six scenarios indicate that the answer depends on the institutional regime: under the reference baseline, the welfare gain is marginal (W_aval ≈ 1.68), whereas the MA-Prospero configuration yields ΔW = +17.5% and ΔR_com = +21.3%, with a lower environmental liability (E_amb = 0.048 vs. 0.076). The fundamental problem is not a trade-off between production and welfare, but the choice of the public policy regime tied to exploration.
