Modern AI systems increasingly operate inside markets and institutions where data, behavior, and incentives are endogenous. This paper develops an economic foundation for multi-agent learning by studying a principal-agent interaction in a Markov decision process with strategic externalities, in which both the principal and the agent learn over time. We propose a two-phase incentive mechanism that first estimates implementable transfers and then uses them to steer long-run dynamics; under mild regret-based rationality and exploration conditions, the mechanism achieves sublinear social-welfare regret and hence asymptotically optimal welfare. Simulations illustrate how even coarse incentives can correct inefficient learning under stateful externalities, underscoring the necessity of incentive-aware design for safe, welfare-aligned AI in market and insurance settings.
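As a minimal formalization of the guarantee (the notation $W^\star$, $W_t$, and $\mathrm{Reg}_{\mathrm{SW}}$ is illustrative, not taken from the paper): writing $W^\star$ for the optimal per-round social welfare and $W_t$ for the welfare realized at round $t$ under the mechanism, sublinear social-welfare regret means
\[
\mathrm{Reg}_{\mathrm{SW}}(T) \;=\; \sum_{t=1}^{T} \bigl( W^\star - \mathbb{E}[W_t] \bigr) \;=\; o(T),
\qquad \text{so that} \qquad
\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[W_t] \;\longrightarrow\; W^\star,
\]
i.e., average welfare converges to the optimum, which is the sense in which the mechanism's welfare is asymptotically optimal.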