AI safety is an increasingly urgent concern as the capabilities and adoption of AI systems grow. Existing evolutionary models of AI governance have primarily examined incentives for safe development and effective regulation, typically representing users' trust as a one-shot adoption choice rather than as a dynamic, evolving process shaped by repeated interactions. We instead model trust as reduced monitoring in a repeated, asymmetric interaction between users and AI developers, where checking AI behaviour is costly. Using evolutionary game theory, we study how user trust strategies and developer choices between safe (compliant) and unsafe (non-compliant) AI co-evolve under different monitoring costs and institutional regimes. We complement the infinite-population replicator analysis with stochastic finite-population dynamics and reinforcement learning (Q-learning) simulations. Across these approaches, we find three robust long-run regimes: no adoption with unsafe development, wide adoption of unsafe systems, and wide adoption of safe systems. Only the last is desirable, and it arises when penalties for unsafe behaviour exceed the extra cost of safety and users can still afford to monitor at least occasionally. Our results formally support governance proposals that emphasise transparency, low-cost monitoring, and meaningful sanctions, and they show that neither regulation alone nor blind user trust is sufficient to prevent evolutionary drift towards unsafe or low-adoption outcomes.
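To make the setup concrete, the following is a minimal Python sketch of two-population (asymmetric) replicator dynamics for a user-developer game of this kind. The payoff parameters (b, c_m, v, c_s, p) and the two 2x2 payoff matrices are hypothetical placeholders chosen only to satisfy the condition highlighted above (penalty p exceeding the extra safety cost c_s); they are not the paper's actual payoff structure or calibration.

```python
# Minimal sketch of asymmetric replicator dynamics for a user-developer game.
# All payoff values are illustrative assumptions, not the paper's calibration:
#   b   = user benefit from adopting a safe system
#   c_m = user cost of monitoring
#   v   = developer revenue from deployment
#   c_s = extra cost of safe (compliant) development
#   p   = penalty when unsafe development is detected by a monitoring user
import numpy as np

b, c_m, v, c_s, p = 4.0, 0.5, 3.0, 1.0, 2.0  # note p > c_s

# Rows: user strategies (Monitor, Trust); columns: developer strategies (Safe, Unsafe).
U = np.array([[b - c_m, -c_m],   # monitoring avoids harm from unsafe systems, at cost c_m
              [b,       -b]])    # blind trust pays off only against safe developers
D = np.array([[v - c_s, v - p],  # unsafe developers are sanctioned by monitoring users
              [v - c_s, v]])     # against trusting users, unsafe development goes unpunished

def replicator_step(x, y, dt=0.01):
    """One Euler step: x = share of monitoring users, y = share of safe developers."""
    xs, ys = np.array([x, 1.0 - x]), np.array([y, 1.0 - y])
    fu = U @ ys                      # fitness of Monitor / Trust against current developers
    fd = D.T @ xs                    # fitness of Safe / Unsafe against current users
    x += dt * x * (fu[0] - xs @ fu)  # replicator equation, user population
    y += dt * y * (fd[0] - ys @ fd)  # replicator equation, developer population
    return x, y

x, y = 0.5, 0.5
steps, x_avg, y_avg = 20_000, 0.0, 0.0
for _ in range(steps):
    x, y = replicator_step(x, y)
    x_avg += x / steps
    y_avg += y / steps
# With these placeholder payoffs the best responses are cyclic, so trajectories
# orbit a mixed rest point; the time averages approximate that interior state,
# in which users monitor only occasionally yet most developers stay safe.
print(f"time-averaged monitoring share: {x_avg:.3f}, safe-developer share: {y_avg:.3f}")
```

Under these assumed payoffs the interior rest point sits at a monitoring share of 0.5 and a safe-developer share of 0.875, which the printed time averages approach; this mirrors the abstract's claim that occasional, affordable monitoring combined with p > c_s is what sustains the safe, widely adopted regime.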