Ensuring responsible use of artificial intelligence (AI) has become imperative as autonomous systems increasingly influence critical societal domains. However, the concept of trustworthy AI remains broad and multi-faceted. This thesis advances knowledge in the safety, fairness, transparency, and accountability of AI systems. In safety, we extend classical deterministic shielding techniques to be resilient to delayed observations, enabling practical deployment in real-world conditions. We also implement both deterministic and probabilistic safety shields in simulated autonomous vehicles to prevent collisions with road users, validating these techniques in realistic driving simulators. We introduce fairness shields, a novel post-processing approach for enforcing group fairness in sequential decision-making over finite and periodic time horizons. By minimizing intervention costs while strictly satisfying fairness constraints, this method efficiently balances fairness with minimal interference. For transparency and accountability, we propose a formal framework for assessing intentional behaviour in probabilistic decision-making agents, introducing agency and the intention quotient as quantitative metrics. We use these metrics to propose a retrospective analysis of intention, useful for determining responsibility when autonomous systems cause unintended harm. Finally, we unify these contributions through the ``reactive decision-making'' framework, a general formalization that consolidates the preceding approaches. Collectively, the advances presented here contribute to the practical realization of safer, fairer, and more accountable AI systems and lay foundations for future research in trustworthy AI.
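To make the shielding idea concrete, the following is a minimal sketch of a deterministic safety shield acting as a post-processor on an agent's action choices. The toy dynamics (`POSITIONS`, `step`, the crash state) are hypothetical illustrations, not the systems studied in the thesis, and the sketch omits the delayed-observation resilience that is the thesis's actual contribution.

```python
# Minimal sketch of a deterministic safety shield on a toy 1-D track.
# Positions 0..9; position 9 is a crash state. All names here are
# illustrative assumptions, not the thesis's implementation.

POSITIONS = range(10)
ACTIONS = {"left": -1, "right": +1}
UNSAFE = {9}

def step(pos, action):
    # Deterministic dynamics, clamped to the ends of the track.
    return min(max(pos + ACTIONS[action], 0), 9)

def compute_safe_set():
    # Greatest fixpoint: states from which some action can keep the
    # agent inside the safe set forever.
    safe = set(POSITIONS) - UNSAFE
    while True:
        shrunk = {s for s in safe
                  if any(step(s, a) in safe for a in ACTIONS)}
        if shrunk == safe:
            return safe
        safe = shrunk

SAFE = compute_safe_set()

def shield(pos, proposed):
    # Pass the agent's action through if it stays in the safe set;
    # otherwise override with any action that does (one exists whenever
    # pos is itself in SAFE, by construction of the fixpoint).
    if step(pos, proposed) in SAFE:
        return proposed
    return next(a for a in ACTIONS if step(pos, a) in SAFE)

print(shield(8, "right"))  # -> "left": the unsafe move is overridden
print(shield(3, "right"))  # -> "right": safe actions pass unchanged
```

The shield is minimally interfering by design: it acts only when the proposed action would leave the precomputed safe set.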
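In the same spirit, here is a heavily simplified sketch of the fairness-shield idea over one finite window. It assumes the group membership of every individual in the window is known up front, sidestepping the probabilistic setting the thesis actually addresses, and uses a greedy rule that flips a proposed decision only when keeping it would make the end-of-window fairness bound unreachable. All names and the `kappa` bound are hypothetical.

```python
# Hypothetical sketch of a fairness shield over one finite window.
# stream: (group, proposed_decision) pairs, group in {"a", "b"},
# decision in {0, 1}. Assumes both groups appear in the window and that
# their membership is known up front (the thesis treats the harder
# probabilistic case). Not the thesis's actual algorithm.

def feasible(acc_a, acc_b, rem_a, rem_b, tot_a, tot_b, kappa):
    # Can some completion of the remaining decisions bring the final
    # acceptance-rate gap within kappa? Brute force for small windows.
    return any(
        abs((acc_a + xa) / tot_a - (acc_b + xb) / tot_b) <= kappa
        for xa in range(rem_a + 1)
        for xb in range(rem_b + 1)
    )

def fairness_shield(stream, kappa):
    tot = {"a": sum(g == "a" for g, _ in stream),
           "b": sum(g == "b" for g, _ in stream)}
    rem, acc, out = dict(tot), {"a": 0, "b": 0}, []
    for group, proposed in stream:
        rem[group] -= 1
        # Prefer the classifier's decision (intervention cost 0); flip
        # only if keeping it makes the fairness bound unreachable.
        for decision in (proposed, 1 - proposed):
            da = decision if group == "a" else 0
            db = decision if group == "b" else 0
            if feasible(acc["a"] + da, acc["b"] + db,
                        rem["a"], rem["b"], tot["a"], tot["b"], kappa):
                acc[group] += decision
                out.append(decision)
                break
    return out

stream = [("a", 1), ("b", 0), ("a", 1), ("b", 0), ("a", 1), ("b", 0)]
print(fairness_shield(stream, kappa=0.4))  # -> [1, 0, 1, 0, 0, 1]
```

On this toy stream the shield lets the first four proposals through and flips the last two, ending with acceptance rates 2/3 and 1/3, within the 0.4 bound. A cost-optimal shield of the kind described in the abstract would compute such interventions globally rather than greedily.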