Social Explainable AI (SAI) is a new direction in artificial intelligence that emphasises decentralisation, transparency, social context, and a focus on human users. SAI research is still at an early stage. Consequently, it concentrates on delivering the intended functionalities, but largely ignores the possibility of unwelcome behaviours due to malicious or erroneous activity. We propose that, in order to capture the breadth of relevant aspects, one can use models and logics of strategic ability that have been developed in multi-agent systems. Using the STV model checker, we take the first step towards the formal modelling and verification of SAI environments, in particular of their resistance to various types of attacks by compromised AI modules.