Principled accountability in the aftermath of harms is essential to the trustworthy design and governance of algorithmic decision making. Legal theory offers a paramount method for assessing culpability: putting the agent 'on the stand' to subject their actions and intentions to cross-examination. We show that under minimal assumptions automated reasoning can rigorously interrogate algorithmic behaviors as in the adversarial process of legal fact finding. We model accountability processes, such as trials or review boards, as Counterfactual-Guided Logic Exploration and Abstraction Refinement (CLEAR) loops. We use the formal methods of symbolic execution and satisfiability modulo theories (SMT) solving to discharge queries about agent behavior in factual and counterfactual scenarios, as adaptively formulated by a human investigator. In order to do so, for a decision algorithm $\mathcal{A}$ we use symbolic execution to represent its logic as a statement $\Pi$ in the decidable theory $\texttt{QF_FPBV}$. We implement our framework and demonstrate its utility on an illustrative car crash scenario.
翻译:在危害发生后建立原则性问责机制,对于算法决策的可靠设计与治理至关重要。法律理论提供了一种评估罪责的首要方法:将行为主体"置于证人席"以对其行为与意图进行交叉质询。我们证明,在最小化假设条件下,自动化推理能够像法律事实认定的对抗性程序一样严格审视算法行为。我们将审判或审查委员会等问责流程建模为反事实引导的逻辑探索与抽象精化(CLEAR)循环。通过符号执行和可满足性模理论(SMT)求解等形式化方法,我们能够根据人类调查员自适应构建的假设情景,针对算法在事实与反事实场景中的行为进行查询求解。为此,对于决策算法$\mathcal{A}$,我们使用符号执行将其逻辑表示为可判定理论$\texttt{QF_FPBV}$中的语句$\Pi$。我们实现了该框架,并通过一个示意的车辆碰撞场景验证其效用。