Machine learning models are increasingly deployed to make, or assist in making, complex and high-impact decisions, from quasi-autonomous vehicles to clinical decision support systems. This poses challenges, particularly when models have hard-to-detect failure modes and can take actions without oversight. To address this challenge, we propose a collaborative system that remains safe by keeping a human as the final decision-maker, while giving the model the best opportunity to persuade and debate with them through interpretable explanations. However, the most helpful explanation varies among individuals and may be inconsistent with their stated preferences. To this end, we develop an algorithm, Ardent, that efficiently learns a ranking of explanations through interaction so as to best assist humans in completing a task. By using a collaborative approach, we can ensure safety and improve performance while addressing transparency and accountability concerns. Ardent enables efficient and effective decision-making by adapting to individual preferences for explanations. We validate this through extensive simulations and a user study involving a challenging image classification task, demonstrating consistent improvement over competing systems.