As machine learning models grow ever larger and are trained with weak supervision on large, possibly uncurated data sets, it becomes increasingly important to establish mechanisms for inspecting, interacting with, and revising models, both to mitigate shortcut learning and to ensure that their learned knowledge is aligned with human knowledge. The recently proposed explanatory interactive learning (XIL) framework was developed for this purpose, and several XIL methods have been introduced, each with its own motivation and methodological details. In this work, we unify various XIL methods into a single typology by establishing a common set of basic modules. In doing so, we pave the way for a principled comparison of existing and, importantly, future XIL approaches. In addition, we discuss existing measures and benchmarks, and introduce novel ones, for evaluating the overall abilities of an XIL method. Given this extensive toolbox of typology, measures, and benchmarks, we finally compare several recent XIL methods both methodologically and quantitatively. In our evaluations, all methods successfully revise a model. However, we find remarkable differences between methods on individual benchmark tasks, which reveal valuable application-relevant aspects and motivate integrating these benchmarks into the development of future methods.