As machine learning models become increasingly large and are trained with weak supervision on large, possibly uncurated data sets, it becomes increasingly important to establish mechanisms for inspecting, interacting with, and revising models, both to mitigate learning shortcuts and to ensure that their learned knowledge is aligned with human knowledge. The recently proposed explanatory interactive learning (XIL) framework was developed for this purpose, and several XIL methods have since been introduced, each with its own motivation and methodological details. In this work, we unify various XIL methods into a single typology by establishing a common set of basic modules. In doing so, we pave the way for a principled comparison of existing and, importantly, future XIL approaches. In addition, we discuss existing measures and benchmarks and introduce novel ones for evaluating the overall abilities of an XIL method. Given this extensive toolbox, comprising our typology, measures, and benchmarks, we compare several recent XIL methods both methodologically and quantitatively. In our evaluations, all methods successfully revise a model. However, we find notable differences across individual benchmark tasks, revealing application-relevant aspects that argue for integrating these benchmarks into the development of future methods.