As machine learning models become increasingly large and are trained with weak supervision on large, possibly uncurated data sets, it becomes increasingly important to establish mechanisms for inspecting, interacting with, and revising models, both to mitigate shortcut learning and to ensure that their learned knowledge is aligned with human knowledge. The recently proposed explanatory interactive learning (XIL) framework was developed for this purpose, and several XIL methods have been introduced, each with its own motivation and methodological details. In this work, we unify various XIL methods into a single typology by establishing a common set of basic modules. In doing so, we pave the way for a principled comparison of existing and, importantly, future XIL approaches. In addition, we discuss existing measures and benchmarks, and introduce novel ones, for evaluating the overall abilities of an XIL method. Given this extensive toolbox of typology, measures, and benchmarks, we finally compare several recent XIL methods both methodologically and quantitatively. In our evaluations, all methods successfully revise a model. However, we find remarkable differences between methods on individual benchmark tasks, revealing valuable application-relevant aspects and motivating the integration of these benchmarks into the development of future methods.