As machine learning models grow ever larger and are trained with weak supervision on large, possibly uncurated data sets, it becomes increasingly important to establish mechanisms for inspecting, interacting with, and revising models. Such mechanisms help mitigate learning shortcuts and ensure that the knowledge a model learns is aligned with human knowledge. The recently proposed explanatory interactive learning (XIL) framework was developed for this purpose, and several XIL methods have since been introduced, each with its own motivation and methodological details. In this work, we unify various XIL methods into a single typology by establishing a common set of basic modules. In doing so, we pave the way for a principled comparison of existing and, importantly, also future XIL approaches. In addition, we discuss existing measures and introduce novel measures and benchmarks for evaluating the overall abilities of an XIL method. Given this extensive toolbox of typology, measures, and benchmarks, we finally compare several recent XIL methods both methodologically and quantitatively. In our evaluations, all methods successfully revise a model. However, we find remarkable differences on individual benchmark tasks. These differences reveal valuable, application-relevant aspects and motivate integrating these benchmarks into the development of future methods.