As Artificial Intelligence (AI) systems become integral across domains, the demand for explainability grows. Existing efforts focus primarily on generating and evaluating explanations for black-box models, yet the potential of the explanation process to improve training itself is largely overlooked, leaving a critical gap in directly enhancing models through these evaluations. This paper introduces SHIELD (Selective Hidden Input Evaluation for Learning Dynamics), a regularization technique for explainable AI designed to improve model quality by concealing portions of the input data and assessing the resulting discrepancy in predictions. In contrast to conventional approaches, SHIELD regularization is integrated directly into the objective function, enhancing model explainability while also improving performance. Experimental validation on benchmark datasets supports SHIELD's effectiveness in improving both the explainability and the overall performance of AI models. These results establish SHIELD regularization as a promising pathway for developing transparent and reliable AI regularization techniques.
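The mechanism summarized above (hide part of the input, measure how much the prediction changes, and fold that measurement into the objective) can be illustrated with a minimal sketch. The abstract does not specify how features are selected for hiding, what value replaces them, or how the discrepancy is measured, so everything below is an assumption made for illustration: random feature selection, a zero fill value, mean absolute difference as the discrepancy, a toy linear softmax classifier, and the hypothetical names `hide_features`, `discrepancy`, `regularized_loss`, `lam`, and `hide_fraction`. It should not be read as the authors' exact formulation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Toy linear classifier (assumption): one weight row per class."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])


def hide_features(x, hide_fraction, rng, fill_value=0.0):
    """Replace a random subset of features with fill_value (assumed masking scheme)."""
    n_hidden = int(round(hide_fraction * len(x)))
    hidden = set(rng.sample(range(len(x)), n_hidden))
    return [fill_value if i in hidden else xi for i, xi in enumerate(x)]


def cross_entropy(probs, label):
    """Standard cross-entropy for a single labeled example."""
    return -math.log(max(probs[label], 1e-12))


def discrepancy(p, q):
    """Mean absolute difference between two prediction vectors (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def regularized_loss(weights, x, label, lam=0.5, hide_fraction=0.2, rng=None):
    """Task loss plus lam times the prediction discrepancy under hidden inputs."""
    rng = rng or random.Random(0)
    p_full = predict(weights, x)
    p_hidden = predict(weights, hide_features(x, hide_fraction, rng))
    return cross_entropy(p_full, label) + lam * discrepancy(p_full, p_hidden)


# Hypothetical example data: two classes, five features.
weights = [[1.0, -1.0, 0.5, 0.0, 2.0], [-0.5, 1.0, 0.0, 1.5, -1.0]]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
loss = regularized_loss(weights, x, label=0)
```

In this sketch the regularizer is a simple additive term, so setting `lam` to zero recovers the unregularized task loss; in a real training loop the same term would be computed per batch and differentiated together with the task loss.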