Motivated by the interpretability of ML models as a crucial element for the successful deployment of AI systems, this paper focuses on rule extraction as a means of neural network interpretability. Through a systematic literature review, different approaches for extracting rules from feedforward neural networks, an important building block of deep learning models, are identified and explored. The findings reveal a range of methods developed over more than two decades, most suitable for shallow neural networks, with recent developments aimed at meeting the challenges of deep learning models. Rules offer a transparent and intuitive means of explaining neural networks, making this study a comprehensive introduction for researchers interested in the field. While the study specifically addresses feedforward networks with supervised learning and crisp rules, future work can extend to other network types, machine learning methods, and fuzzy rule extraction.