Improving Prediction Performance and Model Interpretability through Attention Mechanisms from Basic and Applied Research Perspectives

From arXiv. The Bulletin of Graduate School of Science and Engineering, Hosei University, Vol. 64 (03/2023). This article draws heavily on arXiv:2009.12064, arXiv:2104.08763, arXiv:1905.07289, and arXiv:2204.11588.

With the dramatic advances in deep learning, machine learning research in both basic and applied settings now aims to improve the interpretability of model predictions as well as prediction performance. Deep learning models achieve much higher prediction performance than traditional machine learning models, but how they arrive at a specific prediction remains difficult to interpret or explain. This is known as the black-box problem of machine learning models. It is recognized as especially important across a wide range of fields, including manufacturing, commerce, robotics, and other industries where such technology has become commonplace, as well as medicine, where mistakes are not tolerated. This bulletin is based on a summary of the author's dissertation. The research summarized there focuses on the attention mechanism, which has drawn considerable interest in recent years, and discusses its potential from two perspectives: basic research, in terms of improving prediction performance and interpretability, and applied research, in terms of evaluating it in real-world applications using large datasets beyond the laboratory environment. The dissertation concludes with a summary of the implications of these findings for subsequent research and the future prospects of the field.
