Deep learning has achieved remarkable success across many domains, but it has also created a growing demand for interpretability in model predictions. Although many explainable machine learning methods have been proposed, post-hoc explanations lack guaranteed fidelity and are sensitive to hyperparameter choices, highlighting the appeal of inherently interpretable models. For example, linear regression provides clear feature effects through its coefficients. However, such models are often outperformed by more complex neural networks (NNs), which usually lack inherent interpretability. To address this dilemma, we introduce NIMO, a framework that combines inherent interpretability with the expressive power of neural networks. Building on simple linear regression, NIMO is able to provide flexible and intelligible feature effects. Importantly, we develop an optimization method based on parameter elimination that optimizes the NN parameters and linear coefficients effectively and efficiently. By relying on adaptive ridge regression, we can also easily incorporate sparsity. We show empirically that our model provides faithful and intelligible feature effects while maintaining good predictive performance.
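The abstract mentions adaptive ridge regression as the route to sparsity. As background, a minimal standalone sketch of that general technique (iteratively reweighted ridge on synthetic data; this illustrates the idea only and is not the NIMO implementation):

```python
import numpy as np

# Synthetic regression problem: 10 features, only the first 3 are active.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
beta_true = np.zeros(d)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam, eps = 0.1, 1e-6
# Initialize with a plain ridge solution.
beta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
for _ in range(50):
    # Adaptive ridge: reweight each coefficient's penalty by 1/(beta_j^2 + eps).
    # Small coefficients get heavily penalized and are driven toward zero,
    # while large ones are left nearly unshrunk.
    W = np.diag(1.0 / (beta**2 + eps))
    beta = np.linalg.solve(X.T @ X + lam * W, X.T @ y)

print(np.round(beta, 2))  # inactive coefficients collapse to ~0
```

The reweighting makes a single quadratic penalty mimic a sparsity-inducing one, which is what lets a ridge-style solver recover a sparse linear part.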