The limits of interpretability in multiple linear regression

Interpreting machine-learning models has attracted increasing attention, particularly in the physical sciences, where one often seeks to understand the underlying mechanisms rather than merely make predictions. Multiple linear regression is often regarded as an interpretable alternative to more complex models, such as deep neural networks, because its predictions are expressed as explicit weighted sums of input features. However, when input features are strongly correlated, namely in the presence of multicollinearity, the learned weights can exhibit large dataset-to-dataset fluctuations and oscillatory behavior across physically similar features, making their interpretation difficult or even impossible. Although the instability of the weights under multicollinearity is well known in statistics, its consequences for physical interpretation, in particular its connection to oscillatory weights across physically similar features, have not been systematically clarified. Here, we theoretically discuss the mechanism behind this loss of interpretability by analyzing the eigenmodes of the feature correlation matrix. We show that small-eigenvalue modes associated with multicollinearity amplify fluctuations in the weights and generate oscillatory patterns that do not necessarily reflect meaningful contributions. We test this theoretical picture numerically on physics datasets and show that Ridge regularization suppresses these unstable modes, although the resulting weights must still be interpreted with caution. We further confirm the generality of our findings beyond physics by analyzing a diverse collection of publicly available datasets. Our results clarify why, in the presence of multicollinearity, physical interpretation can remain difficult even for linear regression models.
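
The mechanism described above can be illustrated with a minimal two-feature sketch. This is not the authors' analysis or data: the synthetic features, the correlation `rho = 0.999`, the noise level, the Ridge strength `alpha = 0.1`, and the helper names `make_dataset`, `fit_weights`, and `mode_spread` are all assumptions chosen for illustration. For two unit-variance features with correlation rho, the correlation matrix has eigenmodes (1, 1)/sqrt(2) with eigenvalue 1 + rho and (1, -1)/sqrt(2) with eigenvalue 1 - rho. The second, small-eigenvalue mode is the sign-alternating one, so the sketch measures how much the fitted weights fluctuate along each mode across repeated datasets.

```python
import random
import statistics


def make_dataset(rng, n=200, rho=0.999, noise=0.5):
    """Draw n samples of two zero-mean, unit-variance features with
    correlation rho (illustrative values), and a target that depends
    equally on both features plus Gaussian noise."""
    rows, targets = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = z1
        x2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2
        rows.append((x1, x2))
        targets.append(0.5 * x1 + 0.5 * x2 + rng.gauss(0.0, noise))
    return rows, targets


def fit_weights(rows, targets, alpha=0.0):
    """Solve the 2x2 normal equations (X^T X / n + alpha * I) w = X^T y / n.
    alpha = 0 gives ordinary least squares; alpha > 0 gives Ridge.
    No intercept is fitted because the features are zero-mean by construction."""
    n = len(rows)
    a = sum(x1 * x1 for x1, _ in rows) / n + alpha
    b = sum(x1 * x2 for x1, x2 in rows) / n
    d = sum(x2 * x2 for _, x2 in rows) / n + alpha
    p = sum(x1 * y for (x1, _), y in zip(rows, targets)) / n
    q = sum(x2 * y for (_, x2), y in zip(rows, targets)) / n
    det = a * d - b * b
    return (d * p - b * q) / det, (a * q - b * p) / det


def mode_spread(alpha, trials=200, seed=0):
    """Refit on many independent datasets and return the standard deviation
    of the weights projected onto the symmetric (1, 1)/sqrt(2) mode and the
    antisymmetric (1, -1)/sqrt(2) mode."""
    rng = random.Random(seed)
    sym, anti = [], []
    for _ in range(trials):
        w1, w2 = fit_weights(*make_dataset(rng), alpha=alpha)
        sym.append((w1 + w2) / 2 ** 0.5)
        anti.append((w1 - w2) / 2 ** 0.5)
    return statistics.stdev(sym), statistics.stdev(anti)


# Same seed for both runs, so OLS and Ridge see identical datasets.
ols_sym, ols_anti = mode_spread(alpha=0.0)
ridge_sym, ridge_anti = mode_spread(alpha=0.1)

print(f"OLS   spread: symmetric {ols_sym:.4f}, antisymmetric {ols_anti:.4f}")
print(f"Ridge spread: symmetric {ridge_sym:.4f}, antisymmetric {ridge_anti:.4f}")
```

In this toy setting, the ordinary least-squares weights fluctuate far more along the antisymmetric mode than along the symmetric one, which is what produces large weights of opposite sign on two nearly identical features. Ridge damps each mode by roughly the factor eigenvalue / (eigenvalue + alpha), so it acts mostly on the small-eigenvalue mode. It also biases the weights, which is one reason Ridge weights still call for cautious interpretation.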
