This paper explores the relationship between the condition number of a neural network's weight tensor and the extent of information encoded by the associated processing unit, viewed through the lens of information theory. It argues that a high condition number, though not sufficient for effective knowledge encoding, may indicate that the unit has learned to selectively amplify and compress information. This intuition is formalized for linear units with Gaussian inputs, linking the condition number and the transformation's log-volume scaling factor to the output entropy and the geometric properties of the learned transformation. The analysis shows that, for a fixed weight norm, a singular-value distribution concentrated in a few dominant directions (high condition number) corresponds to reduced overall information transfer, indicating a specialized and efficient encoding strategy. Furthermore, the linear-stage entropy bound provides an upper limit on post-activation information for contractive, element-wise nonlinearities, supporting the condition number as a scale-invariant proxy for encoding capacity in practical neural networks. An empirical case study applies these principles to guide selective fine-tuning of Large Language Models for both a new task and a new input modality. The experiments show that the proposed method, named KappaTune, effectively mitigates catastrophic forgetting. Unlike many existing catastrophic-forgetting mitigation methods that rely on access to pre-training statistics, which are often unavailable, this selective fine-tuning approach bypasses that requirement.
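The fixed-norm claim can be sketched with a standard differential-entropy argument; the derivation below is an illustrative reconstruction under the stated Gaussian-input assumption, not a quotation of the paper's proof. For a full-rank linear unit $y = Wx$ with $W \in \mathbb{R}^{n \times n}$ and $x \sim \mathcal{N}(0, I_n)$:

```latex
h(y) = \tfrac{1}{2}\log\!\big((2\pi e)^n \det(W W^{\top})\big)
     = \tfrac{n}{2}\log(2\pi e) + \sum_{i=1}^{n} \log \sigma_i ,
```

where $\sigma_1 \ge \dots \ge \sigma_n > 0$ are the singular values of $W$ and $\sum_i \log \sigma_i$ is the log-volume scaling factor. Under a fixed Frobenius norm $\sum_i \sigma_i^2 = c$, the AM-GM inequality implies $\sum_i \log \sigma_i$ is maximized when all $\sigma_i$ are equal, i.e. when the condition number $\kappa = \sigma_1/\sigma_n = 1$; any skew toward a few dominant singular values ($\kappa \gg 1$) strictly lowers $h(y)$, matching the reduced-information-transfer claim above.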
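As a minimal illustrative sketch of condition-number-guided selective fine-tuning, the snippet below ranks weight matrices by their spectral condition number and selects a subset as trainable. The selection direction (lowest $\kappa$ first) and the `keep_fraction` parameter are assumptions for illustration, not the paper's exact KappaTune recipe.

```python
import numpy as np

def condition_number(W, eps=1e-12):
    """Spectral condition number kappa(W) = sigma_max / sigma_min."""
    s = np.linalg.svd(W, compute_uv=False)
    return float(s.max() / max(s.min(), eps))

def select_for_finetuning(weights, keep_fraction=0.25):
    """Given {name: weight matrix}, return (selected names, all kappas).

    Illustrative policy (an assumption, not the paper's exact rule):
    keep trainable the fraction of matrices with the LOWEST condition
    numbers and freeze the rest.
    """
    kappas = {name: condition_number(W) for name, W in weights.items()}
    ranked = sorted(kappas, key=kappas.get)          # ascending kappa
    k = max(1, int(len(ranked) * keep_fraction))
    return set(ranked[:k]), kappas
```

For example, among `{"a": np.eye(3), "b": np.diag([1.0, 100.0])}` the identity has $\kappa = 1$ and would be ranked first. In a real LLM one would iterate over `model.named_parameters()` and set `requires_grad` from the selected set.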