Tools from random matrix theory have become central to deep learning theory, where spectral information provides mechanisms for modeling generalization, robustness, scaling, and failure modes. Although these tools often capture empirical behavior, practical computations are limited by matrix size, which frequently restricts analysis to models too small to be realistic. This motivates inferring properties of larger models from the behavior of smaller ones. Free decompression (FD) is a recently proposed method for extrapolating spectral information across matrix sizes, but its utility is currently limited by strong assumptions that preclude its use on more realistic machine learning (ML) models. We use algebraic spectral curve theory to provide a general FD methodology for spectral densities whose Stieltjes transform satisfies an algebraic relation, a modeling assumption that is more likely to hold in practice. This recasts FD as an evolution along spectral curves that can be readily integrated. Our framework enables the expansion of spectral densities that have multiple or multi-modal bulks, that exist at multiple scales, and that contain atoms, all of which are characteristic of real-world data and popular ML models. We demonstrate the efficacy of our framework on models of interest in modern ML, including Hessian and activation matrices associated with neural networks and large-scale diffusion models.
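To make the phrase "Stieltjes transform satisfies an algebraic relation" concrete, here is a minimal sketch using a textbook case that the abstract does not itself name: the Marchenko-Pastur law for a sample covariance matrix. With aspect ratio c = p/n and the convention m(z) = ∫ ρ(x)/(x − z) dx, the limiting transform is a root of the quadratic c·z·m² + (z + c − 1)·m + 1 = 0. The code only checks that an empirical Stieltjes transform nearly satisfies this relation. It is not the FD procedure described above, and the function names, matrix sizes, and evaluation point are illustrative assumptions.

```python
import numpy as np


def empirical_stieltjes(eigs, z):
    """Empirical Stieltjes transform m(z) = mean(1 / (lambda_i - z))."""
    return np.mean(1.0 / (eigs - z))


def mp_residual(m, z, c):
    """Residual of the Marchenko-Pastur relation c*z*m^2 + (z + c - 1)*m + 1.

    Assumes unit-variance entries and aspect ratio c = p / n.
    """
    return c * z * m**2 + (z + c - 1.0) * m + 1.0


# Illustrative sizes (assumptions, not taken from the source).
rng = np.random.default_rng(0)
n, p = 800, 400
c = p / n

# Sample covariance matrix of i.i.d. standard Gaussian data.
X = rng.standard_normal((n, p))
eigs = np.linalg.eigvalsh(X.T @ X / n)

# Evaluate away from the real axis, where the empirical transform is smooth.
z = 1.0 + 0.5j
m = empirical_stieltjes(eigs, z)
res = abs(mp_residual(m, z, c))

print(f"m(z) = {m:.4f}, |residual| = {res:.2e}")
```

The residual shrinks as the matrix grows, which is the sense in which a finite matrix "approximately lies on" an algebraic spectral curve. Densities with several bulks or atoms would generally require higher-degree relations than this quadratic.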