As large language models (LLMs) continue to evolve, efficient evaluation metrics are vital for assessing their ability to compress information and reduce redundancy. While traditional metrics such as Matrix Entropy offer valuable insights, they are computationally expensive for large-scale models because of the \( O(n^3) \) time complexity of Singular Value Decomposition (SVD). To mitigate this issue, we introduce the Matrix Nuclear-Norm, which not only serves as a metric quantifying the data-compression proficiency of LLMs but also provides a convex approximation of matrix rank that captures both predictive discriminability and diversity. By employing the \( L_{1,2}\text{-norm} \) to further approximate the nuclear norm, we can effectively assess a model's information-compression capabilities. This approach reduces the time complexity to \( O(n^2) \) and eliminates the need for SVD computation. Consequently, the Matrix Nuclear-Norm runs 8 to 24 times faster than Matrix Entropy on CEREBRAS-GPT models as their size increases from 111M to 6.7B parameters. This performance gap widens with larger models, as validated in tests with other model families such as Pythia. In addition, evaluations on benchmarks and model responses confirm that the proposed Matrix Nuclear-Norm is a reliable, scalable, and efficient tool for assessing LLM performance, striking a balance between accuracy and computational efficiency. The code is available at https://github.com/MLGroupJLU/MatrixNuclearNorm.
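A minimal sketch contrasting the two computations described above, assuming the common \( L_{1,2} \) surrogate (the sum of column-wise \( L_2 \) norms); the function names are illustrative, and the paper's full estimator in the linked repository may refine this (e.g., by truncating to the largest columns):

```python
import numpy as np

def nuclear_norm_svd(A: np.ndarray) -> float:
    """Exact nuclear norm: sum of singular values, O(n^3) via SVD."""
    return float(np.linalg.svd(A, compute_uv=False).sum())

def l12_nuclear_norm_approx(A: np.ndarray) -> float:
    """SVD-free surrogate: the L_{1,2}-norm, i.e. the sum of the
    L2 norms of the columns of A, computable in O(n^2)."""
    return float(np.sqrt((A ** 2).sum(axis=0)).sum())

# By the triangle inequality the surrogate upper-bounds the true
# nuclear norm, and the two coincide when the columns of A are
# mutually orthogonal (e.g., for a diagonal matrix).
A = np.diag([3.0, 4.0])
print(nuclear_norm_svd(A), l12_nuclear_norm_approx(A))
```

The surrogate avoids SVD entirely, which is where the reported 8-24x speedups come from as model (and hence matrix) sizes grow.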