As large language models (LLMs) continue to evolve, efficient evaluation metrics are vital for assessing their ability to compress information and reduce redundancy. While traditional metrics such as Matrix Entropy offer valuable insights, they are computationally expensive for large-scale models because of the \( O(n^3) \) time complexity of Singular Value Decomposition (SVD). To mitigate this issue, we introduce the Matrix Nuclear-Norm, which not only serves as a metric quantifying the data-compression proficiency of LLMs but also provides a convex approximation of matrix rank that captures both predictive discriminability and diversity. By employing the \( L_{1,2}\text{-norm} \) to further approximate the nuclear norm, we can effectively assess a model's information-compression capabilities. This approach reduces the time complexity to \( O(n^2) \) and eliminates the need for SVD computation. Consequently, the Matrix Nuclear-Norm runs 8 to 24 times faster than Matrix Entropy on CEREBRAS-GPT models as their size increases from 111M to 6.7B parameters. This performance gap widens with larger models, as validated in tests with other model families such as Pythia. In addition, evaluations on benchmarks and model responses confirm that the proposed Matrix Nuclear-Norm is a reliable, scalable, and efficient tool for assessing LLM performance, striking a balance between accuracy and computational efficiency. The code is available at https://github.com/MLGroupJLU/MatrixNuclearNorm.
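A minimal sketch contrasting the two computations described above, assuming the common \( L_{1,2} \) surrogate (the sum of column-wise \( L_2 \) norms); the function names are illustrative, and the paper's full estimator in the linked repository may refine this (e.g., by truncating to the largest columns):

```python
import numpy as np

def nuclear_norm_svd(A: np.ndarray) -> float:
    """Exact nuclear norm: sum of singular values, O(n^3) via SVD."""
    return float(np.linalg.svd(A, compute_uv=False).sum())

def l12_nuclear_norm_approx(A: np.ndarray) -> float:
    """SVD-free surrogate: the L_{1,2}-norm, i.e. the sum of the
    L2 norms of the columns of A, computable in O(n^2)."""
    return float(np.sqrt((A ** 2).sum(axis=0)).sum())

# By the triangle inequality the surrogate upper-bounds the true
# nuclear norm, and the two coincide when the columns of A are
# mutually orthogonal (e.g., for a diagonal matrix).
A = np.diag([3.0, 4.0])
print(nuclear_norm_svd(A), l12_nuclear_norm_approx(A))
```

The surrogate avoids SVD entirely, which is where the reported 8-24x speedups come from as model (and hence matrix) sizes grow.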