Understanding how similar the many released large language models (LLMs) are to one another has several uses, e.g., simplifying model selection, detecting illegal model reuse, and advancing our understanding of what makes LLMs perform well. In this work, we measure the similarity of representations of a set of LLMs with 7B parameters. Our results suggest that some LLMs are substantially different from others. We identify challenges in using representational similarity measures, which suggest the need for careful study of similarity scores to avoid false conclusions.
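To make "similarity of representations" concrete, the following is a minimal sketch of one commonly used measure, linear centered kernel alignment (CKA). The abstract does not name the measures used in this work, so the choice of linear CKA is an assumption made for illustration only. The function name `linear_cka` and the synthetic arrays `reps_a`, `reps_b`, and `reps_c` are hypothetical stand-ins for activations that two models would produce on the same inputs.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two representation matrices.

    x has shape (n_inputs, dim_x) and y has shape (n_inputs, dim_y);
    row i of both matrices must correspond to the same input.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by
    # the Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic data only (an assumption for illustration, not real LLM activations).
rng = np.random.default_rng(0)
reps_a = rng.normal(size=(200, 16))

# reps_b is reps_a after a random rotation and rescaling.
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
reps_b = 3.0 * reps_a @ q

# reps_c is unrelated noise with a different width.
reps_c = rng.normal(size=(200, 32))

score_same = linear_cka(reps_a, reps_b)
score_diff = linear_cka(reps_a, reps_c)
print(f"rotated copy: {score_same:.3f}, unrelated: {score_diff:.3f}")
```

In this sketch the rotated and rescaled copy scores essentially 1, because linear CKA is invariant to orthogonal transformations and isotropic scaling. The unrelated noise scores above 0 with this few inputs, which is one small example of why a raw similarity score needs context before conclusions are drawn from it.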