This paper introduces PhyloLM, a method that adapts phylogenetic algorithms to Large Language Models (LLMs) to explore whether and how they relate to each other and to predict their performance characteristics. Our method computes a phylogenetic distance metric based on the similarity of LLMs' outputs. The resulting metric is then used to construct dendrograms, which satisfactorily capture known relationships across a set of 111 open-source and 45 closed models. Furthermore, our phylogenetic distance predicts performance on standard benchmarks, demonstrating its functional validity and paving the way for time- and cost-effective estimation of LLM capabilities. In summary, by translating concepts from population genetics to machine learning, we propose and validate a tool for evaluating LLM development, relationships, and capabilities, even in the absence of transparent training information.
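As a rough illustration of the pipeline described above (output similarity, then a distance, then a tree), the following minimal sketch may help. It rests on several assumptions that the abstract does not state: prompts are treated as "genes" and sampled next tokens as "alleles", the distance is a Nei-style genetic distance (negative log of a normalized frequency overlap), and the tree is built with plain average-linkage clustering rather than any specific phylogenetic algorithm. The model names, prompts, and tokens are hypothetical toy inputs, not data from the paper, and the function names are invented for this example.

```python
import math
from collections import Counter
from itertools import combinations


def allele_freqs(completions):
    """Map each prompt ("gene") to relative frequencies of sampled tokens ("alleles")."""
    freqs = {}
    for prompt, tokens in completions.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        freqs[prompt] = {tok: c / total for tok, c in counts.items()}
    return freqs


def nei_style_distance(freq_a, freq_b):
    """Assumed Nei-style distance: -log of the normalized overlap of token frequencies."""
    shared = freq_a.keys() & freq_b.keys()  # prompts answered by both models
    num = sum(freq_a[g].get(t, 0.0) * p for g in shared for t, p in freq_b[g].items())
    den_a = sum(p * p for g in shared for p in freq_a[g].values())
    den_b = sum(p * p for g in shared for p in freq_b[g].values())
    sim = num / math.sqrt(den_a * den_b)
    return math.inf if sim == 0 else -math.log(sim)


def average_linkage(names, dist):
    """Greedy average-linkage clustering; returns the tree as nested tuples."""
    clusters = [(name, [name]) for name in names]  # (subtree, leaf members)

    def between(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average pairwise distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: between(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical toy samples: four sampled tokens per prompt for each model.
samples = {
    "base": {"p1": ["a", "a", "b", "a"], "p2": ["x", "y", "x", "x"]},
    "finetune": {"p1": ["a", "a", "b", "b"], "p2": ["x", "x", "x", "y"]},
    "unrelated": {"p1": ["c", "c", "d", "c"], "p2": ["z", "z", "y", "z"]},
}

freqs = {name: allele_freqs(comp) for name, comp in samples.items()}
dist = {
    frozenset((a, b)): nei_style_distance(freqs[a], freqs[b])
    for a, b in combinations(samples, 2)
}
tree = average_linkage(list(samples), dist)
print(tree)
```

In this toy setup the two models with overlapping token distributions end up closer to each other than either is to the third, so they are merged first. A real analysis would need many more prompts and samples per prompt, and could replace the clustering step with a dedicated tree-building method.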