Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Experiments show that, owing to the similarities among different LLMs, current classifiers struggle to determine which specific model generated a given text in multi-class classification tasks. At the same time, differences across LLMs produce evolving patterns of word usage in academic papers. By adopting a direct and highly interpretable linear approach that accounts for differences between models and prompts, we quantitatively assess these effects and show that real-world LLM usage is heterogeneous and dynamic.
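The abstract does not spell out the linear model, so the following is only a minimal sketch of what such an approach could look like, under assumptions that are not taken from the source. It assumes that (a) word frequency is measured per token, (b) the observed frequency of each tracked word is a linear mixture of a pre-LLM human baseline and the frequencies produced by several LLM "sources" (each source could stand for one model, or one model-prompt pair), and (c) the mixing weights are fitted by ordinary least squares over the tracked words. The names `word_frequency`, `estimate_mixture`, `solve_linear_system`, `model_a` and `model_b` are hypothetical, and the numbers in the usage example are made up for illustration, not measurements from the paper.

```python
from typing import Dict, List, Sequence


def word_frequency(texts: Sequence[str], word: str) -> float:
    """Per-token frequency of `word` (case-insensitive; assumed definition)."""
    tokens = [t.strip(".,;:!?\"'()").lower() for text in texts for t in text.split()]
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens)


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    scale = max((abs(v) for row in a for v in row), default=0.0)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= 1e-12 * scale:
            raise ValueError("singular system: source profiles are not distinguishable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_mixture(
    observed: Dict[str, float],
    human: Dict[str, float],
    sources: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Fit observed = human + sum_k alpha_k * (source_k - human) by least squares."""
    words = sorted(observed)
    names = sorted(sources)
    # One row per tracked word, one column per source: the shift that source would cause.
    x = [[sources[name][w] - human[w] for name in names] for w in words]
    y = [observed[w] - human[w] for w in words]
    # Normal equations (X^T X) alpha = X^T y.
    k = len(names)
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    return dict(zip(names, solve_linear_system(xtx, xty)))


if __name__ == "__main__":
    # Made-up illustrative profiles (per-token frequencies), not real data.
    human = {"beyond": 0.0010, "via": 0.0020, "the": 0.0600, "of": 0.0350}
    model_a = {"beyond": 0.0040, "via": 0.0030, "the": 0.0500, "of": 0.0300}
    model_b = {"beyond": 0.0015, "via": 0.0060, "the": 0.0550, "of": 0.0320}
    # Synthetic "observed" corpus: 10% model_a-style text, 5% model_b-style text.
    observed = {
        w: human[w] + 0.10 * (model_a[w] - human[w]) + 0.05 * (model_b[w] - human[w])
        for w in human
    }
    print(estimate_mixture(observed, human, {"model_a": model_a, "model_b": model_b}))
```

Each fitted weight reads directly as "how much of the observed shift is explained by this source", which is what makes a linear model easy to interpret. The sketch does not constrain the weights to lie in [0, 1]; a real analysis would likely need that constraint, a larger vocabulary than four words, and per-period fits to capture how usage changes over time.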