Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have so far received little attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Because different LLMs produce similar text, our experiments show that current classifiers struggle in multi-class classification tasks to determine which specific model generated a given text. At the same time, the differences that do exist between LLMs lead to evolving patterns of word usage in academic papers. Using a direct and highly interpretable linear approach that accounts for differences between models and prompts, we quantify these effects and show that real-world LLM usage is heterogeneous and dynamic.
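As an illustration of what a direct, interpretable linear approach of this kind could look like, the sketch below models each observed word frequency as a linear mix of a human-written baseline and an LLM-processed baseline, then solves for the mixing share by least squares. This is an assumption made for illustration, not necessarily the procedure used in the paper: the function name `estimate_llm_share`, the single-share mixture model, and all frequency values are hypothetical.

```python
def estimate_llm_share(observed, human, llm):
    """Least-squares estimate of eta in: observed ~= (1 - eta) * human + eta * llm.

    Each argument maps a word to its relative frequency. Only words present
    in all three mappings are used. This single-share mixture is an
    illustrative assumption, not the paper's confirmed method.
    """
    words = sorted(set(observed) & set(human) & set(llm))
    num = 0.0
    den = 0.0
    for w in words:
        shift = llm[w] - human[w]              # per-word change attributed to the LLM
        num += (observed[w] - human[w]) * shift
        den += shift * shift
    if den == 0.0:
        raise ValueError("human and LLM frequencies are identical; eta is not identifiable")
    eta = num / den
    return min(1.0, max(0.0, eta))             # clip to a valid share in [0, 1]


# Toy frequencies, invented for this example (not measured on arXiv).
human = {"the": 0.060, "of": 0.035, "via": 0.0010, "beyond": 0.0005}
llm = {"the": 0.050, "of": 0.030, "via": 0.0030, "beyond": 0.0015}

# Build a synthetic "observed" corpus in which 30% of the text is LLM-processed.
true_eta = 0.3
observed = {w: (1 - true_eta) * human[w] + true_eta * llm[w] for w in human}

eta_hat = estimate_llm_share(observed, human, llm)
print(f"estimated LLM share: {eta_hat:.3f}")
```

Because the estimate depends on which LLM baseline is supplied, repeating it with baselines from different models or prompts would give different shares, which is one way the heterogeneity described above could surface in practice.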