The introduction of generative artificial intelligence (AI) language models such as GPT (Generative Pre-trained Transformer), and of tools built on them such as ChatGPT, has triggered a revolution that can transform how text is generated. This has many implications. For example, as AI-generated text becomes a significant fraction of all text, will it affect the language capabilities of readers and the training of newer AI tools? Will it affect the evolution of languages? Focusing on one specific aspect of language, words: will the use of tools such as ChatGPT increase or reduce the vocabulary used, or its lexical richness? The answer matters for the words themselves, as those not included in AI-generated content will tend to become less and less popular and may eventually be lost. In this work, we perform an initial comparison of the vocabulary and lexical richness of ChatGPT and humans when performing the same tasks. In more detail, we use two datasets containing answers to different types of questions given by ChatGPT and by humans, and a third dataset in which ChatGPT paraphrases sentences and questions. The analysis shows that ChatGPT tends to use fewer distinct words and to exhibit lower lexical richness than humans. These results are very preliminary, and additional datasets and ChatGPT configurations have to be evaluated to draw more general conclusions. Further research is therefore needed to understand how the use of ChatGPT, and more broadly of generative AI tools, will affect vocabulary and lexical richness in different types of text and languages.
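To make "number of distinct words" and "lexical richness" concrete, the following is a minimal sketch of how two texts could be compared. It is not the procedure used in this work: the abstract does not name its metrics or its tokenization, so the regex tokenizer, the type-token ratio (TTR), and the root TTR below are assumptions chosen for illustration, and the two sample strings are invented toy inputs rather than data from the study.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_stats(text: str) -> dict[str, float]:
    """Return token count, distinct-word count, TTR, and root TTR for a text."""
    tokens = tokenize(text)
    n_tokens = len(tokens)
    n_types = len(set(tokens))  # distinct words
    if n_tokens == 0:
        return {"tokens": 0, "types": 0, "ttr": 0.0, "rttr": 0.0}
    return {
        "tokens": n_tokens,
        "types": n_types,
        # Type-token ratio: distinct words divided by total words.
        "ttr": n_types / n_tokens,
        # Root TTR: divides by sqrt(total words) to reduce length sensitivity.
        "rttr": n_types / math.sqrt(n_tokens),
    }


# Invented toy inputs of equal length, used only to exercise the functions.
text_a = "the cat sat on the mat and the dog sat on the rug"
text_b = "a quick brown fox jumps over one lazy dog near the old barn"

for name, text in [("text_a", text_a), ("text_b", text_b)]:
    print(name, lexical_stats(text))
```

Plain TTR falls as texts get longer, so comparisons are only meaningful between texts of similar length or with a length-corrected measure; that is why the toy inputs have the same number of tokens.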