Large-scale language models such as ChatGPT have garnered significant media attention and stunned the public with their remarkable capacity for generating coherent text from short natural language prompts. In this paper, we systematically inspect ChatGPT's performance on two controllable generation tasks, examining its ability to adapt its output to different target audiences (expert vs. layman) and writing styles (formal vs. informal). Additionally, we evaluate the faithfulness of the generated text and compare the model's output with human-authored texts. Our findings indicate that the stylistic variations produced by humans are considerably larger than those demonstrated by ChatGPT, and that the generated texts diverge from human samples in several characteristics, such as the distribution of word types. Moreover, we observe that ChatGPT sometimes introduces factual errors or hallucinations when adapting text to suit a specific style.