Large Language Models (LLMs) can generate narratives at scale, which makes it possible to systematically explore how effectively they communicate life events in narrative form. In this study, we use a zero-shot structured narrative prompt to generate 24,000 narratives with OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Remarkably, 87.43% of the narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine Machine Learning (ML) models on the manually classified narratives. We then use these models to predict the classifications of the remaining 21,120 narratives. All nine ML models excelled at classifying valid narratives as valid but struggled to simultaneously classify invalid narratives as invalid. Our findings advance the study of LLM capabilities, limitations, and validity, and offer practical insights for narrative generation and natural language processing applications.