Large Language Models (LLMs) can generate narratives at scale, which makes it possible to systematically explore how effectively they communicate life events in narrative form. In this study, we use a zero-shot structured narrative prompt to generate 24,000 narratives with OpenAI's GPT-4. From this dataset, we manually classify 2,880 narratives and evaluate their validity in conveying birth, death, hiring, and firing events. Notably, 87.43% of the manually classified narratives sufficiently convey the intention of the structured prompt. To automate the identification of valid and invalid narratives, we train and validate nine machine learning (ML) models on the classified data. We then use these models to predict the classifications of the remaining 21,120 narratives. All nine ML models excelled at classifying valid narratives as valid but struggled to classify invalid narratives as invalid at the same time. Our findings not only advance the study of LLM capabilities, limitations, and validity but also offer practical insights for narrative generation and natural language processing applications.
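One possible reading of the asymmetry between valid and invalid narratives is the class imbalance implied by the 87.43% figure. This is an illustrative assumption, not a claim made by the study. The sketch below is not the study's code. It uses only counts derived from the abstract: about 2,518 valid and 362 invalid narratives among the 2,880 classified. It pairs those counts with a hypothetical baseline that labels every narrative as valid, showing how overall accuracy can look high while recall on the minority (invalid) class is zero. The helper `per_class_recall` is a name introduced here for illustration.

```python
# Illustrative sketch only: counts are derived from the abstract's figures
# (2,880 manually classified narratives, 87.43% judged valid).
TOTAL = 2880
VALID = round(0.8743 * TOTAL)  # about 2,518 valid narratives
INVALID = TOTAL - VALID        # about 362 invalid narratives


def per_class_recall(y_true, y_pred, label):
    """Hypothetical helper: fraction of items truly labeled `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)


# Gold labels for the classified subset (ordering is arbitrary here).
y_true = ["valid"] * VALID + ["invalid"] * INVALID
# Hypothetical degenerate baseline that predicts "valid" for every narrative.
y_pred = ["valid"] * TOTAL

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / TOTAL
recall_valid = per_class_recall(y_true, y_pred, "valid")
recall_invalid = per_class_recall(y_true, y_pred, "invalid")

print(f"accuracy        = {accuracy:.4f}")
print(f"recall(valid)   = {recall_valid:.4f}")
print(f"recall(invalid) = {recall_invalid:.4f}")
```

Under these assumptions, the baseline reaches an accuracy equal to the valid share (about 0.8743) while never identifying an invalid narrative. Per-class recall therefore says more than accuracy alone about how well a classifier detects invalid narratives.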