Story generation and understanding, as with all NLG/NLU tasks, have seen a surge in neurosymbolic work. Researchers have recognized that, while large language models (LLMs) have tremendous utility, they can be augmented with symbolic methods that improve their performance and compensate for the weaknesses of the underlying neural networks. However, symbolic methods are extremely costly in the time and expertise needed to create them. In this work, we capitalize on state-of-the-art Code-LLMs, such as Codex, to bootstrap the use of symbolic methods for tracking the state of stories and aiding in story understanding. We show that our CoRRPUS system and abstracted prompting procedures can beat current state-of-the-art structured LLM techniques on existing story understanding tasks (bAbI Task 2 and Re^3) with minimal hand engineering. We hope that this work highlights the importance of symbolic representations and specialized prompting for LLMs, as these models require some guidance to perform reasoning tasks properly.
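To make the idea of symbolic story-state tracking concrete, the following is a minimal sketch of the kind of code-structured state that a bAbI Task 2 style story (tracking where a carried object ends up) could be mapped onto. It is an illustration only and not the CoRRPUS prompt format, which the abstract does not specify; the class `StoryState`, its methods `move`, `pick_up`, `drop`, and `where_is`, and the example story are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoryState:
    """Hypothetical symbolic state for a bAbI Task 2 style story.

    Assumption: this structure is illustrative and is not taken
    from the CoRRPUS system itself.
    """
    person_location: Dict[str, str] = field(default_factory=dict)
    object_holder: Dict[str, str] = field(default_factory=dict)
    object_location: Dict[str, str] = field(default_factory=dict)

    def move(self, person: str, place: str) -> None:
        # A person's location is overwritten by each movement sentence.
        self.person_location[person] = place

    def pick_up(self, person: str, obj: str) -> None:
        # A held object no longer has a resting place of its own.
        self.object_holder[obj] = person
        self.object_location.pop(obj, None)

    def drop(self, person: str, obj: str) -> None:
        # A dropped object stays where its former holder currently is.
        if self.object_holder.get(obj) == person:
            del self.object_holder[obj]
            place = self.person_location.get(person)
            if place is not None:
                self.object_location[obj] = place

    def where_is(self, obj: str) -> Optional[str]:
        # A held object is wherever its holder is; otherwise use its
        # last resting place (None if the story never placed it).
        holder = self.object_holder.get(obj)
        if holder is not None:
            return self.person_location.get(holder)
        return self.object_location.get(obj)


# Hypothetical story: "Mary went to the kitchen. Mary got the milk.
# John moved to the bedroom. Mary travelled to the hallway."
state = StoryState()
state.move("Mary", "kitchen")
state.pick_up("Mary", "milk")
state.move("John", "bedroom")
state.move("Mary", "hallway")
answer_while_held = state.where_is("milk")

# "Mary dropped the milk. Mary went to the garden."
state.drop("Mary", "milk")
state.move("Mary", "garden")
answer_after_drop = state.where_is("milk")
```

In this sketch the question "Where is the milk?" requires combining two facts (who holds the object and where that person is), which is the two-supporting-fact reasoning that the task targets. Under the approach the abstract describes, a Code-LLM would be prompted to produce or update such a representation from the story text, rather than having it hand-written.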