Recent advances in large language models (LLMs) enable compelling story generation, but connecting narrative text to playable visual environments remains an open challenge in procedural content generation (PCG). We present a lightweight pipeline that transforms short narrative prompts into a sequence of 2D tile-based game scenes, reflecting the temporal structure of stories. Given an LLM-generated narrative, our system identifies three key time frames, extracts spatial predicates in the form of "Object-Relation-Object" triples, and retrieves visual assets using affordance-aware semantic embeddings from the GameTileNet dataset. A layered terrain is generated using Cellular Automata, and objects are placed using spatial rules grounded in the predicate structure. We evaluate our system on ten diverse stories, analyzing tile-object matching, affordance-layer alignment, and spatial constraint satisfaction across frames. This prototype offers a scalable approach to narrative-driven scene generation and lays the foundation for future work on multi-frame continuity, symbolic tracking, and multi-agent coordination in story-centered PCG.
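The abstract mentions generating layered terrain with Cellular Automata. A minimal sketch of the standard approach (random noise followed by majority-rule smoothing); the function name, parameters, and thresholds are illustrative assumptions, not taken from the paper:

```python
import random

def generate_terrain(width, height, fill_prob=0.45, steps=4, seed=0):
    """Cellular-automata terrain: seed a grid with random noise, then
    repeatedly smooth each cell toward the majority of its 8 neighbors.
    Returns a 2D list where 1 = solid terrain, 0 = open ground."""
    rng = random.Random(seed)
    grid = [[1 if rng.random() < fill_prob else 0 for _ in range(width)]
            for _ in range(height)]
    for _ in range(steps):
        nxt = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                # Count solid neighbors; out-of-bounds cells count as
                # solid so terrain stays closed at the map border.
                solid = 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        if dy == 0 and dx == 0:
                            continue
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < height and 0 <= nx < width:
                            solid += grid[ny][nx]
                        else:
                            solid += 1
                # Majority rule: a cell becomes solid if >= 5 of its
                # 8 neighbors are solid.
                nxt[y][x] = 1 if solid >= 5 else 0
        grid = nxt
    return grid
```

Running several smoothing steps turns the initial noise into contiguous regions, which can then serve as one layer (e.g. ground vs. water) of a layered tile map.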