Transforming a dense, abstract proverb into an engaging and morally faithful narrative requires deep cultural understanding and robust semantic grounding. We frame this problem as a \emph{constrained semantic decompression} task and study proverb-conditioned story generation as a testbed for abstraction-to-realization in large language models (LLMs). Focusing on Persian, we introduce the Proverb Aligned Narrative Dataset (PAND), which pairs proverbs with human-written stories and explicit meanings. Using a hybrid evaluation framework that combines a human-calibrated LLM-as-a-Judge with structural metrics, we analyze model behavior across multiple prompting regimes. Our findings reveal a persistent \emph{decompression gap}: current LLMs often achieve strong surface-level fluency while failing to faithfully instantiate the underlying moral and causal structure encoded in proverbs. We further show that explicit reasoning and iterative refinement can partially mitigate these failures, suggesting that many decompression errors arise from difficulties in translating abstract meaning into narrative form rather than from a complete lack of relevant knowledge. Our proposed task naturally extends to other forms of compressed cultural knowledge.