Large Language Models (LLMs) are increasingly used to generate and shape cultural content, ranging from narrative writing to artistic production. While these models demonstrate impressive fluency and generative capacity, prior work has shown that they also exhibit systematic cultural biases, raising concerns about stereotyping, homogenization, and the erasure of culturally specific forms of expression. Understanding whether LLMs can meaningfully align with diverse cultures beyond the dominant ones remains a critical challenge. In this paper, we study cultural adaptation in LLMs through the lens of cooking recipes, a domain in which culture, tradition, and creativity are tightly intertwined. We build on the \textit{GlobalFusion} dataset, which pairs human recipes from different countries according to established measures of cultural distance. Using the same country pairs, we generate culturally adapted recipes with multiple LLMs, enabling a direct comparison between human and LLM behavior in cross-cultural content creation. Our analysis shows that LLMs fail to produce culturally representative adaptations: unlike in human-written recipes, the divergence of LLM-generated recipes does not correlate with cultural distance. We further provide explanations for this gap. We show that cultural information is only weakly preserved in internal model representations, that models inflate the novelty of their outputs because they misunderstand notions such as creativity and tradition, and that they fail both to associate an adaptation with the countries involved and to ground it in culturally salient elements such as ingredients. These findings highlight fundamental limitations of current LLMs for culturally oriented generation and have important implications for their use in culturally sensitive applications.