Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models

As Text-to-Image (TTI) diffusion models become increasingly influential in content creation, growing attention is being directed toward their societal and cultural implications. While prior research has primarily examined demographic and cultural biases, the ability of these models to accurately represent historical contexts remains largely underexplored. To address this gap, we introduce a benchmark for evaluating how TTI models depict historical contexts. The benchmark combines HistVis, a dataset of 30,000 synthetic images generated by three state-of-the-art diffusion models from carefully designed prompts covering universal human activities across multiple historical periods, with a reproducible evaluation protocol. We evaluate generated imagery across three key aspects: (1) Implicit Stylistic Associations: examining default visual styles associated with specific eras; (2) Historical Consistency: identifying anachronisms such as modern artifacts in pre-modern contexts; and (3) Demographic Representation: comparing generated racial and gender distributions against historically plausible baselines. Our findings reveal systematic inaccuracies in historically themed generated imagery: TTI models frequently stereotype past eras by incorporating unstated stylistic cues, introduce anachronisms, and fail to reflect plausible demographic patterns. By offering a reproducible benchmark for historical representation in generated imagery, this work takes an initial step toward building more historically accurate TTI models.
