Ethology of Latent Spaces

From arXiv: 23 pages, 14 figures. Presented at the Hyperheritage International Symposium 9 ( https://paragraphe.univ-paris8.fr/IMG/pdf/programme_colloque_his9_campuscondorcet_v3.pdf ) and accepted, after double-blind peer review, for publication in French in 2026-2027.

This study challenges the presumed neutrality of latent spaces in vision-language models (VLMs) by adopting an ethological perspective on their algorithmic behaviors. Rather than constituting spaces of homogeneous indeterminacy, latent spaces exhibit model-specific algorithmic sensitivities, understood as differential regimes of perceptual salience shaped by training data and architectural choices. Through a comparative analysis of three models (OpenAI CLIP, OpenCLIP LAION, SigLIP) applied to a corpus of 301 artworks (15th to 20th century), we reveal substantial divergences in the attribution of political and cultural categories. Using bipolar semantic axes derived from vector analogies (Mikolov et al., 2013), we show that SigLIP classifies 59.4% of the artworks as politically engaged, compared to only 4% for OpenCLIP. African masks receive the highest political scores in SigLIP while remaining apolitical in OpenAI CLIP. On an aesthetic-colonial axis, inter-model discrepancies reach 72.6 percentage points. We introduce three operational concepts: computational latent politicization, which describes the emergence of political categories without intentional encoding; emergent bias, which is irreducible to statistical or normative bias and detectable only through contrastive analysis; and algorithmic scopic regimes, of which we distinguish three, entropic (LAION), institutional (OpenAI), and semiotic (SigLIP), each structuring a distinct mode of visibility. Drawing on Foucault's notion of the archive, Jameson's ideologeme, and Simondon's theory of individuation, we argue that training datasets function as quasi-archives whose discursive formations crystallize within latent space. This work contributes to a critical reassessment of the conditions under which VLMs are applied to digital art history and calls for methodologies that integrate learning architectures into any delegation of cultural interpretation to algorithmic agents.
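
The abstract does not spell out how the bipolar semantic axes are computed, so the following is only a minimal sketch of one common reading of the idea: an axis is the normalized difference between the embeddings of two opposed poles (for example, text embeddings of two contrasting prompts), each artwork embedding is projected onto that axis, and the share of artworks falling on the positive side is compared across models. The function names, the zero threshold, and the three-dimensional toy vectors are all assumptions made for illustration; they are not the paper's code, prompts, or data.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def bipolar_axis(pos_pole, neg_pole):
    """Unit vector pointing from the negative pole toward the positive pole."""
    pos, neg = normalize(pos_pole), normalize(neg_pole)
    return normalize([p - n for p, n in zip(pos, neg)])


def axis_score(embedding, axis):
    """Projection of a unit-normalized embedding onto the axis."""
    return sum(e * a for e, a in zip(normalize(embedding), axis))


def share_positive(embeddings, axis, threshold=0.0):
    """Percentage of embeddings scoring above the threshold (assumed 0.0)."""
    hits = sum(1 for e in embeddings if axis_score(e, axis) > threshold)
    return 100.0 * hits / len(embeddings)


# Hypothetical toy embeddings standing in for two different models.
# Real embeddings would come from each model's own text and image encoders.
model_a = {
    "pos": [1.0, 0.0, 0.0],
    "neg": [0.0, 1.0, 0.0],
    "images": [[0.9, 0.1, 0.2], [0.8, 0.3, 0.1], [0.2, 0.9, 0.1], [0.7, 0.2, 0.5]],
}
model_b = {
    "pos": [1.0, 0.0, 0.0],
    "neg": [0.0, 1.0, 0.0],
    "images": [[0.1, 0.9, 0.2], [0.2, 0.8, 0.1], [0.3, 0.9, 0.1], [0.7, 0.2, 0.5]],
}

shares = {}
for name, model in (("model_a", model_a), ("model_b", model_b)):
    axis = bipolar_axis(model["pos"], model["neg"])
    shares[name] = share_positive(model["images"], axis)

# Inter-model discrepancy, expressed in percentage points.
gap = abs(shares["model_a"] - shares["model_b"])
print(shares, gap)
```

Because each model has its own latent space, the axis is rebuilt per model from that model's own pole embeddings; only the resulting shares are compared. The choice of poles and of the threshold strongly affects the outcome, which is one reason a contrastive, multi-model reading of the scores is more informative than any single model's percentage.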
