This paper examines the evolving nature of internal representations in generative visual models, focusing on the conceptual and technical shift from GANs and VAEs to diffusion-based architectures. Drawing on Beatrice Fazi's account of synthesis as the amalgamation of distributed representations, we propose a distinction between "synthesis in a strict sense," where a compact latent space wholly determines the generative process, and "synthesis in a broad sense," which characterizes models whose representational labor is distributed across layers. Through close readings of model architectures and a targeted experimental setup that intervenes in layerwise representations, we show how diffusion models fragment the burden of representation and thereby challenge the assumption of a unified internal space. By situating these findings within media-theoretical frameworks and critically engaging with metaphors such as the latent space and the Platonic Representation Hypothesis, we argue for a reorientation of how generative AI is understood: not as a direct synthesis of content, but as an emergent configuration of specialized processes.