What does it mean for a machine to recognize beauty? Although beauty is culturally and experientially compelling, it remains philosophically elusive, and yet deep learning systems increasingly appear capable of modeling aesthetic judgment. In this paper, we explore the capacity of neural networks to represent beauty despite the immense formal diversity of the objects to which the term applies. Drawing on recent work on cross-model representational convergence, we show that aesthetic content produces more similar and better-aligned representations across models trained on distinct data and modalities, whereas unaesthetic images do not. This finding implies that the formal structure of beautiful images has a realist basis rather than being only a reflection of socially constructed values. Furthermore, we propose that these realist representations exist because aesthetic form is jointly grounded in physical and cultural substance. We argue that human perceptual and creative acts play a central role in shaping the latent spaces of deep learning systems, but that a realist basis for aesthetics shows that machines are not mere creative parrots: they can produce novel creative insights from the unique vantage point of scale. Our findings suggest that human-machine co-creation is not merely possible but foundational, with beauty serving as a teleological attractor in both cultural production and machine perception.