A common assumption in representation learning is that globally well-distributed embeddings support robust and generalizable representations. This focus has shaped both training objectives and evaluation protocols, implicitly treating global geometry as a proxy for representational competence. While global geometry effectively encodes which elements are present, it is often insensitive to how they are composed. We investigate this limitation by testing the ability of geometric metrics to predict compositional binding across 21 vision encoders. We find that standard geometry-based statistics exhibit near-zero correlation with compositional binding. In contrast, functional sensitivity, as measured by the input-output Jacobian, reliably tracks this capability. We further provide an analytic account showing that this disparity arises from objective design, as existing losses explicitly constrain embedding geometry but leave the local input-output mapping unconstrained. These results suggest that global embedding geometry captures only a partial view of representational competence and establish functional sensitivity as a critical complementary axis for modeling composite structure.
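To make the notion of functional sensitivity concrete, the following is a minimal sketch of one way to summarize the input-output Jacobian of an encoder at a single input. The abstract does not specify how the Jacobian is reduced to a scalar, so the Frobenius norm used here is an assumption, as are the finite-difference estimator and the names `toy_encoder`, `jacobian_fd`, and `functional_sensitivity`, which are hypothetical and chosen only for illustration.

```python
import math


def toy_encoder(x):
    # Hypothetical stand-in for a vision encoder: maps R^3 -> R^2.
    return [math.tanh(x[0] * x[1]), x[1] + 0.5 * x[2]]


def jacobian_fd(f, x, eps=1e-5):
    """Central finite-difference estimate of the Jacobian of f at x."""
    out_dim = len(f(x))
    jac = [[0.0] * len(x) for _ in range(out_dim)]
    for j in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for i in range(out_dim):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def frobenius_norm(mat):
    # Square root of the sum of squared entries.
    return math.sqrt(sum(v * v for row in mat for v in row))


def functional_sensitivity(f, x, eps=1e-5):
    # Assumed scalar summary: Frobenius norm of the local Jacobian.
    return frobenius_norm(jacobian_fd(f, x, eps))


if __name__ == "__main__":
    print(functional_sensitivity(toy_encoder, [0.0, 0.0, 0.0]))
```

For a real encoder, automatic differentiation would normally replace finite differences. Unlike statistics computed over the set of output embeddings, this quantity depends on how the output changes under small input perturbations.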