Current work on human-machine alignment aims at understanding machine-learned latent spaces and their correspondence to human representations. Gärdenfors' conceptual spaces is a prominent framework for understanding human representations. Convexity of object regions in conceptual spaces is argued to promote generalizability, few-shot learning, and interpersonal alignment. Based on these insights, we investigate the notion of convexity of concept regions in machine-learned latent spaces. We develop a set of tools for measuring convexity in sampled data and evaluate emergent convexity in layered representations of state-of-the-art deep networks. We show that convexity is robust to basic re-parametrization and, hence, meaningful as a quality of machine-learned latent spaces. We find that approximate convexity is pervasive in neural representations in multiple application domains, including models of images, audio, human activity, text, and medical images. Generally, we observe that fine-tuning increases the convexity of label regions. We find evidence that pretraining convexity of class label regions predicts subsequent fine-tuning performance.