Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method

Analogical reasoning derives information from known relations and generalizes it to similar yet unfamiliar situations. One of the first general ways in which deep learning models solved verbal analogies was vector arithmetic on word embeddings, which relates words that have been mapped to a vector space (e.g., king - man + woman = __?). In comparison, most attempts to solve visual analogies are still predominantly task-specific and less generalizable. This project focuses on visual analogical reasoning and applies the original general mechanism for solving verbal analogies to the visual realm. Taking the Abstraction and Reasoning Corpus (ARC) as an example for investigating visual analogy solving, we use a variational autoencoder (VAE) to transform ARC items into low-dimensional latent vectors, analogous to the word embeddings used in the verbal approaches. Through simple vector arithmetic, the underlying rules of ARC items are discovered and used to solve them. Results indicate that the approach works well on simple items with few dimensions (i.e., few colors used, uniform shapes), similar input-to-output examples, and high VAE reconstruction accuracy. Predictions on more complex items deviated more strongly from the expected outputs, although they still often approximated parts of the item's rule set. Error patterns indicated that the model works as intended. On the official ARC paradigm, the model achieved a score of 2% (cf. the current world record of 21%), and on ConceptARC it scored 8.8%. Although the proposed methodology involves only basic dimensionality reduction techniques and standard vector arithmetic, the approach demonstrates promising outcomes on ARC and can easily be generalized to other abstract visual reasoning tasks.
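
The vector-arithmetic step can be illustrated with a minimal sketch. It is not the authors' implementation: the `encode` and `decode` functions below are hypothetical stand-ins for a trained VAE (here they simply flatten a grid and round it back), and averaging the output-minus-input differences over the example pairs is an assumption about how a rule vector might be estimated. The names `solve_by_analogy`, `train_pairs`, and `test_input` are likewise illustrative.

```python
import numpy as np


def encode(grid):
    # Hypothetical stand-in for a VAE encoder: flatten the grid to a float vector.
    return np.asarray(grid, dtype=float).ravel()


def decode(z, shape):
    # Hypothetical stand-in for a VAE decoder: round to the nearest colour
    # index, clip to the ARC colour range 0..9, and restore the grid shape.
    return np.clip(np.rint(z), 0, 9).astype(int).reshape(shape)


def solve_by_analogy(train_pairs, test_input):
    # Rule vector: mean of (output - input) in latent space, the visual
    # counterpart of "king - man" (averaging is an assumption of this sketch).
    deltas = [encode(out) - encode(inp) for inp, out in train_pairs]
    rule = np.mean(deltas, axis=0)
    # Apply the rule to the test input, as in "+ woman", then decode.
    z_pred = encode(test_input) + rule
    return decode(z_pred, np.asarray(test_input).shape)


# Toy item: every example paints the top-left cell with colour 3.
train_pairs = [
    ([[0, 1], [0, 0]], [[3, 1], [0, 0]]),
    ([[0, 0], [2, 0]], [[3, 0], [2, 0]]),
]
test_input = [[0, 0], [0, 5]]

prediction = solve_by_analogy(train_pairs, test_input)
print(prediction)
```

With these identity-like stand-ins the rule vector is exact only because the toy transformation is additive and identical across examples. With a learned latent space, the quality of the prediction would depend on how well the encoder and decoder reconstruct the item, in line with the dependence on reconstruction accuracy reported above.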
