Relational reasoning refers to the ability to infer and understand the relations between multiple entities. In humans, this ability underpins many higher cognitive functions, such as problem solving and decision-making, and has been reliably linked to fluid intelligence. Although machine learning models have made impressive advances across domains such as natural language processing and vision, the extent to which such models can perform relational reasoning tasks remains unclear. Here we study the importance of positional encoding (PE) for relational reasoning in the Transformer, and find that a learnable PE outperforms all other commonly used PEs (e.g., absolute, relative, and rotary). Moreover, we find that when using a PE with learnable parameters, the choice of initialization greatly influences the learned representations and their downstream generalization performance. Specifically, we find that a learned PE initialized from a small-norm distribution can 1) uncover ground-truth position information, 2) generalize in the presence of noisy inputs, and 3) produce behavioral patterns that are consistent with human performance. Our results shed light on the importance of learning high-performing and robust PEs for relational reasoning tasks, which will prove useful for tasks in which ground-truth positions are not provided or not known.
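To make the core manipulation concrete, the following is a minimal sketch of a learnable positional encoding initialized from a small-norm Gaussian, as the abstract describes. The dimensions, the initialization scale (`init_std`), and the function names are illustrative assumptions, not the authors' actual implementation; numpy is used in place of a deep-learning framework to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_learnable_pe(max_len: int, d_model: int, init_std: float = 0.02):
    """Initialize a learnable PE table from a small-norm Gaussian.

    In a real model this array would be a trainable parameter;
    init_std = 0.02 is an assumed 'small-norm' scale, not a value
    taken from the paper.
    """
    return rng.normal(0.0, init_std, size=(max_len, d_model))

def add_pe(x: np.ndarray, pe: np.ndarray) -> np.ndarray:
    """Add the first seq_len position vectors to the token embeddings.

    x has shape (batch, seq_len, d_model); broadcasting adds the same
    PE rows to every sequence in the batch.
    """
    return x + pe[: x.shape[1]]

pe = make_learnable_pe(max_len=10, d_model=16)
x = np.zeros((2, 5, 16))          # dummy token embeddings
out = add_pe(x, pe)
print(out.shape)                   # (2, 5, 16)
```

With a small `init_std`, the PE contributes little signal at the start of training, so any position information the model ends up relying on must be learned from the task itself, which is the setting in which the abstract reports the best generalization.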