We investigate the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. By systematically decorrelating provability from spurious features and enforcing algorithmic alignment, we find that in sufficiently deep models with a bidirectional prefix mask, implicit reasoning approaches explicit chain-of-thought (CoT) performance across graph topologies and problem widths, though CoT remains necessary for depth extrapolation.
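To make the underlying task concrete, the following is a minimal sketch of deciding provability over propositional Horn clauses by forward chaining. It is an illustration only, not the setup used in this work: the function name `forward_chain`, the `(body, head)` rule encoding, and the toy rule set are all assumptions. The returned round count is one simple notion of proof depth, the quantity along which extrapolation is discussed above.

```python
def forward_chain(facts, rules, goal):
    """Return the number of parallel derivation rounds needed to reach
    `goal`, or None if `goal` is not provable.

    `rules` is a list of (body, head) pairs, where `body` is a tuple of
    atoms that must all be known before `head` can be derived.
    (Hypothetical encoding chosen for this sketch.)
    """
    known = set(facts)
    depth = 0
    while goal not in known:
        # Fire every rule whose body is fully satisfied in this round.
        new = {head for body, head in rules
               if set(body) <= known and head not in known}
        if not new:
            return None  # fixed point reached without deriving the goal
        known |= new
        depth += 1
    return depth


# Toy rule set: a AND b -> c, c -> d, e -> f
rules = [(("a", "b"), "c"), (("c",), "d"), (("e",), "f")]
print(forward_chain({"a", "b"}, rules, "d"))  # provable in two rounds
print(forward_chain({"a", "b"}, rules, "f"))  # not provable: e is never derived
```

Under this sketch, an implicit reasoner would have to resolve all rounds within a single forward pass, whereas a CoT reasoner could emit the newly derived atoms round by round.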