Compositional generalization is crucial for artificial intelligence agents to solve complex vision-language reasoning tasks. Neuro-symbolic approaches have demonstrated promise in capturing compositional structures, but they face critical challenges: (a) reliance on predefined predicates for symbolic representations that limit adaptability, (b) difficulty in extracting predicates from raw data, and (c) using non-differentiable operations for combining primitive concepts. To address these issues, we propose NeSyCoCo, a neuro-symbolic framework that leverages large language models (LLMs) to generate symbolic representations and map them to differentiable neural computations. NeSyCoCo introduces three innovations: (a) augmenting natural language inputs with dependency structures to enhance the alignment with symbolic representations, (b) employing distributed word representations to link diverse, linguistically motivated logical predicates to neural modules, and (c) using the soft composition of normalized predicate scores to align symbolic and differentiable reasoning. Our framework achieves state-of-the-art results on the ReaSCAN and CLEVR-CoGenT compositional generalization benchmarks and demonstrates robust performance with novel concepts in the CLEVR-SYN benchmark.
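The third innovation, soft composition of normalized predicate scores, can be illustrated with a minimal sketch. All names and numbers below are illustrative assumptions, not the paper's implementation: each predicate module yields a raw score per object, a sigmoid normalizes it to [0, 1], and logical connectives become differentiable soft operations (product t-norm for conjunction, probabilistic sum for disjunction).

```python
import math

def normalize(raw_scores):
    """Squash raw predicate scores into [0, 1] with a sigmoid."""
    return [1.0 / (1.0 + math.exp(-s)) for s in raw_scores]

def soft_and(a, b):
    """Product t-norm: a differentiable conjunction."""
    return [x * y for x, y in zip(a, b)]

def soft_or(a, b):
    """Probabilistic sum: a differentiable disjunction."""
    return [x + y - x * y for x, y in zip(a, b)]

def soft_not(a):
    """Differentiable negation of normalized scores."""
    return [1.0 - x for x in a]

# Toy scores for two hypothetical predicates over three objects
red = normalize([4.0, -3.0, 2.0])
small = normalize([3.0, 1.0, -4.0])

# "red AND small" composed softly; because every operation is smooth,
# gradients flow back into both predicate modules during training.
conj = soft_and(red, small)
```

Because the composed score stays in [0, 1], nested expressions such as `soft_or(soft_not(red), small)` remain well-normalized, which is what keeps the symbolic program aligned with differentiable reasoning.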