We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach to learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts. At the core of G2L2 is a collection of lexicon entries, which map each word to a tuple of a syntactic type and a neuro-symbolic semantic program. For example, the word "shiny" has the syntactic type of adjective; its neuro-symbolic semantic program has the symbolic form λx. filter(x, SHINY), where the concept SHINY is associated with a neural network embedding that is used to classify shiny objects. Given an input sentence, G2L2 first looks up the lexicon entries associated with each token. It then derives the meaning of the sentence as an executable neuro-symbolic program by composing lexical meanings based on syntax. The recovered meaning programs can be executed on grounded inputs. To facilitate learning in an exponentially growing compositional space, we introduce a joint parsing and expected execution algorithm, which performs local marginalization over derivations to reduce training time. We evaluate G2L2 on two domains: visual reasoning and language-driven navigation. Results show that G2L2 can generalize from small amounts of data to novel compositions of words.
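The lookup, compose, and execute pipeline described above can be illustrated with a minimal toy sketch. It is not the G2L2 implementation: the names `LexiconEntry`, `LEXICON`, `CONCEPT_SCORES`, `parse`, and `execute` are hypothetical, the lexicon is hand-written rather than learned, the grammar covers only adjective-noun phrases, and fixed per-object probabilities stand in for the neural concept embeddings.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class LexiconEntry:
    """A word's syntactic type paired with its semantic program (toy version)."""
    syntax: str      # "N" for a noun, "N/N" for an adjective that modifies a noun to its right
    semantics: Any   # a program (nested tuple) for "N"; a function program -> program for "N/N"


# Hand-written lexicon (assumption: in the paper these entries are learned).
LEXICON: Dict[str, LexiconEntry] = {
    "shiny": LexiconEntry("N/N", lambda x: ("filter", x, "SHINY")),
    "red": LexiconEntry("N/N", lambda x: ("filter", x, "RED")),
    "cube": LexiconEntry("N", ("filter", ("scene",), "CUBE")),
}

# Stand-in for neural concept classifiers: probability that each of the
# three objects in a toy scene matches the concept (made-up numbers).
CONCEPT_SCORES: Dict[str, List[float]] = {
    "SHINY": [0.9, 0.1, 0.8],
    "RED": [0.2, 0.7, 0.9],
    "CUBE": [1.0, 1.0, 0.0],
}


def parse(tokens: List[str]) -> tuple:
    """Look up each token and compose meanings: adjectives apply to the noun on their right."""
    entries = [LEXICON[token] for token in tokens]
    *modifiers, head = entries
    if head.syntax != "N":
        raise ValueError("the last token must be a noun")
    program = head.semantics
    for entry in reversed(modifiers):
        if entry.syntax != "N/N":
            raise ValueError("only adjectives may precede the noun")
        program = entry.semantics(program)
    return program


def execute(program: tuple, num_objects: int) -> List[float]:
    """Run a program on the toy scene, returning a soft mask over objects."""
    op = program[0]
    if op == "scene":
        return [1.0] * num_objects
    if op == "filter":
        _, subprogram, concept = program
        mask = execute(subprogram, num_objects)
        return [m * p for m, p in zip(mask, CONCEPT_SCORES[concept])]
    raise ValueError(f"unknown operation: {op}")


program = parse(["shiny", "cube"])
result = execute(program, num_objects=3)
```

Here `program` is the nested tuple `("filter", ("filter", ("scene",), "CUBE"), "SHINY")`, a symbolic counterpart of applying λx. filter(x, SHINY) to the meaning of "cube", and `result` is a soft mask over the three toy objects. Keeping programs as plain data rather than opaque closures is a design choice of this sketch, so that the derived meaning can be inspected before it is executed.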