Prior work probing compositionality in sentence embedding models has struggled to determine the causal impact of implicit syntax representations. Given a sentence, we construct a neural module net based on its syntactic parse and train it end-to-end to approximate the sentence's embedding generated by a transformer model. The distillability of a transformer to a Syntactic NeurAl Module Net (SynNaMoN) then captures whether syntax is a strong causal model of its compositional ability. Furthermore, we address questions about the geometry of semantic composition by specifying the internal architecture and linearity of individual SynNaMoN modules. We find differences in the distillability of various sentence embedding models that broadly correlate with their performance, but observe that distillability does not vary considerably with model size. We also present preliminary evidence that much syntax-guided composition in sentence embedding models is linear, and that non-linearities may serve primarily to handle non-compositional phrases.
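The idea of composing a sentence vector along its parse tree can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the binary parse format, the names `LinearModule`, `compose`, and `mse`, the one-module-per-label design, and the toy vectors are all hypothetical. Training is omitted; the modules are fixed to simple averaging so that only the forward pass and a distillation-style loss against a target embedding are shown.

```python
# Hypothetical toy sketch: a parse is either a word (leaf) or a
# (label, left_subtree, right_subtree) tuple. Not the authors' code.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def scaled_identity(dim, scale):
    """Return a dim x dim matrix equal to scale times the identity."""
    return [[scale if i == j else 0.0 for j in range(dim)] for i in range(dim)]


class LinearModule:
    """Linear composition: out = W_left @ left + W_right @ right + bias."""

    def __init__(self, dim):
        # Fixed to averaging here; in a real setup these would be learned.
        self.W_left = scaled_identity(dim, 0.5)
        self.W_right = scaled_identity(dim, 0.5)
        self.bias = [0.0] * dim

    def __call__(self, left, right):
        l = matvec(self.W_left, left)
        r = matvec(self.W_right, right)
        return [a + b + c for a, b, c in zip(l, r, self.bias)]


def compose(tree, word_vecs, modules):
    """Recursively build a vector for the tree, bottom-up along the parse."""
    if isinstance(tree, str):
        return word_vecs[tree]
    label, left, right = tree
    return modules[label](
        compose(left, word_vecs, modules),
        compose(right, word_vecs, modules),
    )


def mse(pred, target):
    """Mean squared error between the composed and the target embedding."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Toy 2-dimensional word vectors and one module per syntactic label.
word_vecs = {"the": [1.0, 0.0], "dog": [0.0, 1.0], "barks": [1.0, 1.0]}
modules = {"NP": LinearModule(2), "S": LinearModule(2)}
tree = ("S", ("NP", "the", "dog"), "barks")

# Stand-in for the transformer's sentence embedding (made-up values).
target = [1.0, 0.5]

pred = compose(tree, word_vecs, modules)
loss = mse(pred, target)
print(pred, loss)
```

In this sketch, a lower loss after training would correspond to higher distillability. Swapping `LinearModule` for a module with a non-linearity would be one way to compare linear and non-linear composition under the same tree structure.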