Transformer Grammars (TGs) enhance language modeling by incorporating syntactic tree structures. How syntactic trees are linearized in TGs may significantly affect model performance, yet existing studies rely solely on Depth-First Traversal (DFT) for linearization. In this paper, we expand the traversal design space by exploring Breadth-First Traversal (BFT) and a novel hybrid strategy, Production-Rule Traversal (PRT), which combines the structural lookahead of BFT with the early lexical generation of DFT. We integrate these traversal methods with varying tree configurations and masking strategies, and empirically evaluate them on language modeling, syntactic generalization, and summarization. Our results reveal inherent trade-offs between nested composition and global lookahead and yield actionable recommendations for designing task-aware Transformer Grammars.
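To make the contrast between depth-first and breadth-first linearization concrete, the following minimal sketch linearizes a toy constituency tree both ways. It is an illustration under stated assumptions, not the paper's implementation: the nested-tuple tree encoding, the `(S` / `S)` token format, the function names `dft_linearize` and `bft_linearize`, and the plain level-order reading of BFT are all hypothetical choices made here. PRT is not sketched because the abstract does not specify its exact ordering.

```python
from collections import deque

# Hypothetical tree encoding: a node is either a str (a terminal word)
# or a tuple (label, child_1, child_2, ...).
TREE = ("S", ("NP", "the", "dog"), ("VP", "barks"))


def dft_linearize(node):
    """Depth-first linearization with opening and closing nonterminal tokens."""
    if isinstance(node, str):
        return [node]
    label, *children = node
    tokens = ["(" + label]          # open the constituent
    for child in children:
        tokens.extend(dft_linearize(child))
    tokens.append(label + ")")      # close it once all children are emitted
    return tokens


def bft_linearize(root):
    """Level-order linearization (an assumed reading of BFT).

    Nonterminals at a given depth are emitted before anything deeper,
    so the top-level structure is visible before any word is generated.
    """
    tokens = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if isinstance(node, str):
            tokens.append(node)
        else:
            label, *children = node
            tokens.append("(" + label)
            queue.extend(children)
    return tokens


if __name__ == "__main__":
    print(" ".join(dft_linearize(TREE)))  # (S (NP the dog NP) (VP barks VP) S)
    print(" ".join(bft_linearize(TREE)))  # (S (NP (VP the dog barks
```

In the depth-first sequence the first word appears early, right after its enclosing constituents are opened, whereas the level-order sequence emits all nonterminals before any word in this example. Note that the plain level-order sequence above does not record how many children each node has, so it cannot be inverted back into a tree without extra information; a practical BFT encoding would need to add that.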