Processing and comprehending idiomatic expressions (IEs) has challenged pre-trained language models (PTLMs) because the meanings of IEs are non-compositional. Unlike prior work that enables IE comprehension by fine-tuning PTLMs on sentences containing IEs, we construct IEKG, a commonsense knowledge graph for figurative interpretations of IEs. IEKG extends the established ATOMIC2020 graph and is used to convert PTLMs into knowledge models (KMs) that encode and infer commonsense knowledge related to IE use. Experiments show that various PTLMs can be converted into KMs with IEKG. We verify the quality of IEKG and the ability of the trained KMs with both automatic and human evaluation. Through applications in natural language understanding, we show that a PTLM injected with knowledge from IEKG exhibits improved IE comprehension and can generalize to IEs unseen during training.
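As a rough illustration of what converting a PTLM into a KM can involve, the sketch below represents one knowledge-graph tuple and serializes it into a source/target pair for sequence-to-sequence training. It is a minimal sketch under stated assumptions, not the authors' pipeline: the `Triple` class, the `to_seq2seq_example` helper, and the `[GEN]` marker are hypothetical names chosen here, and the example tuple is invented for illustration rather than taken from IEKG. Only the relation name `xIntent` is borrowed from the ATOMIC2020 relation set.

```python
from dataclasses import dataclass

# Assumed marker that tells the model where generation of the tail begins.
GEN_TOKEN = "[GEN]"


@dataclass(frozen=True)
class Triple:
    """One (head, relation, tail) tuple in an ATOMIC-style knowledge graph."""
    head: str      # event containing the idiom
    relation: str  # commonsense relation type, e.g. "xIntent"
    tail: str      # inferred knowledge, based on the figurative meaning


def to_seq2seq_example(triple: Triple) -> tuple[str, str]:
    """Turn a tuple into a (source, target) pair for training.

    The model reads the head and relation and learns to generate the tail.
    """
    source = f"{triple.head} {triple.relation} {GEN_TOKEN}"
    return source, triple.tail


# Invented example tuple (not from IEKG): the tail reflects the figurative
# reading of the idiom, not its literal words.
example = Triple(
    head="PersonX spills the beans",
    relation="xIntent",
    tail="to reveal a secret",
)
source, target = to_seq2seq_example(example)
```

A model trained on pairs like `(source, target)` is prompted at inference time with a head and relation only, which is how a KM could be queried about an idiom it did not see during training.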