This paper explores the influence of external knowledge integration in Natural Language Generation (NLG), focusing on a commonsense generation task. We extend the CommonGen dataset by creating KITGI, a benchmark that pairs input concept sets with semantic relations retrieved from ConceptNet and includes manually annotated outputs. Using the T5-Large model, we compare sentence generation under two conditions: with full external knowledge, and with filtered knowledge in which highly relevant relations are deliberately removed. Our interpretability benchmark follows a three-stage method: (1) identifying and removing key knowledge, (2) regenerating sentences, and (3) manually assessing outputs for commonsense plausibility and concept coverage. Results show that sentences generated with full knowledge achieved 91\% correctness across both criteria, while filtering reduced correctness drastically to 6\%. These findings demonstrate that relevant external knowledge is critical for maintaining both coherence and concept coverage in NLG. This work highlights the importance of designing interpretable, knowledge-enhanced NLG systems and calls for evaluation frameworks that capture the underlying reasoning beyond surface-level metrics.
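Stage (1) of the method above, removing the knowledge most relevant to the input, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the triple format and the relevance criterion (a retrieved ConceptNet triple counts as "highly relevant" when both of its endpoints are input concepts) are assumptions made here for concreteness.

```python
def filter_relations(concepts, triples):
    """Split retrieved (head, relation, tail) triples into those kept
    and those deliberately removed as highly relevant.

    A triple is treated as highly relevant when both its head and tail
    appear in the input concept set (an assumption for illustration).
    """
    concept_set = set(concepts)
    removed = [t for t in triples if t[0] in concept_set and t[2] in concept_set]
    kept = [t for t in triples if not (t[0] in concept_set and t[2] in concept_set)]
    return kept, removed

# Hypothetical CommonGen-style input and retrieved ConceptNet triples.
concepts = ["dog", "frisbee", "catch"]
triples = [
    ("dog", "CapableOf", "catch"),    # both endpoints are input concepts -> removed
    ("frisbee", "UsedFor", "play"),   # only one endpoint is an input concept -> kept
    ("dog", "RelatedTo", "frisbee"),  # both endpoints are input concepts -> removed
]
kept, removed = filter_relations(concepts, triples)
print(kept)     # [('frisbee', 'UsedFor', 'play')]
print(removed)  # [('dog', 'CapableOf', 'catch'), ('dog', 'RelatedTo', 'frisbee')]
```

The `kept` triples would then be supplied to the generator in the filtered condition, while the full condition would pass all retrieved triples, enabling the comparison of regenerated sentences in stages (2) and (3).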