News captioning aims to describe an image with its news article body as input. It relies heavily on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to understand named entities for news captioning. By ``understand'', we mean correlating the news content with common sense in the wild, which helps an agent to 1) distinguish semantically similar named entities and 2) describe named entities using words outside of the training corpora. Our approach consists of three modules. (a) The Filter Module clarifies the common sense concerning a named entity from two aspects: what does it mean, and what is it related to? These two questions divide the common sense into explanatory knowledge and relevant knowledge, respectively. (b) The Distinguish Module aggregates explanatory knowledge from three aspects (node degree, dependency, and distinction) to distinguish semantically similar named entities. (c) The Enrich Module attaches relevant knowledge to named entities to enrich the entity description with commonsense information (e.g., identity and social position). Finally, the probability distributions from the Distinguish and Enrich Modules are integrated to generate the news captions. Extensive experiments on two challenging datasets (i.e., GoodNews and NYTimes) demonstrate the superiority of our method. Ablation studies and visualization further validate its effectiveness in understanding named entities.
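The abstract does not specify how the two modules' probability distributions are integrated. The sketch below shows one common way to merge two next-word distributions, a convex combination controlled by a scalar weight. The names `integrate`, `next_word`, and `gate`, and the fixed-weight mixing rule itself, are hypothetical illustrations rather than the paper's actual method.

```python
import math
from typing import Dict

Dist = Dict[str, float]  # word -> probability


def integrate(p_distinguish: Dist, p_enrich: Dist, gate: float) -> Dist:
    """Mix two next-word distributions as gate * p_distinguish + (1 - gate) * p_enrich.

    Hypothetical sketch: the scalar `gate` stands in for whatever weighting the
    paper actually uses. A word missing from one distribution gets probability 0
    there, so a word offered only by the Enrich side (e.g. one drawn from
    commonsense knowledge rather than the training vocabulary) can still be emitted.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    for p in (p_distinguish, p_enrich):
        if not math.isclose(sum(p.values()), 1.0):
            raise ValueError("each input distribution must sum to 1")
    # Union of both vocabularies, sorted for a deterministic iteration order.
    vocab = sorted(set(p_distinguish) | set(p_enrich))
    return {
        w: gate * p_distinguish.get(w, 0.0) + (1.0 - gate) * p_enrich.get(w, 0.0)
        for w in vocab
    }


def next_word(p: Dist) -> str:
    """Greedy decoding step: pick the most probable word."""
    return max(p, key=p.get)


if __name__ == "__main__":
    mixed = integrate(
        {"senator": 0.6, "mayor": 0.4},      # toy Distinguish-side distribution
        {"senator": 0.2, "governor": 0.8},   # toy Enrich-side distribution
        gate=0.7,
    )
    print(mixed)             # roughly senator 0.48, mayor 0.28, governor 0.24
    print(next_word(mixed))  # senator
```

Because the output is a convex combination of two valid distributions, it still sums to 1. Setting `gate` to 0 or 1 recovers one module's prediction alone. A learned, context-dependent gate would be a natural refinement, but that is likewise an assumption, not something the abstract states.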