The news captioning task aims to generate a sentence that describes the named entities or concrete events in an image, given the image and its accompanying news article. Existing methods have achieved remarkable results by relying on large-scale pre-trained models, which primarily focus on the correlations between the input news content and the output predictions. However, news captioning also requires adhering to fundamental rules of news reporting, such as accurately describing the individuals and actions associated with an event. In this paper, we propose a rule-driven news captioning method that generates image descriptions following a designated rule signal. Specifically, we first design a news-aware semantic rule for the descriptions. This rule incorporates the primary action depicted in the image (e.g., "performing") and the roles played by the named entities involved in that action (e.g., "Agent" and "Place"). Second, we inject this semantic rule into the large-scale pre-trained model BART using a prefix-tuning strategy, in which multiple encoder layers are embedded with the news-aware semantic rule. As a result, BART can be effectively guided to generate news sentences that comply with the designated rule. Extensive experiments on two widely used datasets (i.e., GoodNews and NYTimes800k) demonstrate the effectiveness of our method.
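To make the idea of a semantic rule and its injection as a prefix more concrete, the following is a minimal sketch, not the authors' implementation. The class name `SemanticRule`, the marker-token format, the helper `prepend_prefix`, and the example entities are all hypothetical; the abstract does not specify how the rule is encoded or how the prefix vectors are computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SemanticRule:
    """Hypothetical container for a news-aware semantic rule."""
    action: str  # primary action depicted in the image, e.g. "performing"
    roles: Dict[str, str] = field(default_factory=dict)  # role name -> named entity

    def linearize(self) -> List[str]:
        """Flatten the rule into a token sequence (assumed format)."""
        tokens = ["<action>", self.action]
        for role, entity in self.roles.items():
            tokens += [f"<{role}>", entity]
        return tokens


def prepend_prefix(prefix: List[List[float]],
                   states: List[List[float]]) -> List[List[float]]:
    """Place rule-derived prefix vectors in front of a layer's input states.

    In prefix-tuning the prefix vectors are trainable while the pre-trained
    weights stay frozen; here they are plain lists for illustration only.
    """
    return prefix + states


# Illustrative entities, invented for this example.
rule = SemanticRule("performing", {"Agent": "Jane Doe", "Place": "Carnegie Hall"})
rule_tokens = rule.linearize()

# Two made-up 2-dimensional prefix vectors in front of three input states.
layer_input = prepend_prefix([[0.1, 0.2], [0.3, 0.4]],
                             [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

In this sketch the rule is only a structured record placed ahead of the encoder input; in an actual prefix-tuning setup, each selected encoder layer would receive learned vectors derived from such a rule.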