Media houses reporting on public figures often bring their own biases, which stem from their respective worldviews. Characterizing these underlying patterns helps us better understand and interpret news stories. This requires diverse or subjective summarizations, which may not be amenable to classification into predefined class labels. This work proposes a zero-shot approach, using GPT-2, for non-extractive (generative) characterization of person entities from a corpus. To build a sound argument for this approach, we use well-articulated articles from several well-known news media houses as the corpus. First, we fine-tune a pre-trained GPT-2 language model on a corpus in which specific person entities are characterized. Second, we fine-tune the resulting model further on demonstrations of person entity characterizations, drawn from a corpus of programmatically constructed characterizations. The twice fine-tuned model is then primed with manual prompts containing entity names that were not seen during the second fine-tuning stage, and it generates a simple sentence about each entity. The results are encouraging when compared against actual characterizations from the corpus.