For monitoring crises, political events are extracted from the news. The large volume of unstructured, full-text event descriptions makes case-by-case analysis unmanageable, particularly for low-resource humanitarian aid organizations. This creates a demand to classify events into event types, a task referred to as event coding. Typically, domain experts craft an event type ontology, annotators label a large dataset, and technical experts develop a supervised coding system. In this work, we propose PR-ENT, a new event coding approach that is more flexible and resource-efficient while maintaining competitive accuracy. First, we extend an event description such as "Military injured two civilians" by a template, e.g. "People were [Z]", and prompt a pre-trained (cloze) language model to fill the slot Z. Second, we select answer candidates Z* = {"injured", "hurt", ...} by treating the event description as the premise and the filled templates as hypotheses in a textual entailment task. This allows domain experts to draft the codebook directly as labeled prompts and interpretable answer candidates. This human-in-the-loop process is guided by our interactive codebook design tool. We evaluate PR-ENT in several robustness checks: perturbing the event description and the prompt template, restricting the vocabulary, and removing contextual information.
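The two-step prompt-then-entail selection can be illustrated with a minimal Python sketch. It is not the PR-ENT implementation. The cloze model and the entailment model are abstracted as plain functions (`FillFn`, `EntailFn`), and the names `select_answers`, `code_event`, `toy_fill`, `toy_entail`, the `top_k`/`threshold` parameters, and the codebook layout (event type mapped to a template plus a set of answer candidates) are all hypothetical assumptions. The toy stand-ins return made-up scores so the sketch runs without downloading models. In practice one would plug in a masked language model and an NLI classifier.

```python
from typing import Callable, Dict, List, Set, Tuple

SLOT = "[Z]"  # placeholder for the slot the cloze model fills

# Hypothetical interfaces (assumptions, not PR-ENT's actual API):
# a cloze model maps a prompt to (token, probability) pairs for the slot,
# and an entailment model maps (premise, hypothesis) to P(entailment).
FillFn = Callable[[str], List[Tuple[str, float]]]
EntailFn = Callable[[str, str], float]


def select_answers(event: str, template: str, fill: FillFn, entail: EntailFn,
                   top_k: int = 10, threshold: float = 0.5) -> List[str]:
    """Return the slot fillers Z* that the event description entails."""
    # Step 1: extend the event description by the template and let the
    # cloze model propose fillers for the slot, most probable first.
    prompt = f"{event} {template}"
    ranked = sorted(fill(prompt), key=lambda c: c[1], reverse=True)[:top_k]
    answers = []
    for token, _prob in ranked:
        # Step 2: event description = premise, filled template = hypothesis;
        # keep the filler only if the entailment score clears the threshold.
        hypothesis = template.replace(SLOT, token)
        if entail(event, hypothesis) >= threshold:
            answers.append(token)
    return answers


def code_event(event: str, codebook: Dict[str, Tuple[str, Set[str]]],
               fill: FillFn, entail: EntailFn) -> List[str]:
    """Hypothetical codebook lookup: assign every event type whose
    answer candidates overlap the selected fillers Z*."""
    return [label for label, (template, candidates) in codebook.items()
            if candidates & set(select_answers(event, template, fill, entail))]


# Toy stand-ins so the sketch runs offline; the scores are made up and
# are NOT outputs of any real language model or NLI model.
def toy_fill(prompt: str) -> List[Tuple[str, float]]:
    return [("injured", 0.41), ("killed", 0.22), ("hurt", 0.18), ("arrested", 0.05)]


def toy_entail(premise: str, hypothesis: str) -> float:
    # Crude keyword heuristic standing in for an NLI model.
    implied = {"injured": {"injured", "hurt", "wounded"}}
    filler = hypothesis.split()[-1]
    for word in premise.lower().split():
        if filler in implied.get(word, set()):
            return 0.9
    return 0.1


if __name__ == "__main__":
    event = "Military injured two civilians"
    print(select_answers(event, "People were [Z]", toy_fill, toy_entail))
    # prints ['injured', 'hurt']
    codebook = {
        "injury": ("People were [Z]", {"injured", "hurt", "wounded"}),
        "arrest": ("People were [Z]", {"arrested", "detained"}),
    }
    print(code_event(event, codebook, toy_fill, toy_entail))
    # prints ['injury']
```

Keeping the two models behind function interfaces mirrors the separation in the abstract: the cloze model only proposes fillers, and the entailment step alone decides which of them survive as Z*. "killed" is ranked above "hurt" by the toy cloze scores but is filtered out because the premise does not entail it.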