Extracting Victim Counts from Text

Decision-makers in the humanitarian sector rely on timely and exact information during crisis events. Knowing how many civilians were injured in an earthquake is vital for allocating aid properly. Information about such victim counts is often available only in full-text event descriptions from newspapers and other reports. Extracting numbers from text is challenging: numbers appear in different formats and may require numeric reasoning. This renders approaches based purely on string matching insufficient. As a consequence, fine-grained counts of injured, displaced, or abused victims beyond fatalities are often not extracted and remain unseen. We cast victim count extraction as a question answering (QA) task with a regression or classification objective. We compare approaches based on regular expressions, dependency parsing, and semantic role labeling with advanced text-to-text models. Beyond model accuracy, we analyze extraction reliability and robustness, which are key for this sensitive task. In particular, we discuss model calibration and investigate few-shot and out-of-distribution performance. Ultimately, we make a comprehensive recommendation on which model to select for different desiderata and data domains. Our work is among the first to apply numeracy-focused large language models in a real-world use case with a positive impact.
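
To make the limitation of pure string matching concrete, the following is a minimal sketch of a regex baseline for counting injured victims. The pattern, the keyword list, and the function name `regex_injured_count` are illustrative assumptions and not the baseline evaluated in the paper.

```python
import re

# Hypothetical pattern: a digit-based number (optionally with thousands
# separators) followed, within at most three words, by an injury keyword.
INJURY_PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)\s+(?:\w+\s+){0,3}?(?:injured|wounded|hurt)",
    re.IGNORECASE,
)


def regex_injured_count(text):
    """Return the first digit-based injury count found in text, or None."""
    match = INJURY_PATTERN.search(text)
    if match is None:
        return None
    # Strip thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


# A well-formed digit count is recovered.
print(regex_injured_count("The earthquake left 1,200 people injured."))  # 1200

# A spelled-out number is missed entirely (format problem).
print(regex_injured_count("Twelve civilians were wounded."))  # None

# The wrong number is picked up: the true count is 2, not 40
# (numeric reasoning problem).
print(regex_injured_count("Two of the 40 passengers were hurt."))  # 40
```

The second and third calls show the two failure modes named above: differing number formats and the need for numeric reasoning.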

