Decision-makers in the humanitarian sector rely on timely and accurate information during crisis events. Knowing how many civilians were injured in an earthquake is vital for allocating aid properly. Information about such victim counts is often available only in full-text event descriptions from newspapers and other reports. Extracting numbers from text is challenging: numbers appear in different formats and may require numeric reasoning, which renders approaches based purely on string matching insufficient. As a consequence, fine-grained counts of injured, displaced, or abused victims beyond fatalities are often not extracted and remain unseen. We cast victim count extraction as a question answering (QA) task with a regression or classification objective. We compare approaches based on regular expressions, dependency parsing, and semantic role labeling with advanced text-to-text models. Beyond model accuracy, we analyze extraction reliability and robustness, which are key for this sensitive task. In particular, we discuss model calibration and investigate few-shot and out-of-distribution performance. Ultimately, we make a comprehensive recommendation on which model to select for different desiderata and data domains. Our work is among the first to apply numeracy-focused large language models in a real-world use case with a positive impact.