Humanitarian crises demand timely and accurate geographic information to inform effective response efforts. Yet, automated systems that extract locations from text often reproduce existing geographic and socioeconomic biases, leading to uneven visibility of crisis-affected regions. This paper investigates whether Large Language Models (LLMs) can address these geographic disparities in extracting location information from humanitarian documents. We introduce a two-step framework that combines few-shot LLM-based named entity recognition with an agent-based geocoding module that leverages context to resolve ambiguous toponyms. We benchmark our approach against state-of-the-art pretrained and rule-based systems using both accuracy and fairness metrics across geographic and socioeconomic dimensions. Our evaluation uses an extended version of the HumSet dataset with refined literal toponym annotations. Results show that LLM-based methods substantially improve both the precision and fairness of geolocation extraction from humanitarian texts, particularly for underrepresented regions. By bridging advances in LLM reasoning with principles of responsible and inclusive AI, this work contributes to more equitable geospatial data systems for humanitarian response, advancing the goal of leaving no place behind in crisis analytics.