Humanitarian Mine Action (HMA) addresses the challenge of detecting and removing landmines from conflict regions. Much of the life-saving operational knowledge produced by HMA agencies is buried in unstructured reports, which limits how readily information can be transferred between agencies. To address this issue, we propose TextMineX: the first dataset, evaluation framework, and ontology-guided large language model (LLM) pipeline for knowledge extraction from text in the HMA domain. TextMineX structures HMA reports into (subject, relation, object) triples, thereby producing structured, domain-specific knowledge. To ensure real-world relevance, we use a dataset provided by our collaborator, the Cambodian Mine Action Centre (CMAC). We further introduce a bias-aware evaluation framework that combines human-annotated triples with an LLM-as-Judge protocol to mitigate position bias in reference-free scoring. Our experiments show that, compared to baseline models, ontology-aligned prompts improve extraction accuracy by up to 44.2%, reduce hallucinations by 22.5%, and improve format adherence by 20.9%. We publicly release the dataset and code.
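The abstract does not specify how the LLM-as-Judge protocol counters position bias. One widely used approach is to ask the judge to compare two candidate triple sets in both presentation orders and accept a verdict only if it survives the swap. The sketch below illustrates that approach under stated assumptions; it is not presented as the TextMineX method. The `judge` callable, `debiased_pairwise_judgment`, and the toy judges `always_first` and `prefer_longer` are all hypothetical stand-ins for a real LLM call.

```python
from typing import Callable, Literal

Verdict = Literal["A", "B", "tie"]

# Hypothetical judge signature: (source_text, first_candidate, second_candidate) -> verdict,
# where "A" means the first-shown candidate is better and "B" the second-shown one.
Judge = Callable[[str, str, str], Verdict]


def debiased_pairwise_judgment(
    judge: Judge, source_text: str, triples_a: str, triples_b: str
) -> Verdict:
    """Query the judge in both orders and keep only order-consistent verdicts (assumed scheme)."""
    first = judge(source_text, triples_a, triples_b)   # candidate A shown first
    second = judge(source_text, triples_b, triples_a)  # candidate B shown first
    # Map the swapped verdict back to the original A/B labels.
    remapped = {"A": "B", "B": "A", "tie": "tie"}[second]
    if first == remapped:
        return first
    # The verdict flipped with presentation order, so treat it as position bias.
    return "tie"


# Toy judges for illustration only (hypothetical, not real LLM calls).
def always_first(src: str, x: str, y: str) -> Verdict:
    return "A"  # maximally position-biased: always prefers whatever is shown first


def prefer_longer(src: str, x: str, y: str) -> Verdict:
    if len(x) > len(y):
        return "A"
    if len(y) > len(x):
        return "B"
    return "tie"  # order-independent preference


if __name__ == "__main__":
    print(debiased_pairwise_judgment(always_first, "report", "t1", "t2"))
    print(debiased_pairwise_judgment(prefer_longer, "report", "abc", "a"))
```

A purely position-biased judge yields `"tie"` because its preference flips when the candidates are swapped. A judge whose preference depends only on content keeps its verdict (`"A"` in the second call). Mapping inconsistent verdicts to a tie is one design choice. Alternatives include averaging scores across both orders or discarding the pair from the evaluation.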