This paper presents a comprehensive overview of the QIAS 2026 shared task, organized as part of the OSACT7 Workshop and co-located with LREC 2026. The shared task was designed to evaluate the ability of large language models to perform complex reasoning in the religious and legal domain of Islamic inheritance. Unlike conventional question-answering benchmarks, QIAS 2026 focuses on end-to-end reasoning from natural language cases, requiring systems to perform the full inheritance calculation process, from identifying the eligible heirs to assigning the correct share to each beneficiary. To support this evaluation, the task was based on the MAWARITH benchmark, a dataset of $12{,}500$ Arabic inheritance cases annotated with intermediate reasoning steps and final answers. System submissions were evaluated using MIR-E, a multi-step metric that measures performance across the main stages of inheritance reasoning. A total of $16$ teams participated in the shared task, investigating a range of approaches, including prompting-based methods, retrieval-augmented generation, and fine-tuning strategies. The results show that Islamic inheritance remains a highly challenging benchmark for current language models, especially in stages that require precise legal interpretation and structured numerical reasoning. This overview summarizes the task design, dataset, evaluation framework, participating systems, and main results.