This paper describes team PSL's participation in the QIAS 2026 Shared Task on Arabic Islamic inheritance reasoning. The task evaluates the ability of large language models to solve inheritance cases that require legal interpretation, multi-step reasoning, and precise numerical computation. We compare \textit{commercial} and \textit{open-source} models under a unified prompting strategy to assess how well they handle structured legal reasoning with minimal task-specific adaptation. \\ Our results show a clear gap in reliability between the two model families. Commercial models perform better at identifying eligible heirs, applying exclusion rules, and maintaining consistency across reasoning steps. In contrast, open-source models are less stable, particularly in cases involving dependent legal decisions and fractional share adjustments. The best performance is achieved by \textit{Gemini 2.5 Flash}, with an MRE of $0.989$.