Explaining the decisions of Artificial Intelligence (AI) systems is a major open challenge, particularly in sensitive domains such as medicine and law. The need to explain the rationale behind a decision is equally central to human deliberation, where it is important to justify \textit{why} a certain decision has been taken. Resident medical doctors, for instance, are required not only to provide a (possibly correct) diagnosis, but also to explain how they reached it. Developing new tools to help residents train their explanation skills is therefore a central objective of AI in education. In this paper, we follow this direction and present, to the best of our knowledge, the first multilingual dataset for Medical Question Answering in which correct and incorrect diagnoses for a clinical case are enriched with a natural language explanation written by doctors. These explanations have been manually annotated with argument components (i.e., premise, claim) and argument relations (i.e., attack, support), resulting in the Multilingual CasiMedicos-Arg dataset, which consists of 558 clinical cases with explanations in four languages (English, Spanish, French, Italian), annotated with 5021 claims, 2313 premises, 2431 support relations, and 1106 attack relations. We conclude by showing how competitive baselines perform on this challenging dataset for the argument mining task.