The same real-life question posed to different individuals may yield different answers depending on their unique situations. For instance, whether a student is eligible for a scholarship depends on eligibility conditions, such as the required major or degree. ConditionalQA was proposed to evaluate models' ability to read a document and answer eligibility questions while accounting for unmentioned conditions. However, it is limited to questions over single documents, neglecting harder cases that may require cross-document reasoning and optimization, for example, "What is the maximum number of scholarships attainable?" Such questions over multiple documents are more challenging not only because there is more context to understand, but also because the model must (1) explore all possible combinations of unmentioned conditions and (2) understand the relationships between conditions across documents in order to reason about the optimal outcome. To evaluate models' ability to answer such questions, we propose MDCR, a new dataset that reflects real-world challenges and serves as a new test bed for complex conditional reasoning that requires optimization. We evaluate the most recent LLMs on this dataset and demonstrate their limitations in solving this task. We believe this dataset will facilitate future research on answering optimization questions with unknown conditions.
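The two requirements above, enumerating possible assignments of the unmentioned conditions and respecting constraints that span documents, can be sketched as a brute-force search. All scholarship names, majors, and exclusivity rules below are hypothetical illustrations, not taken from MDCR:

```python
from itertools import combinations

# Hypothetical scholarships: each requires a major, and some pairs are
# mutually exclusive (a cross-document constraint).
scholarships = {
    "A": "CS",
    "B": "CS",
    "C": "CS",
    "D": "Math",
}
exclusive = {frozenset({"A", "C"})}  # holding A rules out C, and vice versa

def max_attainable(majors=("CS", "Math")):
    """Enumerate assignments of the unmentioned condition (the student's
    major), then find the largest conflict-free scholarship combination."""
    best = 0
    for major in majors:
        eligible = [s for s, req in scholarships.items() if req == major]
        # Try combination sizes from largest to smallest; stop at the first
        # size with at least one combination that violates no exclusivity rule.
        for r in range(len(eligible), 0, -1):
            if any(not any(pair <= set(combo) for pair in exclusive)
                   for combo in combinations(eligible, r)):
                best = max(best, r)
                break
    return best

print(max_attainable())  # 2: a CS student can hold {A, B} or {B, C}, not {A, C}
```

Real MDCR questions replace this toy enumeration with natural-language conditions spread across documents, so the model must perform the same search implicitly while reading.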