The same real-life question posed to different individuals may yield different answers depending on their unique situations. For instance, whether a student is eligible for a scholarship depends on eligibility conditions, such as the required major or degree. ConditionalQA was proposed to evaluate models' ability to read a document and answer eligibility questions while accounting for unmentioned conditions. However, it is limited to questions over single documents, neglecting harder cases that may require cross-document reasoning and optimization, for example, "What is the maximum number of scholarships attainable?" Such questions over multiple documents are more challenging not only because there is more context to understand, but also because the model must (1) explore all possible combinations of unmentioned conditions and (2) understand the relationships between conditions across documents in order to reason about the optimal outcome. To evaluate models' ability to answer such questions, we propose MDCR, a new dataset that reflects real-world challenges and serves as a new test bed for complex conditional reasoning that requires optimization. We evaluate the most recent LLMs on this dataset and demonstrate their limitations in solving this task. We believe this dataset will facilitate future research on answering optimization questions with unknown conditions.
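The two requirements above, enumerating possible assignments of the unmentioned conditions and respecting constraints that span documents, can be sketched as a brute-force search. All scholarship names, majors, and exclusivity rules below are hypothetical illustrations, not taken from MDCR:

```python
from itertools import combinations

# Hypothetical scholarships: each requires a major, and some pairs are
# mutually exclusive (a cross-document constraint).
scholarships = {
    "A": "CS",
    "B": "CS",
    "C": "CS",
    "D": "Math",
}
exclusive = {frozenset({"A", "C"})}  # holding A rules out C, and vice versa

def max_attainable(majors=("CS", "Math")):
    """Enumerate assignments of the unmentioned condition (the student's
    major), then find the largest conflict-free scholarship combination."""
    best = 0
    for major in majors:
        eligible = [s for s, req in scholarships.items() if req == major]
        # Try combination sizes from largest to smallest; stop at the first
        # size with at least one combination that violates no exclusivity rule.
        for r in range(len(eligible), 0, -1):
            if any(not any(pair <= set(combo) for pair in exclusive)
                   for combo in combinations(eligible, r)):
                best = max(best, r)
                break
    return best

print(max_attainable())  # 2: a CS student can hold {A, B} or {B, C}, not {A, C}
```

Real MDCR questions replace this toy enumeration with natural-language conditions spread across documents, so the model must perform the same search implicitly while reading.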