Despite recent interest in open-domain question answering (ODQA) over tables, many studies still rely on datasets that are not well suited to the task because they do not exploit the structural nature of tables. These datasets assume that each answer resides in a single cell and do not require reasoning over multiple cells, such as aggregation, comparison, or sorting. We therefore release Open-WikiTable, the first ODQA dataset that requires complex reasoning over tables. Open-WikiTable is built upon WikiSQL and WikiTableQuestions so that it is applicable in the open-domain setting. Because each question is paired with both a textual answer and an SQL query, both reader and parser methods can be applied, opening up a wide range of possibilities for future research. The dataset and code are publicly available.
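To make the difference between single-cell lookups and multi-cell reasoning concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table, the questions, and the record fields `question`, `sql`, and `answer` are hypothetical illustrations, not taken from Open-WikiTable or its actual file format. Executing each SQL query (the parser route) should reproduce the paired textual answer, which a reader model would instead have to produce directly.

```python
import sqlite3

# Hypothetical toy table (NOT from Open-WikiTable): (name, team, goals).
rows = [
    ("Alice", "Team A", 12),
    ("Bob", "Team B", 7),
    ("Carol", "Team A", 9),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, goals INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?, ?)", rows)


def run_sql(conn, sql):
    """Execute a query and return its first result column as strings."""
    return [str(row[0]) for row in conn.execute(sql).fetchall()]


# Hypothetical question/SQL/answer records illustrating each question type.
examples = [
    {   # Single-cell lookup: the answer is stored verbatim in one cell.
        "question": "How many goals did Bob score?",
        "sql": "SELECT goals FROM players WHERE name = 'Bob'",
        "answer": ["7"],
    },
    {   # Aggregation: the answer is computed from several cells.
        "question": "How many goals did Team A score in total?",
        "sql": "SELECT SUM(goals) FROM players WHERE team = 'Team A'",
        "answer": ["21"],
    },
    {   # Sorting/comparison: the answer requires ranking all rows.
        "question": "Who scored the most goals?",
        "sql": "SELECT name FROM players ORDER BY goals DESC LIMIT 1",
        "answer": ["Alice"],
    },
]

# Parser route: executing the paired SQL should yield the textual answer.
for ex in examples:
    assert run_sql(conn, ex["sql"]) == ex["answer"], ex["question"]

# Every cell value in the table, as strings.
cells = {str(value) for row in rows for value in row}
print("21" in cells)  # False: the aggregated answer is not in any cell
```

The aggregated answer "21" appears in no cell of the toy table, so a method that can only select a single cell value could not return it, whereas executing the paired SQL query does.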