QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations - 专知论文

会员服务 ·

0

情景 · INFORMS · 数据集 · entity · 约束 ·

2023 年 5 月 19 日

QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set Operations

翻译：QUEST：带有隐式集合操作的实体查询检索数据集

Chaitanya Malaviya,Peter Shaw,Ming-Wei Chang,Kenton Lee,Kristina Toutanova

from arxiv, ACL 2023; Dataset available at https://github.com/google-research/language/tree/master/language/quest

Formulating selective information needs results in queries that implicitly specify set operations, such as intersection, union, and difference. For instance, one might search for "shorebirds that are not sandpipers" or "science-fiction films shot in England". To study the ability of retrieval systems to meet such information needs, we construct QUEST, a dataset of 3357 natural language queries with implicit set operations, that map to a set of entities corresponding to Wikipedia documents. The dataset challenges models to match multiple constraints mentioned in queries with corresponding evidence in documents and correctly perform various set operations. The dataset is constructed semi-automatically using Wikipedia category names. Queries are automatically composed from individual categories, then paraphrased and further validated for naturalness and fluency by crowdworkers. Crowdworkers also assess the relevance of entities based on their documents and highlight attribution of query constraints to spans of document text. We analyze several modern retrieval systems, finding that they often struggle on such queries. Queries involving negation and conjunction are particularly challenging and systems are further challenged with combinations of these operations.

翻译：选择性信息需求的表述会生成隐式指定集合操作（如交集、并集和差集）的查询。例如，用户可能搜索"不是矶鹬的滨鸟"或"在英国拍摄的科幻电影"。为研究检索系统满足此类信息需求的能力，我们构建了QUEST数据集——包含3357条带隐式集合操作的自然语言查询，每条查询对应维基百科文档中的一组实体。该数据集要求模型匹配查询中提及的多重约束条件与文档中的对应证据，并正确执行各类集合操作。数据集采用维基百科类别名称进行半自动构建：查询由单个类别自动组合而成，经人工众包改写及自然性、流畅性验证，同时由众包工作者根据文档评估实体相关性，并标注查询约束条件在文档文本中的归属。通过对多个现代检索系统的分析，我们发现这些系统在此类查询上表现欠佳。涉及否定与合取的查询尤为困难，而多种操作的组合进一步加剧了系统的处理难度。

0

相关内容

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

116+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇推荐系统相关论文—亿级商品嵌入、主动学习、树深度模型、知识图谱、注意力感知、矩阵分解、神经个性化嵌入

【论文推荐】最新八篇推荐系统相关论文—亿级商品嵌入、主动学习、树深度模型、知识图谱、注意力感知、矩阵分解、神经个性化嵌入

专知

15+阅读 · 2018年6月15日

基于LSTM-CNN组合模型的Twitter情感分析（附代码）

基于LSTM-CNN组合模型的Twitter情感分析（附代码）

机器学习研究会

50+阅读 · 2018年2月21日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】(Keras)LSTM多元时序预测教程

【推荐】(Keras)LSTM多元时序预测教程

机器学习研究会

25+阅读 · 2017年8月14日

罗巴代数的表示和罗巴代数在operad中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

微纳米结构Ti/Mg双连续相复合材料的可控制备及力学性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

激酶Pak2调控Caspase-3抑制细胞凋亡的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于分布式MBD的高铁牵引供电系统故障预警研究

国家自然科学基金

0+阅读 · 2012年12月31日

代数曲线在序列中的应用

国家自然科学基金

0+阅读 · 2011年12月31日

Dicer和Drosha基因遗传变异与膀胱癌易感性及其机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

MicroRNA在HBV感染中作用机理的研究

国家自然科学基金

0+阅读 · 2008年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

Improving Retrieval-Augmented Large Language Models via Data Importance Learning

Arxiv

0+阅读 · 2023年7月6日

The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification

Arxiv

0+阅读 · 2023年7月5日

Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities

Arxiv

0+阅读 · 2023年7月4日

vWitness: Certifying Web Page Interactions with Computer Vision

Arxiv

0+阅读 · 2023年7月4日

Tales from the Git: Automating the detection of secrets on code and assessing developers' passwords choices

Arxiv

0+阅读 · 2023年7月3日

Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model

Arxiv

0+阅读 · 2023年7月1日

Self-Supervised Query Reformulation for Code Search

Arxiv

0+阅读 · 2023年7月1日

Reasoning over Different Types of Knowledge Graphs: Static, Temporal and Multi-Modal

Arxiv

21+阅读 · 2022年12月12日

Adversarial Robustness of Representation Learning for Knowledge Graphs

Arxiv

10+阅读 · 2022年9月30日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

VIP会员

文章信息

相关主题

最新内容

美国从乌克兰无人机战争中学习经验

美国从乌克兰无人机战争中学习经验

专知会员服务

1+阅读 · 今天15:03

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

专知会员服务

0+阅读 · 今天14:31

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

专知会员服务

0+阅读 · 今天14:29

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

12+阅读 · 6月20日

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

4+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

7+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

6+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

8+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

11+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

7+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

12+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

8+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

21+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

10+阅读 · 6月17日

相关VIP内容

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

【干货书】真实机器学习，264页pdf，Real-World Machine Learning

专知会员服务

116+阅读 · 2020年4月5日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

【NLP| 推荐文章】基于文本和知识库的语义搜索（Semantic search on text and knowledge bases）

专知会员服务

46+阅读 · 2019年11月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

ICML 2026 | 面向视觉语言模型的语义鲁棒性认证

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

美国从乌克兰无人机战争中学习经验

综述 | 智能体电子设计自动化：从“交接有效性”重新理解Agentic EDA

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新八篇推荐系统相关论文—亿级商品嵌入、主动学习、树深度模型、知识图谱、注意力感知、矩阵分解、神经个性化嵌入

【论文推荐】最新八篇推荐系统相关论文—亿级商品嵌入、主动学习、树深度模型、知识图谱、注意力感知、矩阵分解、神经个性化嵌入

专知

15+阅读 · 2018年6月15日

基于LSTM-CNN组合模型的Twitter情感分析（附代码）

基于LSTM-CNN组合模型的Twitter情感分析（附代码）

机器学习研究会

50+阅读 · 2018年2月21日

【推荐】用Tensorflow理解LSTM

【推荐】用Tensorflow理解LSTM

机器学习研究会

36+阅读 · 2017年9月11日

【推荐】RNN/LSTM时序预测

【推荐】RNN/LSTM时序预测

机器学习研究会

25+阅读 · 2017年9月8日

【推荐】(Keras)LSTM多元时序预测教程

【推荐】(Keras)LSTM多元时序预测教程

机器学习研究会

25+阅读 · 2017年8月14日

相关论文

Improving Retrieval-Augmented Large Language Models via Data Importance Learning

Arxiv

0+阅读 · 2023年7月6日

The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification

Arxiv

0+阅读 · 2023年7月5日

Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities

Arxiv

0+阅读 · 2023年7月4日

vWitness: Certifying Web Page Interactions with Computer Vision

Arxiv

0+阅读 · 2023年7月4日

Tales from the Git: Automating the detection of secrets on code and assessing developers' passwords choices

Arxiv

0+阅读 · 2023年7月3日

Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model

Arxiv

0+阅读 · 2023年7月1日

Self-Supervised Query Reformulation for Code Search

Arxiv

0+阅读 · 2023年7月1日

Reasoning over Different Types of Knowledge Graphs: Static, Temporal and Multi-Modal

Arxiv

21+阅读 · 2022年12月12日

Adversarial Robustness of Representation Learning for Knowledge Graphs

Arxiv

10+阅读 · 2022年9月30日

Embedding-based Retrieval in Facebook Search

Arxiv

12+阅读 · 2020年6月20日

相关基金

罗巴代数的表示和罗巴代数在operad中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

微纳米结构Ti/Mg双连续相复合材料的可控制备及力学性能研究

国家自然科学基金

0+阅读 · 2013年12月31日

激酶Pak2调控Caspase-3抑制细胞凋亡的机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于分布式MBD的高铁牵引供电系统故障预警研究

国家自然科学基金

0+阅读 · 2012年12月31日

代数曲线在序列中的应用

国家自然科学基金

0+阅读 · 2011年12月31日

Dicer和Drosha基因遗传变异与膀胱癌易感性及其机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

基于本体的Deep Web搜索技术

国家自然科学基金

2+阅读 · 2009年12月31日

MicroRNA在HBV感染中作用机理的研究

国家自然科学基金

0+阅读 · 2008年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员