Plot Retrieval as an Assessment of Abstract Semantic Association

Retrieving the plots in a book that are relevant to a query is an important task that can improve readers' experience and efficiency. Readers usually give only an abstract, vague description as the query, based on their own understanding, summaries, or speculations about the plot, so the retrieval model needs a strong ability to estimate the abstract semantic association between the query and candidate plots. However, existing information retrieval (IR) datasets do not reflect this ability well. In this paper, we propose Plot Retrieval, a labeled dataset for training and evaluating IR models on the novel task of plot retrieval. Text pairs in Plot Retrieval have less word overlap and more abstract semantic association, so the dataset reflects a model's ability to estimate abstract semantic association rather than just traditional lexical or semantic matching. Extensive experiments on Plot Retrieval across lexical retrieval, sparse retrieval, dense retrieval, and cross-encoder methods, compared with human studies, show that current IR models still struggle to capture abstract semantic association between texts. Plot Retrieval can serve as a benchmark for further research on the semantic association modeling ability of IR models.
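
To make the word-overlap point concrete, here is a minimal Python sketch. It is an illustration only, not the paper's method or metric: the `word_overlap` function, the tokenizer, and the example query and passage texts are all hypothetical and are not drawn from the dataset. It shows how a query that paraphrases a plot abstractly can share almost no words with the passage it describes, which is the situation where purely lexical matching has little to work with.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(query: str, passage: str) -> float:
    """Fraction of distinct query words that also appear in the passage.

    Illustrative measure only (an assumption of this sketch), not the
    overlap statistic used by the dataset authors.
    """
    query_words = tokenize(query)
    passage_words = tokenize(passage)
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


# Hypothetical texts written for this sketch, not taken from Plot Retrieval.
passage = "At the gate, the knight draws his sword and refuses to step aside."
literal_query = "the knight draws his sword at the gate"
abstract_query = "the hero finally decides to stand up for himself"

literal_score = word_overlap(literal_query, passage)
abstract_score = word_overlap(abstract_query, passage)

print(f"literal query overlap:  {literal_score:.2f}")
print(f"abstract query overlap: {abstract_score:.2f}")
```

In this toy case the literal query shares every word with the passage, while the abstract query shares only function words such as "the" and "to", even though both describe the same event.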

