In many information retrieval applications (e.g., patent search, literature review, and due diligence), preventing false negatives is more important than preventing false positives. However, approaches designed to reduce review effort, such as technology-assisted review, can create false negatives, since they are often based on active learning systems that automatically exclude documents based on user feedback. This research therefore proposes a more recall-oriented approach to reducing review effort: iteratively re-ranking the relevance rankings based on user feedback, a technique also known as relevance feedback. In our proposed method, the relevance rankings are produced by a BERT-based dense-vector search, and the relevance feedback is based on cumulatively summing the query embedding and the embeddings of the selected documents. Our results show that, given a fixed recall target, this method can reduce review effort by between 17.85% and 59.04% compared to a baseline approach with no feedback.
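The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `review_with_feedback`, the dot-product scoring, the one-document-at-a-time review order, and the toy 2-D vectors are all assumptions made for illustration. In practice the vectors would come from a BERT-based encoder, which is not shown here.

```python
import numpy as np


def review_with_feedback(query_vec, doc_vecs, relevant_ids,
                         recall_target=1.0, use_feedback=True):
    """Simulate a ranked review and return how many documents are
    reviewed before the recall target is met (hypothetical helper)."""
    relevant_ids = set(relevant_ids)
    # Number of relevant documents that must be found to hit the target.
    needed = int(np.ceil(recall_target * len(relevant_ids)))
    query = np.asarray(query_vec, dtype=float).copy()
    docs = np.asarray(doc_vecs, dtype=float)
    unreviewed = list(range(len(docs)))
    found = 0
    reviewed = 0
    while unreviewed and found < needed:
        # Re-rank the remaining documents against the current query vector
        # (assumption: dot-product similarity).
        scores = docs[unreviewed] @ query
        best = unreviewed.pop(int(np.argmax(scores)))
        reviewed += 1
        if best in relevant_ids:  # simulated user marks the document relevant
            found += 1
            if use_feedback:
                # Relevance feedback: cumulatively sum the selected
                # embedding into the query vector.
                query += docs[best]
    return reviewed


# Toy data (made up for illustration): documents 0 and 3 are relevant.
query = [1.0, 0.0]
docs = [[0.6, 0.8], [0.5, -0.8], [0.4, -0.9], [0.1, 1.0], [0.3, -0.9]]
relevant = [0, 3]

baseline_effort = review_with_feedback(query, docs, relevant, use_feedback=False)
feedback_effort = review_with_feedback(query, docs, relevant, use_feedback=True)
print(baseline_effort, feedback_effort)
```

On this toy data the no-feedback baseline has to review all five documents to find both relevant ones, while the feedback variant finds them after two, because summing the first selected embedding into the query pulls the second relevant document to the top. These figures only illustrate the mechanism and say nothing about the reductions reported in the abstract.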