Dense retrieval has become the new paradigm in passage retrieval. Despite its effectiveness on typo-free queries, it is not robust to queries that contain typos. Current work on improving the typo-robustness of dense retrievers combines (i) data augmentation, which generates typoed queries at training time, with (ii) additional robustifying subtasks that aim to align the original, typo-free queries with their typoed variants. Even though multiple typoed variants are available as positive samples per query, some methods assume a single positive sample and a set of negative ones per anchor and tackle the robustifying subtask with contrastive learning, thereby making insufficient use of the multiple positives (typoed queries). In contrast, in this work we argue that all available positives can be used at the same time, and we employ contrastive learning that supports multiple positives (multi-positive). Experimental results on two datasets show that leveraging all positives simultaneously through multi-positive contrastive learning on the robustifying subtask improves robustness compared with contrastive learning with a single positive.
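To make the contrast between single-positive and multi-positive contrastive learning concrete, the following is a minimal sketch and not the exact formulation used in this work. It assumes dot-product similarity, a temperature `tau`, and a SupCon-style objective that averages the log-likelihood over all positives; the function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def single_positive_loss(anchor, positive, negatives, tau=1.0):
    """InfoNCE-style loss: one positive against a set of negatives."""
    logits = [dot(anchor, c) / tau for c in [positive] + negatives]
    # Negative log-probability of the positive under a softmax over candidates.
    return log_sum_exp(logits) - logits[0]


def multi_positive_loss(anchor, positives, negatives, tau=1.0):
    """Assumed SupCon-style loss: average over all positives at once."""
    logits = [dot(anchor, c) / tau for c in positives + negatives]
    denom = log_sum_exp(logits)
    k = len(positives)
    # Each positive contributes its own negative log-probability; all
    # positives share the same softmax denominator.
    return sum(denom - logits[i] for i in range(k)) / k


# Toy usage: the anchor is the typo-free query embedding, the positives are
# embeddings of its typoed variants, the negatives are other queries.
anchor = [1.0, 0.0]
typo_variants = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.0]]
other_queries = [[0.0, 1.0], [-1.0, 0.0]]

loss_single = single_positive_loss(anchor, typo_variants[0], other_queries)
loss_multi = multi_positive_loss(anchor, typo_variants, other_queries)
```

With exactly one positive, the two functions return the same value, so under these assumptions the multi-positive loss can be read as a generalization of the single-positive one in which every typoed variant provides a training signal for the same anchor in a single step.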