Dense retrieval is a basic building block of information retrieval applications. One of the main challenges of dense retrieval in real-world settings is handling queries that contain misspelled words. A popular approach to handling misspelled queries is to minimize the representation discrepancy between misspelled queries and their pristine counterparts. Unlike existing approaches, which focus only on the alignment between misspelled and pristine queries, our method also improves the contrast between each misspelled query and its surrounding queries. To assess the effectiveness of our proposed method, we compare it against existing competitors on two benchmark datasets and with two base encoders. Our method outperforms the competitors in all cases with misspelled queries. Our code and models are available at https://github.com/panuthept/DST-DenseRetrieval.
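To make the two objectives named above concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes query embeddings are already available as plain Python lists, uses mean squared distance as a stand-in for the alignment objective, and uses an in-batch InfoNCE loss as a stand-in for the contrast objective. The function names, the `temperature` value, and the `weight` mixing parameter are all hypothetical choices made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def alignment_loss(misspelled, pristine):
    """Mean squared distance between each misspelled query and its pristine counterpart.

    Assumption: squared Euclidean distance stands in for the 'representation discrepancy'.
    """
    total = 0.0
    for m, p in zip(misspelled, pristine):
        total += sum((a - b) ** 2 for a, b in zip(m, p))
    return total / len(misspelled)


def contrast_loss(misspelled, pristine, temperature=0.1):
    """In-batch InfoNCE loss.

    Each misspelled query should be more similar to its own pristine query
    than to the other (surrounding) pristine queries in the batch.
    Assumption: the temperature value is arbitrary.
    """
    m = [normalize(v) for v in misspelled]
    p = [normalize(v) for v in pristine]
    total = 0.0
    for i, anchor in enumerate(m):
        logits = [dot(anchor, candidate) / temperature for candidate in p]
        # Log-sum-exp with max subtraction for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        total += log_denominator - logits[i]
    return total / len(m)


def combined_loss(misspelled, pristine, weight=0.5):
    """Weighted sum of the two objectives (hypothetical mixing weight)."""
    return weight * alignment_loss(misspelled, pristine) + (
        1.0 - weight
    ) * contrast_loss(misspelled, pristine)


# Toy batch: two pristine queries and two candidate sets of misspelled embeddings.
pristine = [[1.0, 0.0], [0.0, 1.0]]
close_to_own = [[0.9, 0.1], [0.1, 0.9]]    # each typo stays near its own query
close_to_other = [[0.1, 0.9], [0.9, 0.1]]  # each typo drifts toward the other query

print(combined_loss(close_to_own, pristine))
print(combined_loss(close_to_other, pristine))
```

In this toy batch, the embeddings that stay near their own pristine query yield the lower combined loss, which is the behavior both objectives are meant to reward. A real system would compute these terms on encoder outputs inside a training loop rather than on hand-written vectors.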