Self-Supervised Contrastive BERT Fine-tuning for Fusion-based Reviewed-Item Retrieval

As natural language interfaces let users express increasingly complex queries, a parallel explosion of user review content can help them find items such as restaurants, books, or movies that match these expressive queries. While Neural Information Retrieval (IR) methods have provided state-of-the-art results for matching queries to documents, they have not been extended to the task of Reviewed-Item Retrieval (RIR), where query-review scores must be aggregated (or fused) into item-level scores for ranking. In the absence of labeled RIR datasets, we extend Neural IR methodology to RIR by leveraging self-supervised methods for contrastive learning of BERT embeddings for both queries and reviews. Specifically, contrastive learning requires a choice of positive and negative samples, and the unique two-level structure of our item-review data, combined with metadata, affords a rich basis for selecting these samples. For contrastive learning in a Late Fusion scenario, we investigate three sampling strategies: using positive review samples from the same item and/or with the same rating, selecting hard positive samples by choosing the least similar reviews from the same anchor item, and selecting hard negative samples by choosing the most similar reviews from different items. We also explore anchor sub-sampling and augmentation with metadata. For a more end-to-end Early Fusion approach, we introduce contrastive item embedding learning to fuse reviews into single item embeddings. Experimental results show that Late Fusion contrastive learning for Neural RIR outperforms all other contrastive IR configurations, Neural IR, and sparse retrieval baselines. This demonstrates the power of exploiting the two-level structure in Neural RIR approaches, as well as the importance of preserving the nuance of individual review content via Late Fusion methods.
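The Late Fusion scoring and hard-sample selection described above can be illustrated with a minimal sketch. Several things here are assumptions made for illustration and are not stated in the abstract: the function names (`cosine`, `late_fusion_rank`, `hard_samples`), the use of cosine similarity, the choice of averaging the top-`k` query-review scores as the fusion rule, and the toy two-dimensional vectors that stand in for BERT embeddings.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def late_fusion_rank(query_vec, item_reviews, k=2):
    """Rank items by fusing query-review scores into item-level scores.

    item_reviews maps item_id -> list of review embedding vectors.
    Fusion rule (an assumption): mean of the top-k query-review scores.
    """
    item_scores = {}
    for item_id, review_vecs in item_reviews.items():
        scores = sorted((cosine(query_vec, r) for r in review_vecs), reverse=True)
        top = scores[:k]
        item_scores[item_id] = sum(top) / len(top)
    return sorted(item_scores, key=item_scores.get, reverse=True)

def hard_samples(anchor_item, anchor_idx, item_reviews):
    """Pick a hard positive and a hard negative for one anchor review.

    Hard positive: the least similar review from the same item.
    Hard negative: the most similar review from any other item.
    Assumes the anchor item has at least two reviews and that at
    least one other item exists.
    """
    anchor = item_reviews[anchor_item][anchor_idx]
    positives = [
        (cosine(anchor, r), i)
        for i, r in enumerate(item_reviews[anchor_item])
        if i != anchor_idx
    ]
    hard_pos_idx = min(positives)[1]
    negatives = [
        (cosine(anchor, r), item_id, i)
        for item_id, revs in item_reviews.items()
        if item_id != anchor_item
        for i, r in enumerate(revs)
    ]
    _, neg_item, neg_idx = max(negatives)
    return hard_pos_idx, (neg_item, neg_idx)

# Toy data: hypothetical 2-d "embeddings" for two items.
reviews = {
    "cafe": [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]],
    "diner": [[0.6, 0.8], [-1.0, 0.0]],
}
ranking = late_fusion_rank([1.0, 0.0], reviews, k=2)
hard_pos, hard_neg = hard_samples("cafe", 0, reviews)
```

In this sketch each review keeps its own embedding until scoring time, so one strongly matching review can lift an item even when its other reviews are off-topic. An Early Fusion variant would instead collapse each item's reviews into a single vector before comparing it with the query.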
