This is the first year of the TREC Product Search Track. The focus this year was the creation of a reusable collection and an evaluation of how metadata and multi-modal data affect retrieval accuracy. To this end, we use a new product search corpus that includes contextual metadata. Our analysis shows that in the product search domain, traditional retrieval systems are highly effective and commonly outperform general-purpose pretrained embedding models. We also evaluate the impact of using simplified and metadata-enhanced collections and find no clear trend in the impact of the expanded collection. Some outcomes are surprising: despite their widespread adoption and competitive performance on other tasks, single-stage dense retrieval runs are commonly noncompetitive or generate low-quality results in both the zero-shot and fine-tuned settings.