Linear combination is an effective data fusion method for information retrieval tasks because its weights can be adjusted to suit diverse scenarios. However, training good weights has traditionally required manual relevance judgments on a large percentage of documents, a labor-intensive and expensive process. In this study, we investigate the feasibility of obtaining near-optimal weights using only 20%-50% of the relevant documents. In experiments on four TREC datasets, we find that weights trained by multiple linear regression on this reduced set perform close to those trained with TREC's official "qrels". These findings suggest that data fusion can be made more efficient and affordable, allowing researchers and practitioners to obtain its benefits with significantly less judgment effort.
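To make the idea concrete, the following is a minimal sketch of linear-combination fusion with weights fitted by multiple linear regression, once on full relevance labels and once on a reduced set. It is not the procedure used in the study: the data are synthetic, the names `train_weights` and `fuse` are hypothetical, and treating the unjudged relevant documents as non-relevant is an assumption made only for illustration.

```python
import numpy as np

def train_weights(scores, relevance):
    """Fit least-squares weights (plus an intercept) that map the
    component systems' scores to relevance labels."""
    X = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(X, relevance, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] are per-system weights

def fuse(scores, coef):
    """Linear combination: weighted sum of the component scores."""
    return coef[0] + scores @ coef[1:]

# Synthetic data (assumption): 3 retrieval systems of decreasing quality.
rng = np.random.default_rng(0)
n_docs, n_systems = 1000, 3
relevance = (rng.random(n_docs) < 0.2).astype(float)
noise_levels = np.array([0.3, 0.6, 1.0])
scores = relevance[:, None] + rng.normal(size=(n_docs, n_systems)) * noise_levels

# Weights trained on the full set of relevance labels.
full_coef = train_weights(scores, relevance)

# Weights trained when only half of the relevant documents are known;
# the remaining relevant documents are treated as non-relevant (assumption).
relevant_idx = np.flatnonzero(relevance == 1)
kept = rng.choice(relevant_idx, size=len(relevant_idx) // 2, replace=False)
partial_relevance = np.zeros(n_docs)
partial_relevance[kept] = 1.0
partial_coef = train_weights(scores, partial_relevance)

# Compare the two fused score lists over the same documents.
fused_full = fuse(scores, full_coef)
fused_partial = fuse(scores, partial_coef)
agreement = np.corrcoef(fused_full, fused_partial)[0, 1]
print("full weights:   ", full_coef[1:])
print("partial weights:", partial_coef[1:])
print("correlation of fused scores:", agreement)
```

In this toy setting, the question of interest is whether the two weight vectors rank documents similarly; the absolute scale of the weights matters less, since ranking depends only on their relative sizes.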