The field of scientometrics has shown the power of citation-based clusters for literature analysis, yet this technique has barely been used for information retrieval tasks. This work evaluates the performance of citation-based clusters for information retrieval. We simulated a search process over these clusters using a tree hierarchy of clusters and a cluster selection algorithm, and evaluated it on the task of finding the relevant documents for 25 systematic reviews. Our evaluation considered several trade-offs between recall and precision in the cluster selection, and we also replicated the Boolean queries self-reported by the systematic reviews to serve as a reference. We found that the search performance of citation-based clusters is highly variable and unpredictable, that it works best for users who prefer recall over precision at a ratio between 2 and 8, and that citation-based clusters and query-based search complement each other when used together, including by finding new relevant documents.
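To make the recall/precision trade-off concrete, the sketch below shows one way a simulated cluster selection over a tree hierarchy could work. It is an illustration under stated assumptions, not the method used in this work: the abstract does not specify the selection algorithm or the trade-off measure, so the sketch assumes the F-beta score (where beta expresses how many times more recall is weighted than precision) and an oracle selection that picks the single best-scoring cluster. The names `Cluster`, `f_beta`, `score`, and `best_cluster`, as well as the toy data, are hypothetical.

```python
from dataclasses import dataclass, field


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


@dataclass
class Cluster:
    """A node in a (hypothetical) cluster hierarchy; docs holds document ids."""
    docs: set
    children: list = field(default_factory=list)


def score(cluster: Cluster, relevant: set, beta: float) -> float:
    """Score a cluster against the known relevant set (oracle assumption)."""
    hits = len(cluster.docs & relevant)
    precision = hits / len(cluster.docs) if cluster.docs else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return f_beta(precision, recall, beta)


def best_cluster(root: Cluster, relevant: set, beta: float):
    """Traverse the whole tree and return the highest-scoring cluster and its score."""
    best, best_score = root, score(root, relevant, beta)
    stack = list(root.children)
    while stack:
        node = stack.pop()
        s = score(node, relevant, beta)
        if s > best_score:
            best, best_score = node, s
        stack.extend(node.children)
    return best, best_score


# Toy hierarchy (made-up data): root -> {a, b}, a -> {a1}
a1 = Cluster(docs={1, 2})
a = Cluster(docs={1, 2, 3, 4}, children=[a1])
b = Cluster(docs={5, 6, 7, 8, 9, 10})
root = Cluster(docs=set(range(1, 11)), children=[a, b])
relevant = {1, 2, 3}

recall_oriented, _ = best_cluster(root, relevant, beta=2.0)
precision_oriented, _ = best_cluster(root, relevant, beta=0.25)
```

In this toy example, a recall-oriented user (beta of 2) is served best by the broader cluster `a`, which contains every relevant document, while a precision-oriented user (beta of 0.25) is served best by the narrower cluster `a1`, which contains only relevant documents.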