In this contribution, we address seed-based information retrieval in networks of research publications. Using systematic reviews as a baseline and publication data from the NIH Open Citation Collection, we compare the performance of three citation-based approaches (direct citation, co-citation, and bibliographic coupling) with respect to recall and precision. We also include the PubMed Related Article score and combined approaches in the comparison, and we provide a fairly comprehensive review of earlier research in which citation relations have been used for information retrieval. The results show an advantage for co-citation over bibliographic coupling and direct citation. However, combining the three approaches outperforms the exclusive use of co-citation. In line with previous research, the results further indicate that combining citation-based approaches with textual approaches enhances the performance of seed-based information retrieval. These results may guide the choice of citation similarity measures in methods that combine citation-based and textual approaches. We suggest that future research use more structured approaches to evaluate methods for seed-based retrieval of publications, including comparative approaches as well as the development of common data sets and baselines for evaluation.
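To make the three citation relations and the two evaluation measures concrete, the following is a minimal sketch on a toy citation graph. It is not the study's implementation: the paper identifiers, the function names, the raw-count scoring, and the "retrieve everything with a nonzero score" rule are all assumptions made for illustration, and the study may well use normalized similarity measures and ranked cut-offs instead.

```python
from collections import defaultdict

# Hypothetical toy graph: each paper maps to the set of papers it cites.
cites = {
    "S": {"A", "R1"},   # the seed cites A and R1
    "B": {"S"},         # B cites the seed
    "C": {"S", "D"},    # C cites the seed and D, so D is co-cited with S
    "E": {"R1", "R2"},  # E shares reference R1 with the seed
    "D": {"R2"},
}
seeds = {"S"}
# Assumed gold standard, e.g. the other studies included in a systematic review.
relevant = {"A", "B", "D", "E"}


def direct_citation(cites, seeds):
    """Count, per candidate, the seeds it cites or is cited by."""
    cited_by = defaultdict(set)
    for paper, refs in cites.items():
        for ref in refs:
            cited_by[ref].add(paper)
    scores = defaultdict(int)
    for seed in seeds:
        for cand in cites.get(seed, set()) | cited_by.get(seed, set()):
            if cand not in seeds:
                scores[cand] += 1
    return dict(scores)


def co_citation(cites, seeds):
    """Count how often a candidate appears in a reference list together with a seed."""
    scores = defaultdict(int)
    for refs in cites.values():
        n_seeds = len(refs & seeds)
        if n_seeds:
            for cand in refs - seeds:
                scores[cand] += n_seeds
    return dict(scores)


def bibliographic_coupling(cites, seeds):
    """Count the references a candidate shares with the seeds."""
    scores = defaultdict(int)
    for seed in seeds:
        seed_refs = cites.get(seed, set())
        for cand, refs in cites.items():
            shared = len(refs & seed_refs)
            if cand not in seeds and shared:
                scores[cand] += shared
    return dict(scores)


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = {
    "direct citation": set(direct_citation(cites, seeds)),
    "co-citation": set(co_citation(cites, seeds)),
    "bibliographic coupling": set(bibliographic_coupling(cites, seeds)),
}
# Simplest possible combination (an assumption): the union of all three result sets.
retrieved["combined"] = set().union(*retrieved.values())

results = {name: precision_recall(found, relevant) for name, found in retrieved.items()}
for name, (precision, recall) in results.items():
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy graph, each relation reaches a different part of the relevant set, so the union recovers more relevant papers than any single relation, at some cost in precision. The numbers only illustrate the mechanics and say nothing about the study's actual results.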