A science map of topics is a visualization that displays topics identified algorithmically from the bibliographic metadata of scientific publications. In practice, not all topics are well represented in a science map. We analyzed how effectively different topics are represented in science maps created by clustering biomedical publications. To this end, we investigated which topic categories, derived from MeSH terms, are better represented in science maps based on citation or text similarity networks. To evaluate how effectively topics are clustered, we determined the extent to which documents belonging to the same topic are grouped together in the same cluster. We found that the best- and worst-represented topic categories are the same for citation and text similarity networks. The best-represented topic categories are diseases, psychology, anatomy, organisms, and the techniques and equipment used for diagnostics and therapy, while the worst-represented topic categories are natural science fields, geographical entities, information sciences, and health care and occupations. Furthermore, for the diseases and organisms topic categories, and for science maps with smaller clusters, topics tend to be better represented in citation similarity networks than in text similarity networks.
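The evaluation idea (how strongly the documents of one topic concentrate in the same cluster) can be sketched in Python. The abstract does not specify the exact measure, so the pairwise co-clustering rate used here and the function names `topic_cohesion` and `category_scores` are hypothetical. The toy documents, topics, categories, and cluster labels are invented for illustration and are not data from the study.

```python
from collections import Counter, defaultdict
from math import comb
from statistics import mean


def topic_cohesion(doc_ids, cluster_of):
    """Fraction of document pairs within one topic that share a cluster.

    Hypothetical measure: 1.0 means all documents of the topic sit in a
    single cluster; values near 0.0 mean the topic is scattered.
    """
    n = len(doc_ids)
    if n < 2:
        raise ValueError("a topic needs at least two documents to form a pair")
    # Count how many of the topic's documents fall into each cluster.
    counts = Counter(cluster_of[d] for d in doc_ids)
    same_cluster_pairs = sum(comb(m, 2) for m in counts.values())
    return same_cluster_pairs / comb(n, 2)


def category_scores(topics, topic_category, cluster_of):
    """Average topic cohesion for each topic category."""
    by_category = defaultdict(list)
    for topic, doc_ids in topics.items():
        if len(doc_ids) >= 2:  # skip topics too small to score
            by_category[topic_category[topic]].append(
                topic_cohesion(doc_ids, cluster_of)
            )
    return {cat: mean(scores) for cat, scores in by_category.items()}


# Invented toy data: topic -> documents, topic -> category.
topics = {
    "Neoplasms": ["d1", "d2", "d3", "d4"],
    "Influenza, Human": ["d5", "d6", "d7"],
    "Physics": ["d8", "d9", "d10"],
}
topic_category = {
    "Neoplasms": "diseases",
    "Influenza, Human": "diseases",
    "Physics": "natural science fields",
}

# Invented cluster labels from two hypothetical clusterings.
citation_clusters = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 2,
                     "d6": 2, "d7": 2, "d8": 0, "d9": 1, "d10": 2}
text_clusters = {"d1": 0, "d2": 1, "d3": 0, "d4": 1, "d5": 2,
                 "d6": 2, "d7": 0, "d8": 0, "d9": 1, "d10": 1}

for name, clusters in [("citation", citation_clusters), ("text", text_clusters)]:
    scores = category_scores(topics, topic_category, clusters)
    print(name, {cat: round(v, 3) for cat, v in scores.items()})
# citation {'diseases': 0.75, 'natural science fields': 0.0}
# text {'diseases': 0.333, 'natural science fields': 0.333}
```

A pairwise score like this tends to fall as clusters get smaller, because finer clusterings split topics more often, so such scores are best compared between maps with similar cluster sizes.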