Embeddings are powerful tools for transforming complex and unstructured data into numeric formats suitable for computational analysis tasks. In this work, we use multiple embeddings for similarity calculations in bibliometrics and scientometrics. We build a multivariate network (MVN) from a large set of scientific publications and explore an aspect-driven analysis approach to reveal similarity patterns in the publication data. By dividing our MVN into separately embeddable aspects, we obtain a flexible vector representation that serves as input to a novel similarity-based clustering method. Building on these preprocessing steps, we developed Simbanex, a visual analytics application designed for the interactive visual exploration of similarity patterns within the underlying publications.
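To make the idea of separately embeddable aspects more concrete, the following is a minimal sketch of how per-aspect embeddings could be combined into a single similarity score between two publications. It is an illustration under stated assumptions, not the method used in this work: the aspect names (`text`, `authors`), the toy vectors, the use of cosine similarity, and the weighted average in `combined_similarity` are all hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(pub_a, pub_b, weights):
    """Weighted average of per-aspect cosine similarities.

    pub_a, pub_b: dicts mapping an aspect name to that aspect's embedding.
    weights: dict mapping an aspect name to a non-negative weight.
    The weighting scheme is an assumption made for this sketch.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one aspect needs a positive weight")
    score = sum(
        w * cosine(pub_a[aspect], pub_b[aspect])
        for aspect, w in weights.items()
    )
    return score / total


# Hypothetical toy data: two publications, each with two aspect embeddings.
publications = {
    "p1": {"text": [1.0, 0.0], "authors": [0.0, 1.0]},
    "p2": {"text": [1.0, 0.0], "authors": [1.0, 0.0]},
}

equal_weights = {"text": 1.0, "authors": 1.0}
text_only = {"text": 1.0, "authors": 0.0}

sim_equal = combined_similarity(publications["p1"], publications["p2"], equal_weights)
sim_text = combined_similarity(publications["p1"], publications["p2"], text_only)
```

Keeping one embedding per aspect means the weights can be changed at analysis time without recomputing any embedding, which is one plausible reading of a "flexible vector representation". A pairwise score of this kind could then be passed to any similarity-based clustering routine.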