Traditional Machine Learning (ML) methods require large amounts of data to perform well, which limits their applicability in sparse or incomplete scenarios and often forces the use of additional synthetic data to improve model training. To overcome this challenge, the research community is increasingly turning to Graph Machine Learning (GML), which offers a powerful alternative by exploiting the relationships within data. However, GML also faces limitations, particularly when dealing with Knowledge Graphs (KGs), whose semantic nature can leave a large amount of information implicit. This study introduces Bi-View, a novel hybrid approach that increases the informative content of node features in KGs to generate enhanced Graph Embeddings (GEs), which in turn improve GML models without relying on additional synthetic data. The proposed work combines two complementary GE techniques: Node2Vec, which captures structural patterns through unsupervised random walks, and GraphSAGE, which aggregates neighbourhood information in a supervised way. Node2Vec embeddings are first computed to represent the graph topology; node features are then enriched with centrality-based metrics and used as input to the GraphSAGE model. Finally, a fusion layer combines the original Node2Vec embeddings with the GraphSAGE-influenced representations, resulting in a dual-perspective embedding space. This fusion captures both topological and semantic properties of the graph, enabling the model to exploit informative features that exist in the dataset but are not explicitly represented. Our approach improves downstream task performance, especially in scenarios with poor initial features, providing a basis for more accurate and precise KG-enhanced GML models.
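The pipeline described in the abstract (structural view, centrality-enriched features, neighbourhood aggregation, fusion) can be illustrated with a small dependency-free sketch. It is not the authors' implementation, and several parts are simplifying assumptions: the walks are unbiased (Node2Vec with p = q = 1), a normalised co-occurrence vector stands in for the skip-gram embedding, degree centrality is the only centrality metric, the GraphSAGE step is a single mean aggregation with no learned weights and no supervision, and fusion is plain concatenation. All function names, the toy graph, and the constant "poor" features are hypothetical.

```python
import random

def random_walks(adj, walk_len=5, walks_per_node=10, seed=0):
    """Unbiased random walks (assumption: Node2Vec with p = q = 1)."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def structural_view(adj, walks, window=2):
    """Stand-in for skip-gram: row-normalised co-occurrence counts."""
    nodes = sorted(adj)
    index = {n: i for i, n in enumerate(nodes)}
    counts = {n: [0.0] * len(nodes) for n in nodes}
    for walk in walks:
        for i, u in enumerate(walk):
            for v in walk[max(0, i - window): i + window + 1]:
                if v != u:
                    counts[u][index[v]] += 1.0
    view = {}
    for n, row in counts.items():
        total = sum(row) or 1.0
        view[n] = [c / total for c in row]
    return view

def degree_centrality(adj):
    """Fraction of the other nodes that each node is connected to."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}

def sage_mean_layer(adj, feats):
    """One GraphSAGE-style step without learned weights:
    concatenate a node's features with the mean of its neighbours'."""
    out = {}
    for u, nbrs in adj.items():
        dim = len(feats[u])
        mean = [sum(feats[v][k] for v in nbrs) / len(nbrs) for k in range(dim)]
        out[u] = feats[u] + mean
    return out

def bi_view(adj, feats):
    """Fuse the structural view with the aggregated, enriched features."""
    struct = structural_view(adj, random_walks(adj))
    cent = degree_centrality(adj)
    enriched = {u: feats[u] + [cent[u]] for u in adj}   # feature enrichment
    aggregated = sage_mean_layer(adj, enriched)         # neighbourhood view
    return {u: struct[u] + aggregated[u] for u in adj}  # fusion (concatenation)

# Hypothetical toy graph; every node carries the same uninformative feature.
toy_adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
toy_feats = {n: [1.0] for n in toy_adj}
fused = bi_view(toy_adj, toy_feats)
print(len(fused["a"]))  # 4 structural + 4 aggregated dimensions
```

Even though every node starts with the same raw feature, the fused vectors differ between nodes because the centrality value and the walk statistics carry information that was present in the graph but not in the features. In a realistic setting the stand-ins would be replaced by trained Node2Vec and GraphSAGE models and a learned fusion layer.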