The presence of a large number of bots in online social networks (OSNs) leads to undesirable social effects. Graph neural networks (GNNs) have achieved state-of-the-art performance in bot detection because they can effectively exploit user interactions. In most scenarios, however, the distribution of bots and humans is imbalanced, so the minority class is under-represented and performance is sub-optimal. Previous GNN-based methods for bot detection seldom consider the impact of class imbalance. In this paper, we propose an over-sampling strategy for GNNs (OS-GNN) that mitigates the effect of class imbalance in bot detection. Unlike previous over-sampling methods for GNNs, OS-GNN does not require edge synthesis, which avoids the noise inevitably introduced during edge construction. Specifically, node features are first mapped to a feature space through neighborhood aggregation, and samples for the minority class are then generated in that feature space. Finally, the augmented features are fed into GNNs to train the classifiers. The framework is general and can be easily extended to different GNN architectures. We evaluate the proposed framework on three real-world bot detection benchmark datasets, where it consistently outperforms the baselines.
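The two steps described above (neighborhood aggregation, then minority-class generation in the aggregated feature space) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration rather than the OS-GNN implementation: the function names `aggregate` and `oversample_minority` are hypothetical, mean aggregation with self-loops stands in for whatever aggregator is actually used, and SMOTE-style interpolation is one plausible generator that the abstract does not specify. The sketch also assumes binary labels and omits the final classifier training step.

```python
import numpy as np

def aggregate(adj, x, hops=1):
    """Mean-aggregate node features over neighbors (self-loops added), `hops` times.

    Assumption: simple row-normalized averaging stands in for the aggregator.
    """
    a = adj + np.eye(adj.shape[0])
    a = a / a.sum(axis=1, keepdims=True)  # row-normalize so each row sums to 1
    for _ in range(hops):
        x = a @ x
    return x

def oversample_minority(h, y, minority=1, k=2, seed=0):
    """SMOTE-style interpolation between minority samples in feature space.

    Assumption: binary labels; new samples are added until both classes match.
    No edges are synthesized: only feature rows and labels are appended.
    """
    rng = np.random.default_rng(seed)
    h_min = h[y == minority]
    n_new = int((y != minority).sum() - (y == minority).sum())
    if n_new <= 0 or len(h_min) < 2:
        return h, y
    # Pairwise distances among minority samples; exclude self-matches.
    d = np.linalg.norm(h_min[:, None, :] - h_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    k = min(k, len(h_min) - 1)
    nn = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbors per sample
    base = rng.integers(0, len(h_min), size=n_new)
    nbr = nn[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation weights in [0, 1)
    synth = h_min[base] + lam * (h_min[nbr] - h_min[base])
    return np.vstack([h, synth]), np.concatenate([y, np.full(n_new, minority)])
```

Because the synthetic rows are created after aggregation, they already carry neighborhood information and need no edges of their own, which is the property the abstract emphasizes. The augmented feature matrix and labels would then be passed to the downstream classifier.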