Illicit financial activities such as money laundering often manifest as recurrent topological patterns in transaction networks. Detecting these patterns automatically remains challenging because labeled real-world data are scarce and privacy constraints are strict. To address this, we investigate whether Graph Autoencoders (GAEs) trained on synthetic data can effectively learn and distinguish topological patterns that mimic money laundering operations. The analysis consists of two phases: (i) data generation, where synthetic samples are created for seven well-known illicit activity patterns using parametrized generators that preserve structural consistency while introducing realistic variability; and (ii) model training and validation, where a separate GAE is trained on each pattern without explicit labels, relying solely on reconstruction error as an indicator of learned structure. We compare three GAE implementations, each built on a different graph convolutional layer: Graph Convolutional Network (GAE-GCN), GraphSAGE (GAE-SAGE), and Graph Attention Network (GAE-GAT). Experimental results show that GAE-GCN achieves the most consistent reconstruction performance across patterns, while GAE-SAGE and GAE-GAT are competitive only on a few specific patterns. These findings suggest that graph-based representation learning on synthetic data provides a viable path toward AI-driven tools for detecting illicit behaviors, helping to work around the scarcity of labeled data and the privacy constraints that limit real financial datasets.
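The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the "cycle" pattern and its generator `cycle_pattern` are hypothetical (the abstract does not name the seven patterns), edges are treated as undirected, node features are the identity matrix, the encoder is a single linear GCN-style layer trained with a hand-written gradient in NumPy rather than a GNN library, and the decoder is the usual inner-product decoder. Reconstruction error is taken to be the mean binary cross-entropy between the adjacency matrix and its reconstruction.

```python
import numpy as np

def cycle_pattern(n_nodes, rng, extra_edge_prob=0.0):
    """Hypothetical parametrized generator: a ring of accounts plus random extra edges."""
    adj = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        j = (i + 1) % n_nodes
        adj[i, j] = adj[j, i] = 1.0  # undirected simplification (assumption)
    # Variability: random chords, sampled on the upper triangle and mirrored.
    upper = np.triu((rng.random((n_nodes, n_nodes)) < extra_edge_prob).astype(float), k=1)
    return np.maximum(adj, upper + upper.T)

def normalize_adj(adj):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_error(adj, z):
    """Mean binary cross-entropy between adj and sigmoid(z z^T)."""
    p = sigmoid(z @ z.T)
    eps = 1e-9
    return float(-np.mean(adj * np.log(p + eps) + (1.0 - adj) * np.log(1.0 - p + eps)))

def train_gae(adj, dim=4, lr=1.0, epochs=300, seed=0):
    """Train a one-layer linear encoder Z = A_norm @ W (identity features assumed)."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    m = normalize_adj(adj)
    w = rng.normal(scale=0.1, size=(n, dim))
    losses = [reconstruction_error(adj, m @ w)]
    for _ in range(epochs):
        z = m @ w
        g = (sigmoid(z @ z.T) - adj) / (n * n)   # dL/dS for S = Z Z^T
        grad_w = m.T @ ((g + g.T) @ z)           # chain rule through S and Z = M W
        candidate = w - lr * grad_w
        cand_loss = reconstruction_error(adj, m @ candidate)
        if cand_loss < losses[-1]:
            w = candidate                        # accept only steps that lower the loss
            losses.append(cand_loss)
        else:
            lr *= 0.5                            # otherwise shrink the step size
    return w, losses

rng = np.random.default_rng(0)
adj = cycle_pattern(8, rng)          # one synthetic sample of the hypothetical pattern
w, losses = train_gae(adj)
print(f"reconstruction error before training: {losses[0]:.4f}")
print(f"reconstruction error after training:  {losses[-1]:.4f}")
```

In this sketch no labels are used at any point: the only training signal is how well the adjacency matrix is reconstructed, which mirrors the role reconstruction error plays in the analysis. Swapping the linear layer for GCN, GraphSAGE, or GAT layers would require a graph learning library and is outside the scope of the example.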