Effective crime linkage analysis is crucial for identifying serial offenders and enhancing public safety. To address limitations of traditional crime linkage methods in handling high-dimensional, sparse, and heterogeneous data, we propose a Siamese Autoencoder framework that learns meaningful latent representations and uncovers correlations in complex crime data. Using data from the Violent Crime Linkage Analysis System (ViCLAS), maintained by the Serious Crime Analysis Section of the UK's National Crime Agency, our approach mitigates signal dilution in sparse feature spaces by integrating geographic-temporal features at the decoder stage. This design amplifies behavioral representations rather than allowing them to be overshadowed at the input level, yielding consistent improvements across multiple evaluation metrics. We further analyze how different domain-informed data reduction strategies influence model performance, providing practical guidance for preprocessing in crime linkage contexts. Our results show that advanced machine learning approaches can substantially enhance linkage accuracy, improving AUC by up to 9% over traditional methods while offering interpretable insights to support investigative decision-making.
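To make the architectural idea concrete, the following is a minimal, untrained forward-pass sketch of one plausible reading of the design: a shared-weight (Siamese) encoder that sees only behavioral features, and a decoder that receives the latent code concatenated with geographic-temporal context. The abstract does not specify layer sizes, activations, the fusion operator, or the training loss, so the class name, the dimensions, the use of concatenation, and the Euclidean latent distance are all illustrative assumptions and not the authors' implementation.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return a randomly initialised dense layer as (weights, biases)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, activation=None):
    """Apply y = activation(W x + b) to a single input vector."""
    weights, biases = layer
    y = [sum(w * v for w, v in zip(row, x)) + b
         for row, b in zip(weights, biases)]
    return [activation(v) for v in y] if activation else y


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class SiameseAutoencoderSketch:
    """Hypothetical forward pass only; no training loop is included."""

    def __init__(self, n_behavior, n_context, n_latent, seed=0):
        rng = random.Random(seed)
        # Assumption: the encoder sees behavioral features only, so sparse
        # behavioral signals are not diluted by context at the input level.
        self.encoder = make_layer(n_behavior, n_latent, rng)
        # Assumption: geographic-temporal context enters at the decoder,
        # here by simple concatenation with the latent code.
        self.decoder = make_layer(n_latent + n_context, n_behavior, rng)

    def encode(self, behavior):
        return dense(self.encoder, behavior, math.tanh)

    def decode(self, latent, context):
        return dense(self.decoder, latent + context, sigmoid)

    def forward_pair(self, crime_a, crime_b):
        """Pass both crimes through the same weights (the Siamese part)."""
        z_a = self.encode(crime_a["behavior"])
        z_b = self.encode(crime_b["behavior"])
        recon_a = self.decode(z_a, crime_a["context"])
        recon_b = self.decode(z_b, crime_b["context"])
        # Assumption: Euclidean distance between latent codes serves as
        # the linkage score (smaller means more likely linked).
        distance = math.dist(z_a, z_b)
        return z_a, z_b, recon_a, recon_b, distance


# Toy inputs: binary behavioral indicators plus scaled context values.
model = SiameseAutoencoderSketch(n_behavior=6, n_context=3, n_latent=2)
crime_a = {"behavior": [1, 0, 0, 1, 0, 0], "context": [0.2, 0.7, 0.1]}
crime_b = {"behavior": [1, 0, 0, 1, 0, 1], "context": [0.9, 0.1, 0.5]}
z_a, z_b, recon_a, recon_b, distance = model.forward_pair(crime_a, crime_b)
```

In this sketch the latent distance depends only on the behavioral inputs, because context is injected after encoding; it influences the reconstruction but not the embedding. A trained version would presumably combine a reconstruction loss with a pairwise (for example contrastive) loss, but the abstract does not state which objective is used.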