This project addresses human trafficking in online consumer-to-consumer (C2C) marketplaces using Natural Language Processing (NLP). We introduce a methodology for generating pseudo-labeled datasets with minimal supervision, which then serve as training data for NLP models. We focus on tasks such as Human Trafficking Risk Prediction (HTRP) and Organized Activity Detection (OAD), and use Transformer models for the analysis. A key contribution is an interpretability framework based on Integrated Gradients, which attributes each prediction to the input features that drove it and so gives law enforcement explanations they can inspect. The work fills a gap in the literature and offers a scalable, machine-learning-driven approach to combating human exploitation online. It also provides a foundation for future research and practical applications, and illustrates the role machine learning can play in addressing complex social issues.