Online scams are inherently multi-stage: their lifecycle consists of temporally ordered rails and events rather than isolated signals. Existing work analyzes the characteristics of scam types and rails but does not track how scams change across years. Moreover, work on the relations between rails is hampered by the lack of open-source annotated datasets that cover different scam types. To address these gaps, we build a dataset from Reddit self-disclosure narratives posted from 2023 to 2025 and use it to analyze yearly trends in scam characteristics and rail paths. For trend analysis, we collect 21,304 posts from scam-related subreddits, each heuristically annotated with at least one rail among identity, communication, platform, and payment. For scam path analysis, we then label 1,800 posts containing explicit or recoverable scam chains using an LLM-assisted method, which we evaluate against human annotation. Lastly, we run a topic model on the posts' comments to analyze community support behavior. The results show that scam processes are predominantly multi-rail, that the dominant scam types and rail components differ from year to year, that scam types vary systematically in path complexity, and that Reddit support behaviors have become more detailed over time. This work supports synthetic scam chain data simulation and AI-related scam risk assessment, though the findings may not generalize to other platforms.
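As a rough illustration of the heuristic rail annotation step, the sketch below tags a post with any of the four rails (identity, communication, platform, payment) by keyword matching and keeps posts with at least one rail. The keyword lists, the function names `tag_rails` and `keep_for_trend_analysis`, and the use of first-mention order as a stand-in for a rail path are all assumptions made for illustration; the abstract does not specify the actual heuristics.

```python
# Hypothetical keyword lists: the abstract names the four rails but not the
# rules used to detect them, so these entries are illustrative only.
RAIL_KEYWORDS = {
    "identity": ["impersonat", "fake profile", "pretended to be"],
    "communication": ["whatsapp", "telegram", "email", "text message"],
    "platform": ["facebook marketplace", "instagram", "dating app"],
    "payment": ["gift card", "zelle", "bitcoin", "wire transfer"],
}


def tag_rails(text):
    """Return the rails mentioned in `text`, ordered by first mention.

    Mention order is only a crude proxy for the temporal order of a scam
    chain; it is an assumption of this sketch, not the paper's method.
    """
    lowered = text.lower()
    hits = []
    for rail, keywords in RAIL_KEYWORDS.items():
        # Position of every keyword of this rail that occurs in the post.
        positions = [lowered.find(k) for k in keywords if k in lowered]
        if positions:
            hits.append((min(positions), rail))
    return [rail for _, rail in sorted(hits)]


def keep_for_trend_analysis(text):
    """Keep a post only if at least one rail is detected."""
    return len(tag_rails(text)) >= 1


post = (
    "Someone on Instagram pretended to be my cousin, moved the chat "
    "to WhatsApp, and asked for a gift card."
)
print(tag_rails(post))
print(keep_for_trend_analysis("I love hiking."))
```

A post that yields more than one rail under such a tagger would count as multi-rail. Recovering a reliable ordered path would likely need more than keyword positions, which is presumably where the LLM-assisted labeling and its human evaluation come in.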