SHERLOCK: Towards Dynamic Knowledge Adaptation in LLM-enhanced E-commerce Risk Management

Effective e-commerce risk management requires in-depth case investigations to identify emerging fraud patterns in highly adversarial environments. However, manual investigation typically requires analyzing the associations and couplings among multi-source heterogeneous data, a labor-intensive process that limits efficiency. While Large Language Models (LLMs) show promise in automating these analyses, their deployment is hindered by the complexity of risk scenarios and the sparsity of long-tail domain knowledge. To address these challenges, we propose Sherlock, a framework that integrates structured domain knowledge with LLM-based reasoning through three core modules. First, we construct a domain Knowledge Base (KB) by distilling structured expertise from heterogeneous knowledge sources. Second, we design a two-stage retrieval-augmented generation strategy tailored for case investigation, which combines input contextual augmentation with a Reflect & Refine module to fully leverage the KB for improved analysis quality. Finally, we develop an integrated platform for operations and annotation to drive a self-evolving data flywheel. By combining real-time hotfixes through KB updates with periodic logic alignment via post-training, we enable continuous system evolution to counteract adversarial drift. Online A/B tests at JD.com demonstrate that Sherlock achieves an 82% Expert Acceptance Rate (EAR) and a 386.7% increase in daily investigation throughput. An additional 90-day evaluation shows that the flywheel twice recovers from performance decay caused by changing fraud tactics, raising the EAR ceiling by around 3.5% through autonomous model updates.
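
To make the two-stage strategy concrete, the following is a minimal Python sketch of how input contextual augmentation followed by a Reflect & Refine pass could be wired together. It is an illustration under stated assumptions, not the implementation used in Sherlock: the `KBEntry` schema, the keyword-overlap retriever, the prompt wording, and the `llm` callable are all hypothetical stand-ins, and a real system would likely use a learned retriever and an actual model endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBEntry:
    """One structured piece of domain knowledge (hypothetical schema)."""
    pattern: str          # short name of a fraud pattern
    keywords: frozenset   # terms that signal the pattern
    guidance: str         # expert guidance attached to the pattern


# Toy knowledge base; the entries are invented for illustration only.
TOY_KB = [
    KBEntry("coupon abuse", frozenset({"coupon", "coupons"}),
            "Check whether many accounts share one device."),
    KBEntry("device farm", frozenset({"device", "emulator"}),
            "Compare device fingerprints across orders."),
]


def retrieve(kb, text, top_k=2):
    """Toy retriever: rank KB entries by keyword overlap with the text."""
    tokens = set(text.lower().split())
    scored = [(len(entry.keywords & tokens), entry) for entry in kb]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


def format_knowledge(entries):
    """Render retrieved entries as a bullet list for the prompt."""
    return "\n".join(f"- {e.pattern}: {e.guidance}" for e in entries)


def investigate(case_text, kb, llm):
    """Two-stage flow: augment the input, then reflect on and refine the draft.

    `llm` is any callable mapping a prompt string to a response string.
    """
    # Stage 1: input contextual augmentation with knowledge matched to the case.
    context = retrieve(kb, case_text)
    draft = llm(
        f"Case:\n{case_text}\n\nRelevant knowledge:\n{format_knowledge(context)}\n\n"
        "Write an investigation analysis."
    )

    # Stage 2: Reflect & Refine. Query the KB again with the draft itself and
    # revise only if that surfaces knowledge the first pass did not use.
    extra = [e for e in retrieve(kb, draft) if e not in context]
    if not extra:
        return draft
    return llm(
        f"Case:\n{case_text}\n\nDraft analysis:\n{draft}\n\n"
        f"Knowledge to check against:\n{format_knowledge(context + extra)}\n\n"
        "Revise the draft where it conflicts with or omits this knowledge."
    )
```

In this sketch the second retrieval is keyed on the draft rather than the raw case, so knowledge that only becomes relevant once a hypothesis is on the table can still reach the model. Whether Sherlock's Reflect & Refine module works this way is not stated in the abstract.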