Child sexual exploitation and abuse (CSEA) case data is inherently disturbing and is fragmented across organizations, jurisdictions, and agencies, with varying levels of detail and formatting. This makes cross-case analysis, pattern identification, and trend detection challenging. This paper presents CaseLinker, a modular system for ingesting, processing, analyzing, and visualizing CSEA case data. CaseLinker uses a hybrid, deterministic information extraction approach that combines regex-based extraction for structured data (demographics, platforms, evidence) with pattern-based semantic analysis for severity indicators and case topics, which keeps the extraction interpretable and auditable. The system extracts relevant case information, populates a comprehensive case schema, and generates six interactive visualizations (Timeline, Severity Indicators, Case Visualization, Previous Perpetrator Status, Environment/Platforms, Organizations Involved) that support further automated and manual analysis. It groups similar cases using weighted Jaccard similarity across multiple dimensions (platforms, demographics, topics, severity, investigation type) and provides automated triage and insights based on the collected case data. CaseLinker is evaluated on 47 cases from publicly available AZICAC reports (2011-2014), demonstrating effective information extraction, case clustering, automated insight generation, and interactive visualization. CaseLinker addresses critical challenges in case analysis, including fragmented data sources, cross-case pattern identification, and the emotional burden of repeatedly processing disturbing case material.
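To make the case-grouping step concrete, the following is a minimal sketch of a weighted Jaccard similarity over the five dimensions named above. It is an illustration under stated assumptions, not CaseLinker's implementation: the weight values, the dictionary-of-sets case representation, the weighted-average combination rule, and the names `WEIGHTS`, `jaccard`, and `weighted_similarity` are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical weights per dimension; the abstract does not state the values used.
WEIGHTS: Dict[str, float] = {
    "platforms": 0.3,
    "demographics": 0.2,
    "topics": 0.2,
    "severity": 0.2,
    "investigation_type": 0.1,
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |a & b| / |a | b|, taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def weighted_similarity(
    case_a: Dict[str, Set[str]],
    case_b: Dict[str, Set[str]],
    weights: Dict[str, float] = WEIGHTS,
) -> float:
    """Weighted average of per-dimension Jaccard scores (assumed combination rule)."""
    total_weight = sum(weights.values())
    score = sum(
        weight * jaccard(case_a.get(dim, set()), case_b.get(dim, set()))
        for dim, weight in weights.items()
    )
    return score / total_weight
```

Each case is represented here as a mapping from dimension name to a set of extracted labels, so a missing dimension simply contributes zero to the score. Pairs whose score exceeds a chosen threshold could then be grouped together; the threshold and the grouping procedure are left open because the abstract does not specify them.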