SynthChain: A Synthetic Benchmark and Forensic Analysis of Advanced and Stealthy Software Supply Chain Attacks

Advanced software supply chain (SSC) attacks increasingly manifest only at runtime and leave fragmented evidence across hosts, services, and build/dependency layers. Under realistic access and budget limits, no single telemetry stream is sufficient to reconstruct a full compromise chain. We present SynthChain, a near-production testbed and multi-source runtime dataset with chain-level ground truth, derived from real-world malicious packages and exploit campaigns. SynthChain covers seven representative supply-chain exploit scenarios across PyPI, npm, and a native C/C++ supply-chain case, spanning Windows and Linux and involving four hosts and one containerized environment. Scenarios run over realistic time windows from minutes to hours and are annotated with 14 MITRE ATT&CK tactics and 161 techniques (29 to 104 techniques per scenario). Beyond releasing the data, we quantify observability constraints by mapping each chain step to the minimum evidence needed for detection and cross-source correlation. With realistic trace availability, no single source is chain-complete: the best single source reaches only 0.391 weighted tag/step coverage and 0.403 mean chain reconstruction. Even minimal two-source fusion raises coverage to 0.636 and reconstruction to 0.639 (approximately a 1.6x gain), with consistent improvements in chain coverage/recall (0.545). The corpus contains approximately 0.58M raw multi-source events and 1.50M evaluation rows, enabling controlled studies of detection under constrained telemetry. We release the dataset, ground truth, and artifacts to support reproducible, forensic-aware runtime defenses and to guide efficient detection for software supply chains.
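As a quick arithmetic check on the reported gain, the sketch below divides each two-source score by the corresponding best single-source score, using only the four figures stated in the abstract. The helper name `gain` is an illustrative assumption and not part of the SynthChain artifacts.

```python
def gain(fused: float, single: float) -> float:
    """Ratio of a fused (two-source) score to the best single-source score."""
    return fused / single


# Figures taken from the abstract.
coverage_gain = gain(0.636, 0.391)        # weighted tag/step coverage
reconstruction_gain = gain(0.639, 0.403)  # mean chain reconstruction

print(f"coverage gain: {coverage_gain:.2f}x")              # coverage gain: 1.63x
print(f"reconstruction gain: {reconstruction_gain:.2f}x")  # reconstruction gain: 1.59x
```

Both ratios round to 1.6, consistent with the stated approximate gain.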
