High-Precision APT Malware Attribution with Out-of-Scope Resilience

Early attribution of Advanced Persistent Threat (APT) activity can help defenders prioritise investigation, select countermeasures, and reduce the impact of an intrusion. Malware provides useful attribution evidence, but automated APT malware attribution remains difficult in practice. Existing approaches are typically trained and evaluated as closed-set classifiers over a limited number of known APT groups. In operational environments, however, classifiers are likely to encounter samples from groups not represented during training. Closed-set classifiers are then forced to assign such samples to known groups, producing unsupported and potentially misleading attributions. We present a high-precision APT malware attribution method based on ranked binary classifiers with explicit abstention. Rather than training a single multi-class classifier, our approach trains and tunes two binary classifiers per APT group, ranks the classifiers by validation performance, and applies them sequentially. A sample is attributed only when a classifier provides sufficient evidence; if none does, the method abstains. We evaluate the method on the APT Malware dataset and on a larger combined dataset designed to stress-test out-of-scope behaviour. On the APT Malware dataset, the method achieves higher precision than previously published results on the same dataset. In the most challenging setting, where 87% of test samples come from 60 APT groups excluded from training, the method abstains on 94% of out-of-scope samples while maintaining 92% precision and 95% selective accuracy on the samples it classifies.
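
The sequential, abstaining decision rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `RankedClassifier` type, the fixed thresholds, the toy scoring functions, and the group names are all hypothetical, and a single classifier per group is used for brevity. The metric definitions (selective accuracy as accuracy over non-abstained samples, and the out-of-scope abstention rate) are assumptions that may differ from the definitions used in the evaluation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RankedClassifier:
    """Hypothetical wrapper around one binary 'is this group X?' classifier."""
    group: str                                  # APT group this classifier detects
    val_score: float                            # validation performance used for ranking
    threshold: float                            # minimum score counted as sufficient evidence
    score: Callable[[Sequence[float]], float]   # returns a confidence in [0, 1]


def attribute(sample: Sequence[float],
              classifiers: Sequence[RankedClassifier]) -> Optional[str]:
    """Apply classifiers in descending validation rank; None means abstain."""
    ranked = sorted(classifiers, key=lambda c: c.val_score, reverse=True)
    for clf in ranked:
        if clf.score(sample) >= clf.threshold:
            return clf.group   # first classifier with sufficient evidence wins
    return None                # no classifier was confident enough


def evaluate(preds, truths, known_groups):
    """Coverage, selective accuracy, and out-of-scope abstention rate (assumed definitions)."""
    accepted = [(p, t) for p, t in zip(preds, truths) if p is not None]
    oos_preds = [p for p, t in zip(preds, truths) if t not in known_groups]
    correct = sum(p == t for p, t in accepted)
    return {
        "coverage": len(accepted) / len(preds),
        "selective_accuracy": correct / len(accepted) if accepted else None,
        "oos_abstention_rate": (
            sum(p is None for p in oos_preds) / len(oos_preds) if oos_preds else None
        ),
    }


# Toy data: each "sample" is just a pair of made-up scores, one per known group.
classifiers = [
    RankedClassifier("GroupB", val_score=0.95, threshold=0.8, score=lambda x: x[1]),
    RankedClassifier("GroupA", val_score=0.99, threshold=0.8, score=lambda x: x[0]),
]
samples = [(0.9, 0.1), (0.2, 0.95), (0.3, 0.4), (0.85, 0.9)]
truths = ["GroupA", "GroupB", "GroupC", "GroupB"]   # GroupC is out of scope

preds = [attribute(s, classifiers) for s in samples]
metrics = evaluate(preds, truths, known_groups={"GroupA", "GroupB"})
print(preds)
print(metrics)
```

In this toy run the third sample, which belongs to an unseen group, triggers no classifier and is left unattributed, while the fourth sample shows the cost of ordering: the higher-ranked classifier fires first and produces a wrong attribution, which lowers selective accuracy without affecting the abstention rate.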
