Deep learning classifiers achieve state-of-the-art performance in various risk detection applications. They explore rich semantic representations and are expected to discover risk behaviors automatically. However, because they lack transparency, the behavioral semantics they learn cannot be conveyed to downstream security experts, and so do little to reduce the heavy workload of security analysis. Although feature attribution (FA) methods can be used to explain deep learning models, the underlying classifier remains blind to which behaviors are suspicious, and the generated explanations cannot adapt to downstream tasks, resulting in poor explanation fidelity and intelligibility. In this paper, we propose FINER, the first framework for risk detection classifiers to generate high-fidelity and high-intelligibility explanations. The high-level idea is to combine the explanation efforts of model developers, FA designers, and security experts. To improve fidelity, we fine-tune the classifier with an explanation-guided multi-task learning strategy. To improve intelligibility, we use task knowledge to adjust and ensemble FA methods. Extensive evaluations show that FINER improves explanation quality for risk detection. Moreover, we demonstrate that FINER outperforms a state-of-the-art tool in facilitating malware analysis.