Similarity has been applied to a wide range of security applications, typically as part of machine learning models. We examine the problem posed by masquerading samples, that is, samples crafted by bad actors to be similar or near-identical to legitimate samples. We find that these samples can create significant problems for machine learning solutions, the primary one being that bad actors can use them to circumvent those solutions. We then examine the interplay between digital signatures and machine learning solutions, focusing on executable files and code signing. We offer a taxonomy of masquerading files and use a combination of similarity and clustering to find them. We use the insights gathered in this process to offer improvements to similarity-based and machine learning security solutions.