Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain and are described encyclopedically in textual knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP mapping, is an important and challenging task. Conventional learning approaches typically treat the problem as classical multi-class or multi-label classification. This setting hinders the model's ability to learn because of the large number of classes (i.e., TTPs), the inevitable skew of the label distribution, and the complex hierarchical structure of the label space. We formulate the problem in a different learning paradigm, in which the assignment of a text to a TTP label is decided by the direct semantic similarity between the two, which reduces the complexity of competing solely over the large label space. To that end, we propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism, which facilitates training the matching model despite constrained resources.
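The matching formulation can be illustrated with a minimal sketch. It is not the proposed architecture: a bag-of-words cosine similarity stands in for the neural encoder, the three labels and their descriptions are hypothetical toy data, and the helper names (`embed`, `rank_labels`, `sample_pairs`) are assumptions made for this example. It shows only the two ideas named in the abstract: scoring a text against each label description directly, and building training pairs by sampling a few negative labels rather than contrasting against the whole label space.

```python
import math
import random
from collections import Counter

# Hypothetical toy label descriptions (illustrative only, not from the source).
TTP_DESCRIPTIONS = {
    "phishing": "adversary sends email with malicious attachment or link to victim",
    "credential_dumping": "adversary dumps credentials and password hashes from memory",
    "scripting": "adversary executes commands through a script interpreter or shell",
}


def embed(text):
    """Toy encoder: word counts standing in for a neural text encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_labels(text, descriptions):
    """Score the text against every label description, best match first."""
    query = embed(text)
    scores = {label: cosine(query, embed(desc)) for label, desc in descriptions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def sample_pairs(text, positive, descriptions, num_negatives, rng):
    """Build (text, label, target) pairs: one positive plus sampled negatives."""
    candidates = [label for label in descriptions if label != positive]
    negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
    return [(text, positive, 1)] + [(text, label, 0) for label in negatives]


if __name__ == "__main__":
    sentence = "the attacker sends an email with a malicious attachment"
    for label, score in rank_labels(sentence, TTP_DESCRIPTIONS):
        print(f"{label}: {score:.3f}")
    print(sample_pairs(sentence, "phishing", TTP_DESCRIPTIONS, 1, random.Random(0)))
```

In a trained model, the pairs produced by `sample_pairs` would supervise a learned similarity function. Sampling a small number of negatives per text keeps each training step cheap even when the label space is large.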