The open-source software (OSS) ecosystem suffers from the security threat of software supply chain (SSC) attacks. Interpreted OSS malware plays a vital role in these attacks, because attackers have an arsenal of attack vectors for deceiving users into installing malware and for executing malicious activities. In this paper, we bring the tactics, techniques, and procedures (TTPs) defined by MITRE ATT&CK into interpreted malware analysis to characterize the different phases of an attack lifecycle. Specifically, we propose GENTTP, a zero-shot approach to extracting the TTP of an interpreted malware package. GENTTP leverages large language models (LLMs) to generate a TTP automatically: the input is a malicious package, and the output is the deceptive tactic and the execution tactic of its attack vectors. To validate the effectiveness of GENTTP, we collect two datasets for evaluation: one with ground-truth labels and a large one gathered in the wild. Experimental results show that GENTTP generates TTPs with high accuracy and efficiency. To demonstrate GENTTP's benefits, we build an LLM-based chatbot from the TTPs of 3,700+ PyPI malware packages. We further conduct a large-scale quantitative analysis of malware TTPs. Our main findings are: (1) many malicious OSS packages share a relatively stable TTP, even as new malware and attack campaigns continue to emerge; (2) a TTP reflects the characteristics of a malware-based attack; and (3) the attacker's intent behind a piece of malware is linked to its TTP.