State machines play a pivotal role in making protocol analysis more effective at uncovering vulnerabilities. However, inferring state machines from network protocol implementations is challenging. Traditional methods based on dynamic analysis often miss crucial state transitions because of limited coverage, while static analysis struggles with complex code structures and behaviors. To address these limitations, we propose a state machine inference approach powered by Large Language Models (LLMs). Using text-embedding technology, the method allows LLMs to dissect and analyze protocol implementation code. Through targeted prompt engineering, we systematically identify and infer the underlying state machines. Our evaluation across six protocol implementations shows that the method is highly effective, achieving an accuracy exceeding 90% and successfully delineating differences in state machines among implementations of the same protocol. Importantly, integrating this approach with protocol fuzzing improves AFLNet's code coverage by 10% over RFCNLP, showcasing the considerable potential of LLMs in advancing network protocol security analysis. Our method not only marks a significant step forward in accurate state machine inference but also opens new avenues for improving the security and reliability of protocol implementations.