GenKubeSec: LLM-Based Kubernetes Misconfiguration Detection, Localization, Reasoning, and Remediation

Kubernetes configuration files (KCFs) are often highly complex and error-prone, which leads to security vulnerabilities and operational setbacks. Rule-based (RB) tools for KCF misconfiguration detection rely on static rule sets, so they are inherently limited and cannot detect newly discovered misconfigurations. RB tools also suffer from misdetection, since mistakes are likely when coding the detection rules. Recent methods for detecting and remediating KCF misconfigurations are limited in their scalability and detection coverage, or they require considerable expertise and do not offer automated remediation alongside misconfiguration detection. Novel approaches that employ LLMs in their pipeline rely on API-based, general-purpose, and mainly commercial models. Thus, they pose security challenges, have inconsistent classification performance, and can be costly. In this paper, we propose GenKubeSec, a comprehensive and adaptive LLM-based method that, in addition to detecting a wide variety of KCF misconfigurations, identifies their exact location and provides detailed reasoning about them, along with suggested remediation. When empirically compared with three industry-standard RB tools, GenKubeSec achieved equivalent precision (0.990) and superior recall (0.999). When a Kubernetes security expert examined a random sample of KCFs, GenKubeSec's explanations of misconfiguration localization, reasoning, and remediation were 100% correct, informative, and useful. To facilitate further advancements in this domain, we share the unique dataset we collected, a unified misconfiguration index we developed for label standardization, our experimentation code, and GenKubeSec itself as an open-source tool.
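To make the limitation of static rule sets concrete, the following is a minimal Python sketch of a rule-based check. It is not GenKubeSec and not the implementation of any of the RB tools compared in the paper. The rule names, the `scan` helper, and the sample Pod manifest are all hypothetical and chosen only for illustration. The manifest is written as an already-parsed dictionary so the sketch needs no YAML library.

```python
from typing import Callable

# A container spec and a rule are modeled as plain Python objects.
# These type names are illustrative assumptions, not part of any real tool.
Container = dict
Rule = tuple[str, Callable[[Container], bool]]

# Hypothetical static rule set: each rule pairs an ID with a predicate
# that returns True when the container is misconfigured.
RULES: list[Rule] = [
    ("privileged-container",
     lambda c: c.get("securityContext", {}).get("privileged") is True),
    ("missing-memory-limit",
     lambda c: "memory" not in c.get("resources", {}).get("limits", {})),
]


def scan(manifest: dict) -> list[tuple[str, str]]:
    """Return (location, rule_id) pairs for every rule that fires."""
    findings = []
    containers = manifest.get("spec", {}).get("containers", [])
    for i, container in enumerate(containers):
        for rule_id, predicate in RULES:
            if predicate(container):
                findings.append((f"spec.containers[{i}]", rule_id))
    return findings


# Hypothetical Pod manifest, already parsed into a dictionary.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "spec": {
        "hostNetwork": True,  # risky setting, but no rule above covers it
        "containers": [
            {"name": "app", "image": "nginx",
             "securityContext": {"privileged": True}},
        ],
    },
}

findings = scan(pod)
for location, rule_id in findings:
    print(f"{location}: {rule_id}")
```

In this sketch the scanner reports the privileged container and the missing memory limit, but it stays silent about `hostNetwork: true` because no rule was written for it. Covering that case requires someone to code and ship a new rule, which is the gap the abstract attributes to RB tools.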

