Programmable System Call Security with eBPF

Jinghao Jia, YiFei Zhu, Dan Williams, Andrea Arcangeli, Claudio Canella, Hubertus Franke, Tobin Feldman-Fitzthum, Dimitrios Skarlatos, Daniel Gruss, Tianyin Xu

System call filtering is a widely used security mechanism for protecting a shared OS kernel against untrusted user applications. However, existing system call filtering techniques are either too expensive, due to the context switch overhead imposed by userspace agents, or lack sufficient programmability to express advanced policies. Seccomp, Linux's system call filtering module, is widely used by modern container technologies, mobile apps, and system management services. Despite the adoption of the classic BPF language (cBPF), security policies in Seccomp are mostly limited to static allow lists, primarily because cBPF does not support stateful policies. Consequently, many essential security features cannot be expressed precisely and/or require kernel modifications. In this paper, we present a programmable system call filtering mechanism that leverages the extended BPF language (eBPF) to express more advanced security policies. More specifically, we create a new Seccomp eBPF program type and expose, modify, or create eBPF helper functions to safely manage filter state, access kernel and user state, and use synchronization primitives. Importantly, our system integrates with existing kernel privilege and capability mechanisms, enabling unprivileged users to install advanced filters safely. Our evaluation shows that our eBPF-based filtering can enhance existing policies (e.g., reducing the attack surface of the early execution phase by up to 55.4% for temporal specialization), mitigate real-world vulnerabilities, and accelerate filters.
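The contrast between a static allow list and a stateful policy can be illustrated with a small userspace model. This is a minimal sketch in Python, not kernel or eBPF code and not the paper's implementation: the syscall sets, the `StatefulFilter` class, and the choice of `listen` as the marker that ends initialization are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical syscall sets, chosen only for illustration.
INIT_ALLOWED = frozenset(
    {"openat", "read", "mmap", "socket", "bind", "listen", "execve"}
)
SERVING_ALLOWED = frozenset({"read", "write", "accept", "close"})


def stateless_filter(syscall: str) -> bool:
    """Static allow list: with no state, it must permit the union of both phases."""
    return syscall in (INIT_ALLOWED | SERVING_ALLOWED)


@dataclass
class StatefulFilter:
    """Toy stateful filter that remembers whether the serving phase has begun."""

    serving: bool = False

    def check(self, syscall: str) -> bool:
        allowed = SERVING_ALLOWED if self.serving else INIT_ALLOWED
        decision = syscall in allowed
        # Assumed phase marker: the first permitted `listen` ends initialization.
        if decision and syscall == "listen":
            self.serving = True
        return decision
```

In this model, `execve` is accepted during initialization but rejected once `listen` has been seen, whereas the stateless filter keeps accepting it for the lifetime of the process. That difference is the kind of attack surface reduction that a policy with per-filter state can express and a static allow list cannot.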
