Fuzzing the PHP Interpreter via Dataflow Fusion

PHP, a dominant scripting language in web development, powers a vast range of websites, from personal blogs to major platforms. While existing research primarily focuses on PHP application-level security issues such as code injection, memory errors within the PHP interpreter have been largely overlooked. These memory errors, prevalent because of the PHP interpreter's extensive C codebase, pose significant risks to the confidentiality, integrity, and availability of PHP servers. This paper introduces FlowFusion, the first automatic fuzzing framework to detect memory errors in the PHP interpreter. FlowFusion leverages dataflow as an efficient representation of the test cases maintained by PHP developers, merging two or more test cases to produce fused test cases with more complex code semantics. Moreover, FlowFusion employs strategies such as test mutation, interface fuzzing, and environment crossover to uncover more bugs. In our evaluation, FlowFusion found 158 unknown bugs in the PHP interpreter, with 125 fixed and 11 confirmed. Compared with the official test suite and a naive test concatenation approach, FlowFusion detects new bugs that these methods miss while also achieving greater code coverage. FlowFusion also outperformed the state-of-the-art fuzzers AFL++ and Polyglot, covering 24% more lines of code after 24 hours of fuzzing. FlowFusion has gained wide recognition among PHP developers and is now integrated into the official PHP toolchain.
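To make the idea of fusing test cases more concrete, here is a toy Python sketch. It is not FlowFusion's actual algorithm, which the abstract does not spell out. It assumes each test is a PHP body with no opening tag and one statement per line. The `a_`/`b_` renaming scheme and the bridging rule (the last variable of test A is assigned to the first variable of test B, right after that variable first appears) are hypothetical choices made only for illustration.

```python
import re

# Matches PHP variable names such as $foo.
PHP_VAR = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")
# Names that must keep their meaning and are therefore never renamed.
RESERVED = {"this", "GLOBALS", "_GET", "_POST", "_SERVER", "_COOKIE",
            "_FILES", "_ENV", "_REQUEST", "_SESSION"}


def variables(php_body):
    """Return distinct non-reserved variable names in order of first appearance."""
    seen = []
    for name in PHP_VAR.findall(php_body):
        if name not in RESERVED and name not in seen:
            seen.append(name)
    return seen


def rename(php_body, prefix):
    """Prefix every non-reserved variable so the two tests cannot collide."""
    def repl(match):
        name = match.group(1)
        return match.group(0) if name in RESERVED else f"${prefix}{name}"
    return PHP_VAR.sub(repl, php_body)


def fuse(test_a, test_b):
    """Concatenate two test bodies and connect them with one bridging assignment."""
    a_lines = rename(test_a, "a_").strip().splitlines()
    b_lines = rename(test_b, "b_").strip().splitlines()
    a_vars = variables("\n".join(a_lines))
    b_vars = variables("\n".join(b_lines))
    if a_vars and b_vars:
        target = b_vars[0]
        # Hypothetical rule: insert the bridge right after the first line of B
        # that mentions the target, so later uses in B see data produced by A.
        idx = next(i for i, line in enumerate(b_lines) if f"${target}" in line)
        b_lines.insert(idx + 1, f"${target} = ${a_vars[-1]};  // fusion bridge")
    return "\n".join(["<?php", *a_lines, *b_lines]) + "\n"


if __name__ == "__main__":
    test_a = '$x = str_repeat("A", 8);\n$y = strrev($x);'
    test_b = '$s = "hello";\necho strlen($s);'
    print(fuse(test_a, test_b))
```

Plain concatenation would run the two tests side by side without any interaction. The bridging assignment is what lets a value computed by the first test reach code from the second, which is the intuition behind "more complex code semantics" in the abstract.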


