A blind spot is any input to a program that can be arbitrarily mutated without affecting the program's output. Blind spots can be used for steganography or to embed malware payloads. If blind spots overlap file format keywords, they indicate parsing bugs that can lead to exploitable differentials. For example, one could craft a document that renders one way in one viewer and a completely different way in another. Blind spots have also been used to circumvent code signing in Android binaries, to coerce certificate authorities into misbehaving, and to execute HTTP request smuggling and parameter pollution attacks. This paper formalizes the operational semantics of blind spots, leading to a technique based on dynamic information flow tracking that detects blind spots automatically. An efficient implementation is introduced and evaluated against a corpus of over a thousand diverse PDFs parsed through MuPDF, revealing exploitable bugs in the parser. All of the blind spot classifications are confirmed to be correct, and the missed detection rate is no higher than 11%. On average, at least 5% of each PDF file is completely ignored by the parser. Our results suggest that this technique is an efficient, automated means of detecting exploitable parser bugs, over-permissiveness, and differentials. Nothing in the technique is specific to PDF, so it can be applied immediately to other notoriously difficult-to-parse formats such as ELF, X.509, and XML.
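To make the definition concrete, the following minimal sketch checks it by brute force on a toy parser. It is not the dynamic information flow tracking technique described above: it simply mutates one byte at a time and compares outputs, which only approximates the definition (it does not test joint mutations of several bytes) and would be far too slow for real files. The format, `toy_parse`, and `blind_spots` are all hypothetical names introduced here for illustration.

```python
from typing import Callable, List


def toy_parse(data: bytes) -> bytes:
    """Hypothetical format: one length byte n, then n payload bytes.

    Anything after the payload is never read by this parser.
    """
    if not data:
        return b""
    n = data[0]
    return data[1:1 + n]


def blind_spots(parse: Callable[[bytes], bytes], data: bytes) -> List[int]:
    """Return offsets whose byte can take any value without changing parse(data).

    Assumption: single-byte mutations only, so this is an approximation
    of the "arbitrarily mutated" definition, not a full check.
    """
    baseline = parse(data)
    spots = []
    for offset in range(len(data)):
        mutated = bytearray(data)
        unchanged = True
        for value in range(256):
            mutated[offset] = value
            if parse(bytes(mutated)) != baseline:
                unchanged = False
                break
        if unchanged:
            spots.append(offset)
    return spots


# Length byte 3, payload "abc", then four trailing bytes the parser ignores.
sample = bytes([3]) + b"abc" + b"JUNK"
print(blind_spots(toy_parse, sample))  # [4, 5, 6, 7]
```

The four trailing bytes are reported as blind spots because the toy parser never reads them, while the length byte and the payload are not, since changing any of them changes the output. A taint-tracking approach would aim to reach the same classification from a single instrumented run rather than 256 re-parses per byte.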