DocTer: Documentation-Guided Fuzzing for Testing Deep Learning API Functions

Input constraints are useful for many software development tasks. For example, the input constraints of a function enable the generation of valid inputs, i.e., inputs that follow these constraints, so that the function can be tested more deeply. API functions of deep learning (DL) libraries have DL-specific input constraints, which are described informally in free-form API documentation. Existing constraint extraction techniques are ineffective at extracting DL-specific input constraints. To fill this gap, we design and implement a new technique, DocTer, which analyzes API documentation to extract DL-specific input constraints for DL API functions. DocTer features a novel algorithm that automatically constructs rules to extract API parameter constraints from syntactic patterns of API descriptions, represented as dependency parse trees. These rules are then applied to a large volume of API documents from popular DL libraries to extract their input parameter constraints. To demonstrate the effectiveness of the extracted constraints, DocTer uses them to automatically generate valid and invalid inputs for testing DL API functions. Our evaluation on three popular DL libraries (TensorFlow, PyTorch, and MXNet) shows that DocTer extracts input constraints with a precision of 85.4%. DocTer detects 94 bugs in 174 API functions, including one previously unknown security vulnerability that is now documented in the CVE database, while a baseline technique without input constraints detects only 59 bugs. Most (63) of the 94 bugs were previously unknown, and 54 of these have been fixed or confirmed by developers after we reported them. In addition, DocTer detects 43 inconsistencies in documentation, 39 of which have been fixed or confirmed.
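
The abstract does not say how DocTer represents extracted constraints or how it turns them into test inputs. The sketch below only illustrates the general idea of constraint-guided input generation. All of its names (ParamConstraint, gen_valid, gen_invalid, satisfies, ALL_DTYPES) are hypothetical, not DocTer's API. It also makes simplifying assumptions: a parameter's constraints are limited to allowed dtypes, a required number of dimensions, and a lower bound on element values, and a "tensor" is a plain dict rather than a TensorFlow, PyTorch, or MXNet object. A valid input satisfies every constraint, while an invalid input violates exactly one.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical dtype universe; a "tensor" is a plain dict so no DL library is needed.
ALL_DTYPES = ["int32", "int64", "float32", "float64"]


@dataclass
class ParamConstraint:
    """Hypothetical constraints for one API parameter, e.g. as might be extracted
    from a description such as "a 2-D float tensor with non-negative values"."""
    dtypes: List[str]                  # allowed dtypes
    ndim: Optional[int] = None         # required number of dimensions, if documented
    min_value: Optional[float] = None  # inclusive lower bound on elements, if documented


def _values(dtype: str, count: int, low: float) -> list:
    """Draw `count` elements of `dtype`, none smaller than `low`."""
    if dtype.startswith("int"):
        start = math.ceil(low)
        return [random.randint(start, start + 10) for _ in range(count)]
    return [random.uniform(low, low + 10.0) for _ in range(count)]


def satisfies(c: ParamConstraint, x: dict) -> bool:
    """Return True if input `x` meets every constraint in `c`."""
    if x["dtype"] not in c.dtypes:
        return False
    if c.ndim is not None and len(x["shape"]) != c.ndim:
        return False
    if c.min_value is not None and any(v < c.min_value for v in x["values"]):
        return False
    return True


def gen_valid(c: ParamConstraint) -> dict:
    """Generate a random input that satisfies every constraint in `c`."""
    dtype = random.choice(c.dtypes)
    ndim = c.ndim if c.ndim is not None else random.randint(0, 4)
    shape = [random.randint(1, 4) for _ in range(ndim)]
    count = math.prod(shape)  # a 0-D (scalar) input still holds one element
    low = c.min_value if c.min_value is not None else -10.0
    return {"dtype": dtype, "shape": shape, "values": _values(dtype, count, low)}


def gen_invalid(c: ParamConstraint) -> dict:
    """Generate a random input that violates exactly one constraint in `c`."""
    x = gen_valid(c)
    wrong_dtypes = [d for d in ALL_DTYPES if d not in c.dtypes]
    options = []
    if wrong_dtypes:
        options.append("dtype")
    if c.ndim is not None:
        options.append("ndim")
    if c.min_value is not None:
        options.append("range")
    if not options:
        raise ValueError("constraint has nothing that can be violated")
    kind = random.choice(options)
    if kind == "dtype":
        # Swap in a disallowed dtype and redraw elements to match it.
        x["dtype"] = random.choice(wrong_dtypes)
        low = c.min_value if c.min_value is not None else -10.0
        x["values"] = _values(x["dtype"], len(x["values"]), low)
    elif kind == "ndim":
        # Append a size-1 dimension: the rank changes, the element count does not.
        x["shape"] = x["shape"] + [1]
    else:
        # Push one element strictly below the documented lower bound.
        below = math.floor(c.min_value) - 1
        x["values"][0] = below if x["dtype"].startswith("int") else float(below)
    return x


if __name__ == "__main__":
    random.seed(0)
    weights = ParamConstraint(dtypes=["float32", "float64"], ndim=2, min_value=0.0)
    print("valid:  ", gen_valid(weights))
    print("invalid:", gen_invalid(weights))
```

Violating only one constraint per invalid input is a choice made in this sketch, not something the abstract states. It keeps each invalid input close to a valid one, which makes it easier to tell which documented constraint a crash or missing error check relates to.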
