Deep Learning (DL) libraries such as TensorFlow and PyTorch simplify machine learning (ML) model development but are prone to bugs due to their complex design. Bug-finding techniques exist, but without precise API specifications they produce many false alarms, and existing methods for mining API specifications lack accuracy. We explore using ML classifiers to determine input validity. We hypothesize that tensor shapes are a precise abstraction for encoding concrete inputs and capturing relationships within the data. Shape abstraction drastically reduces the dimensionality of the problem, which is essential for tractable ML training. Labeled data are obtained by observing runtime outcomes on a sample of inputs, and classifiers are trained on sets of labeled inputs to capture API constraints. Our evaluation, conducted over 183 APIs from TensorFlow and PyTorch, shows that the classifiers generalize well to unseen data with over 91% accuracy. Integrating these classifiers into the pipeline of ACETest, a state-of-the-art bug-finding technique, improves its pass rate from ~29% to ~61%. Our findings suggest that ML-based input classification is an important aid for scaling DL library testing.
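The approach described above can be sketched in miniature. The example below is an illustrative assumption, not the paper's implementation: it uses `np.matmul` as a stand-in for a DL-library API (its inner-dimension constraint mirrors the kind of shape constraint the classifiers learn), abstracts each concrete input to its shape vector, labels inputs by their runtime outcome, and fits an off-the-shelf decision tree.

```python
# Minimal sketch of shape-abstraction-based input classification.
# Stand-in API under test: np.matmul, whose validity constraint
# (inner dimensions must match) is learned purely from labeled shapes.
import random
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def label_input(shape_a, shape_b):
    """Run the API on concrete inputs; the runtime outcome is the label."""
    try:
        np.matmul(np.zeros(shape_a), np.zeros(shape_b))
        return 1  # accepted: valid input
    except ValueError:
        return 0  # rejected: invalid input

# Sample concrete inputs, abstract each to its shape vector, and label it.
random.seed(0)
X, y = [], []
for _ in range(500):
    sa = (random.randint(1, 5), random.randint(1, 5))
    sb = (random.randint(1, 5), random.randint(1, 5))
    X.append([*sa, *sb])          # shape abstraction: 4 ints, not tensors
    y.append(label_input(sa, sb))

clf = DecisionTreeClassifier().fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")

# The classifier approximates the constraint "a.shape[1] == b.shape[0]"
# and can now judge candidate inputs without executing the API.
print(clf.predict([[2, 3, 3, 4]])[0])  # inner dims match: likely valid
print(clf.predict([[2, 3, 4, 4]])[0])  # inner dims differ: likely invalid
```

In a bug-finding pipeline such as ACETest, a classifier like this would act as a filter: generated inputs predicted invalid are discarded before execution, raising the pass rate of the inputs that actually reach the API.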