Converting deep learning models between frameworks is a common step to maximize model compatibility across devices and to leverage optimization features that may be available in only one framework. However, the conversion process may be riddled with bugs, leaving converted models either undeployable or faulty and considerably degrading their prediction correctness. We propose Fix-Con, an automated approach for fault localization and repair during model conversion between deep learning frameworks. Fix-Con is capable of detecting and fixing faults introduced in model input, parameters, hyperparameters, and the model graph during conversion. Fix-Con uses a set of fault types, mined from a survey of reported conversion issues, to localize potential conversion faults in the converted target model, and then repairs them appropriately, e.g., by replacing the parameters of the target model with those from the source model. This is done iteratively for every image in the dataset for which the output labels of the source model and the converted target model differ, until all differences are resolved. We evaluate the effectiveness of Fix-Con in fixing model conversion bugs of three widely used image recognition models converted across four different deep learning frameworks. Overall, Fix-Con was able to either completely repair, or significantly improve the performance of, 14 out of the 15 erroneous conversion cases.
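The iterative localize-and-repair loop described in the abstract can be illustrated with a minimal sketch. This is not the Fix-Con implementation: the function names (`predict_label`, `find_mismatches`, `repair_parameters`) are hypothetical, a single linear layer stands in for a converted network, and only the parameter fault type is covered, under the assumption that source and target parameters can be matched by name.

```python
import numpy as np


def predict_label(params, x):
    """Toy stand-in for a model: one linear layer followed by argmax."""
    return int(np.argmax(params["W"] @ x + params["b"]))


def find_mismatches(source, target, images):
    """Return indices of images whose labels differ between the two models."""
    return [
        i for i, x in enumerate(images)
        if predict_label(source, x) != predict_label(target, x)
    ]


def repair_parameters(source, target, images, atol=1e-6):
    """Copy differing source parameters into the target, one tensor at a
    time, stopping as soon as no label differences remain (assumed strategy)."""
    repaired = {name: value.copy() for name, value in target.items()}
    replaced = []
    for name in sorted(repaired):
        if not find_mismatches(source, repaired, images):
            break  # all output label differences are resolved
        if not np.allclose(source[name], repaired[name], atol=atol):
            repaired[name] = source[name].copy()
            replaced.append(name)
    return repaired, replaced


# Hypothetical example: the conversion corrupted one weight of the target.
source = {"W": np.eye(3), "b": np.zeros(3)}
target = {"W": np.eye(3), "b": np.zeros(3)}
target["W"][0, 0] = -1.0
images = [np.eye(3)[i] for i in range(3)]

before = find_mismatches(source, target, images)
repaired, replaced = repair_parameters(source, target, images)
after = find_mismatches(source, repaired, images)
print(before, replaced, after)
```

In this toy setting the loop stops once the labels agree, so parameters that do not contribute to a label difference are left untouched. A real repair would also need to handle the input, hyperparameter, and graph fault types mentioned in the abstract.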