Software bugs cost the global economy billions of dollars annually and consume about 50\% of software developers' programming time. Locating these bugs is crucial for resolving them but challenging, and it is even more challenging in deep-learning systems because of their black-box nature. Bugs in these systems are hidden not only in the code but also in the models and training data, which might make traditional debugging methods less effective. In this article, we conduct a large-scale empirical study to better understand the challenges of localizing bugs in deep-learning systems. First, we measure the bug localization performance of four existing techniques using 2,365 bugs from deep-learning systems and 2,913 bugs from traditional software. We found that these techniques significantly underperform when localizing bugs in deep-learning systems. Second, we evaluate how different bug types in deep-learning systems affect bug localization. We found that the effectiveness of localization techniques varies with bug type because each type poses its own challenges. For example, tensor bugs were easier to locate because of their structural nature, whereas all techniques struggled with GPU bugs because of their external dependencies. Third, we investigate how the extrinsic nature of bugs affects localization in deep-learning systems. We found that deep-learning bugs are often extrinsic, that is, connected to artifacts other than source code (e.g., GPU, training data), which contributes to the poor performance of existing localization methods.