Software systems increasingly rely on deep learning components because of their remarkable capability to identify complex data patterns and power intelligent behaviour. A core enabler of this change in software development is the availability of easy-to-use deep learning libraries. Libraries such as PyTorch and TensorFlow power a large variety of intelligent systems, offering a multitude of algorithms and configuration options that are applicable to numerous application domains. However, bugs in these popular deep learning libraries may also have dire consequences for the quality of the systems they enable; it is therefore important to understand how bugs are identified and fixed in those libraries. Inspired by a study by Jia et al., which investigates the bug identification and fixing process in TensorFlow, we characterize bugs in the PyTorch library, a very popular deep learning framework. We investigate the causes and symptoms of bugs identified during PyTorch's development, assess their locality within the project, and extract patterns of bug fixes. Our results highlight that PyTorch bugs resemble bugs in traditional software projects more than they relate to deep learning characteristics. Finally, we compare our results with the study on TensorFlow, highlighting similarities and differences across the bug identification and fixing process.