Despite its pivotal role in research experiments, code correctness is often presumed solely on the basis of the perceived quality of the results. This carries the risk of erroneous outcomes and potentially misleading findings. To address this issue, we posit that the current focus on result reproducibility should go hand in hand with an emphasis on coding best practices. We support our call to the NLP community with a case study in which we identify (and correct) three bugs in widely used open-source implementations of the state-of-the-art Conformer architecture. Through comparative experiments on automatic speech recognition and translation in various language settings, we demonstrate that the presence of bugs does not prevent the achievement of good and reproducible results, yet it can lead to incorrect conclusions that may misguide future research. In response, this study is a call to action for the adoption of coding best practices aimed at fostering correctness and improving the quality of the developed software.