Despite its pivotal role in research experiments, code correctness is often presumed solely on the basis of the perceived quality of the results. This presumption risks erroneous outcomes and potentially misleading findings. To address this issue, we posit that the current focus on result reproducibility should go hand in hand with an emphasis on coding best practices. We bolster our call to the NLP community by presenting a case study in which we identify (and correct) three bugs in widely used open-source implementations of the state-of-the-art Conformer architecture. Through comparative experiments on automatic speech recognition and translation in various language settings, we demonstrate that the existence of bugs does not prevent the achievement of good and reproducible results, yet can lead to incorrect conclusions that potentially misguide future research. In response, this study is a call to action toward the adoption of coding best practices aimed at fostering correctness and improving the quality of the developed software.