Accurate translation of bug reports is critical for efficient collaboration in global software development. In this study, we conduct the first comprehensive evaluation of machine translation (MT) performance on bug reports, analyzing the capabilities of DeepL, AWS Translate, and ChatGPT on data from the Visual Studio Code GitHub repository, focusing on issues labeled with the english-please tag. To assess the accuracy and effectiveness of each system, we employ multiple MT metrics, including BLEU, BERTScore, COMET, METEOR, and ROUGE. Our findings indicate that DeepL consistently outperforms the other systems on most automatic metrics, demonstrating strong lexical and semantic alignment. AWS Translate performs competitively, particularly on METEOR, while ChatGPT lags on key metrics. This study underscores the importance of domain adaptation for translating technical texts and offers guidance for integrating automated translation into bug-triaging workflows. Our results also establish a foundation for future research on refining MT solutions for specialized engineering contexts. The code and dataset for this paper are available on GitHub: https://github.com/av9ash/gitbugs/tree/main/multilingual.
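As an illustration of what the lexical metrics named above measure, the following is a minimal pure-Python sketch of sentence-level BLEU: modified n-gram precisions (here with add-one smoothing, an assumption for this sketch) combined with a brevity penalty. Real evaluations, including the one in this study, should use established implementations such as sacreBLEU or NLTK, which handle multiple references and standard smoothing variants.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU with uniform n-gram weights and a brevity penalty.

    Simplifying assumptions: whitespace tokenization, a single reference,
    and add-one smoothing so one empty n-gram order does not zero the score.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped (modified) precision: each hypothesis n-gram counts only
        # up to the number of times it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A perfect match scores 1.0, while dropped or reordered words lower the n-gram precisions; metrics such as BERTScore and COMET instead compare embeddings, which is why they can reward semantically faithful translations that BLEU under-scores.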