Accurate translation of bug reports is critical for efficient collaboration in global software development. In this study, we conduct the first comprehensive evaluation of machine translation (MT) performance on bug reports, analyzing the capabilities of DeepL, AWS Translate, and ChatGPT using data from the Visual Studio Code GitHub repository, specifically reports labeled with the "english-please" tag. To thoroughly assess the accuracy and effectiveness of each system, we employ multiple machine translation metrics, including BLEU, BERTScore, COMET, METEOR, and ROUGE. Our findings indicate that DeepL consistently outperforms the other systems across most automatic metrics, demonstrating strong lexical and semantic alignment. AWS Translate performs competitively, particularly on METEOR, while ChatGPT lags behind on key metrics. This study underscores the importance of domain adaptation for translating technical texts and offers guidance for integrating automated translation into bug-triaging workflows. Moreover, our results establish a foundation for future research on refining machine translation solutions for specialized engineering contexts. The code and dataset for this paper are available on GitHub: https://github.com/av9ash/gitbugs/tree/main/multilingual.
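To illustrate the kind of lexical-overlap scoring that metrics such as BLEU perform, the sketch below implements a simplified sentence-level BLEU (modified n-gram precision with a brevity penalty) using only the Python standard library. This is not the evaluation code used in the study, and real evaluations should use an established implementation (e.g., sacreBLEU); the smoothing constant here is a simplifying assumption for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty.
    Uses an ad-hoc 0.5 smoothing count for zero-overlap orders."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        ref_ngrams = ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append((overlap or 0.5) / total)  # crude smoothing
    # Brevity penalty: punish hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0, while partial overlap yields a score between 0 and 1, mirroring how the automatic metrics in this study rank candidate translations against reference translations.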