Q&A platforms have been crucial for the online help-seeking behavior of programmers. However, the recent popularity of ChatGPT is altering this trend. Despite this popularity, no comprehensive study has been conducted to evaluate the characteristics of ChatGPT's answers to programming questions. To bridge the gap, we conducted the first in-depth analysis of ChatGPT answers to 517 programming questions on Stack Overflow and examined the correctness, consistency, comprehensiveness, and conciseness of ChatGPT answers. Furthermore, we conducted a large-scale linguistic analysis, as well as a user study, to understand the characteristics of ChatGPT answers from linguistic and human aspects. Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose. Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style. However, they also overlooked the misinformation in the ChatGPT answers 39% of the time. This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.