Q&A platforms have been an integral part of programmers' web help-seeking behavior over the past decade. However, with the recent introduction of ChatGPT, this paradigm is shifting. Despite ChatGPT's popularity, no comprehensive study has evaluated the characteristics or usability of its answers to software engineering questions. To bridge this gap, we conducted the first in-depth analysis of ChatGPT's answers to 517 Stack Overflow (SO) questions and examined their correctness, consistency, comprehensiveness, and conciseness. Furthermore, we conducted a large-scale linguistic analysis and a user study to understand the characteristics of ChatGPT answers from linguistic and human aspects. Our analysis shows that 52% of ChatGPT answers are incorrect and 77% are verbose. Nonetheless, ChatGPT answers are still preferred 39.34% of the time due to their comprehensiveness and well-articulated language style. Our results imply the need for close examination and rectification of errors in ChatGPT, as well as for raising awareness among its users of the risks associated with seemingly correct ChatGPT answers.