ChatGPT has recently attracted great attention for its ability to generate fluent, high-quality responses to human inquiries. Several prior studies have shown that ChatGPT attains remarkable generation ability compared with existing models. However, its understanding ability has received little quantitative analysis. In this report, we explore the understanding ability of ChatGPT by evaluating it on the popular GLUE benchmark and comparing it with four representative fine-tuned BERT-style models. We find that: 1) ChatGPT falls short on paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT performs comparably to BERT on sentiment analysis and question-answering tasks. Additionally, we show that combining some advanced prompting strategies can further improve ChatGPT's understanding ability.