ChatGPT, a question-and-answer dialogue system based on a large language model, has gained huge popularity since its introduction. Its strengths have been widely reported in the media, and some analyses have even shown that ChatGPT achieves decent grades on professional exams in law, medicine, and finance, lending support to the claim that AI can now assist, and even replace, humans in industrial fields. Others, however, doubt its reliability and trustworthiness. In this paper, we investigate ChatGPT's trustworthiness in terms of logically consistent behaviour. Our findings suggest that, although ChatGPT appears to have improved language understanding ability, it still frequently fails to generate logically correct predictions. Hence, while ChatGPT is an impressive and promising new technique, we conclude that its use in real-world applications without thorough human inspection requires further consideration, especially in risk-sensitive areas.