Chatbots, the common moniker for collaborative assistants, are Artificial Intelligence (AI) software systems that enable people to interact with them naturally to get tasks done. Although chatbots have been studied since the dawn of AI, they have particularly caught the imagination of the public and businesses since the launch of easy-to-use, general-purpose Large Language Model (LLM)-based chatbots like ChatGPT. As businesses look to chatbots as a potential technology for engaging users, who may be end customers, suppliers, or even their own employees, proper testing of chatbots is important to address and mitigate issues of trust related to service or product performance, user satisfaction, and long-term unintended consequences for society. This paper reviews current practices for chatbot testing, identifies gaps as open problems in pursuit of user trust, and outlines a path forward.