Chatbots, the common moniker for collaborative assistants, are Artificial Intelligence (AI) software systems that let people interact with them naturally to get tasks done. Although chatbots have been studied since the dawn of AI, they have particularly caught the imagination of the public and businesses since the launch of easy-to-use, general-purpose Large Language Model-based chatbots like ChatGPT. As businesses look to chatbots as a potential technology for engaging users, who may be end customers, suppliers, or even their own employees, proper testing of chatbots is important to address and mitigate issues of trust related to service or product performance, user satisfaction, and long-term unintended consequences for society. This paper reviews current practices for chatbot testing, identifies gaps as open problems in the pursuit of user trust, and outlines a path forward.