Large language models like ChatGPT have recently demonstrated impressive capabilities in natural language understanding and generation, enabling applications such as translation, essay writing, and chit-chat. However, there is a concern that they can be misused for malicious purposes, such as fraud or denial-of-service attacks. It is therefore crucial to develop methods for detecting whether the party in a conversation is a bot or a human. In this paper, we propose a framework named FLAIR, Finding Large language model Authenticity via a single Inquiry and Response, to detect conversational bots in an online manner. Specifically, we target a single-question scenario in which one question can effectively differentiate human users from bots. The questions are divided into two categories: those that are easy for humans but difficult for bots (e.g., counting, substitution, positioning, noise filtering, and ASCII art), and those that are easy for bots but difficult for humans (e.g., memorization and computation). Our approach shows that these questions differ in their strengths and effectiveness, providing a new way for online service providers to protect themselves against nefarious activities and ensure that they are serving real users. We have open-sourced our dataset at https://github.com/hongwang600/FLAIR and welcome contributions from the community to enrich such detection datasets.
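To make two of the human-easy question categories named in the abstract (counting and substitution) more concrete, here is a minimal Python sketch of how such single questions might be generated together with their ground-truth answers. It is an illustration under stated assumptions, not the authors' implementation: the function names, the question wording, the string length, and the exact-match check are all hypothetical and are not taken from the FLAIR dataset.

```python
import random
import string


def make_counting_challenge(rng, length=30, alphabet="abc"):
    """Hypothetical counting question: how often does one character occur in a random string?"""
    text = "".join(rng.choice(alphabet) for _ in range(length))
    target = rng.choice(alphabet)
    question = f"How many times does '{target}' appear in '{text}'?"
    # The ground truth is computed directly from the generated string.
    return question, str(text.count(target))


def make_substitution_challenge(rng, word="peach"):
    """Hypothetical substitution question: apply a random letter-to-letter rule to a word."""
    # One random replacement per distinct letter, so repeated letters map consistently.
    mapping = {c: rng.choice(string.ascii_lowercase) for c in sorted(set(word))}
    rules = ", ".join(f"{src} -> {dst}" for src, dst in mapping.items())
    question = f"Replace letters using the rule ({rules}). What does '{word}' become?"
    return question, "".join(mapping[c] for c in word)


def is_correct(response, expected):
    """Compare a free-text reply with the expected answer, ignoring case and outer whitespace."""
    return response.strip().lower() == expected.lower()


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each conversation gets a fresh question
    for make in (make_counting_challenge, make_substitution_challenge):
        question, answer = make(rng)
        print(question, "| expected:", answer)
```

Under the abstract's framing, a correct reply to such a question would count as evidence for a human respondent, while the bot-easy categories (memorization, computation) would be read the other way around. How replies are actually scored in FLAIR is not specified here, so the exact-match check above is only a placeholder.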