Large language models like GPT-4 have recently demonstrated impressive capabilities in natural language understanding and generation, enabling applications such as translation, essay writing, and casual conversation. However, there is a concern that they can be misused for malicious purposes, such as fraud or denial-of-service attacks. It is therefore crucial to develop methods for detecting whether the party involved in a conversation is a bot or a human. In this paper, we propose a framework named FLAIR, Finding Large Language Model Authenticity via a Single Inquiry and Response, to detect conversational bots in an online manner. Specifically, we target a single-question scenario that can effectively differentiate human users from bots. The questions fall into two categories: those that are easy for humans but difficult for bots (e.g., counting, substitution, and ASCII art reasoning), and those that are easy for bots but difficult for humans (e.g., memorization and computation). Our experiments show that these two question categories have complementary strengths, providing online service providers with a new way to protect themselves against nefarious activities and ensure that they are serving real users. We have open-sourced our code and dataset at https://github.com/hongwang600/FLAIR and welcome contributions from the community.
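As a minimal illustration of the single-question idea described above, the sketch below builds a counting question (easy for humans, often difficult for LLMs) and checks the respondent's answer against the true count. The helper names are hypothetical and do not come from the FLAIR codebase:

```python
# Hypothetical sketch of a FLAIR-style single-question check.
# Counting questions are cited in the paper as easy for humans
# but difficult for bots; helper names here are illustrative only.

def make_counting_question(word: str, letter: str) -> str:
    """Build a question asking how many times `letter` occurs in `word`."""
    return f"How many times does the letter '{letter}' appear in '{word}'?"

def check_answer(word: str, letter: str, answer: str) -> bool:
    """Return True if the respondent's answer contains the true count."""
    return str(word.count(letter)) in answer.split()

question = make_counting_question("mississippi", "s")
# 'mississippi' contains four 's' characters, so "4" passes the check.
print(check_answer("mississippi", "s", "4"))  # True
print(check_answer("mississippi", "s", "3"))  # False
```

In a deployed setting, the verifier would pose such a question to the conversation partner and flag responses that are systematically wrong (bot-like) or suspiciously fast; the full question set and evaluation live in the linked repository.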