As Large Language Models (LLMs) gain wider adoption across contexts, it becomes crucial to ensure they are reasonably safe, consistent, and reliable for the application at hand. This may require probing or auditing them. Probing an LLM with varied iterations of a single question can reveal potential inconsistencies in its knowledge or functionality. However, a tool for performing such audits with a simple workflow and a low technical barrier has been lacking. In this demo, we introduce "AuditLLM," a novel tool designed to evaluate the performance of various LLMs in a methodical way. AuditLLM's core functionality lies in its ability to audit a given LLM using multiple probes generated from a single question, thereby identifying inconsistencies in the model's understanding or operation. A reasonably robust, reliable, and consistent LLM should output semantically similar responses to a question asked in different ways or by different people. Based on this assumption, AuditLLM produces easily interpretable results regarding the LLM's consistency from a single question entered by the user. A certain level of inconsistency has been shown to be an indicator of potential bias, hallucination, and other issues; the output of AuditLLM can therefore guide further investigation of the audited model. To support both demonstration and practical use, AuditLLM offers two key modes: (1) a Live mode, which allows instant auditing of LLMs by analyzing responses to real-time queries; and (2) a Batch mode, which facilitates comprehensive LLM auditing by processing multiple queries at once for in-depth analysis. This tool benefits both researchers and general users, as it enhances our understanding of LLMs' response-generation behavior through a standardized auditing platform.
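The audit loop described above — generating several probes from one question, collecting the model's responses, and scoring their mutual consistency — can be sketched as follows. This is a minimal illustration under stated assumptions, not AuditLLM's actual implementation: `toy_model` is a hypothetical stand-in for an LLM endpoint, and the character-level `SequenceMatcher` ratio is a crude proxy for the semantic similarity a real audit would use.

```python
from difflib import SequenceMatcher
from itertools import combinations

def audit(model, probes, threshold=0.7):
    """Query the model with each probe and score pairwise response similarity.

    Returns the responses, the mean pairwise similarity, and a
    consistency verdict against a (tunable, illustrative) threshold.
    """
    responses = [model(p) for p in probes]
    scores = [SequenceMatcher(None, a, b).ratio()
              for a, b in combinations(responses, 2)]
    mean_similarity = sum(scores) / len(scores)
    return {
        "responses": responses,
        "mean_similarity": mean_similarity,
        "consistent": mean_similarity >= threshold,
    }

# Hypothetical stand-in for a real LLM API call.
def toy_model(prompt):
    return "Paris is the capital of France."

# Multiple probes derived from a single underlying question.
probes = [
    "What is the capital of France?",
    "Which city serves as France's capital?",
    "Name the capital city of France.",
]

report = audit(toy_model, probes)
```

A model that answers all probe variants identically, as the toy model does, scores maximal similarity and is flagged as consistent; divergent answers would lower the mean similarity and surface as a potential inconsistency worth investigating.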