As Large Language Models (LLMs) gain wider adoption across contexts, it becomes crucial to ensure they are reasonably safe, consistent, and reliable for the application at hand. This may require probing or auditing them. Probing an LLM with varied iterations of a single question can reveal potential inconsistencies in its knowledge or functionality. However, a tool for performing such audits with a simple workflow and a low technical barrier has been lacking. In this demo, we introduce "AuditLLM," a novel tool designed to evaluate the performance of various LLMs in a methodical way. AuditLLM's core functionality lies in its ability to audit a given LLM using multiple probes generated from a single question, thereby identifying inconsistencies in the model's understanding or operation. A reasonably robust, reliable, and consistent LLM should output semantically similar responses to a question asked in different ways or by different people. Based on this assumption, AuditLLM produces easily interpretable results regarding the LLM's consistency from a single question entered by the user. A certain level of inconsistency has been shown to be an indicator of potential bias, hallucination, and other issues; the output of AuditLLM can therefore guide further investigation of the audited model. To support both demonstration and practical use, AuditLLM offers two key modes: (1) a Live mode, which allows instant auditing of LLMs by analyzing responses to real-time queries; and (2) a Batch mode, which facilitates comprehensive LLM auditing by processing multiple queries at once for in-depth analysis. This tool benefits both researchers and general users, as it enhances our understanding of LLMs' response-generation behavior through a standardized auditing platform.
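The audit loop described above — generating several probes from one question, collecting the model's responses, and scoring their mutual consistency — can be sketched as follows. This is a minimal illustration under stated assumptions, not AuditLLM's actual implementation: `toy_model` is a hypothetical stand-in for an LLM endpoint, and the character-level `SequenceMatcher` ratio is a crude proxy for the semantic similarity a real audit would use.

```python
from difflib import SequenceMatcher
from itertools import combinations

def audit(model, probes, threshold=0.7):
    """Query the model with each probe and score pairwise response similarity.

    Returns the responses, the mean pairwise similarity, and a
    consistency verdict against a (tunable, illustrative) threshold.
    """
    responses = [model(p) for p in probes]
    scores = [SequenceMatcher(None, a, b).ratio()
              for a, b in combinations(responses, 2)]
    mean_similarity = sum(scores) / len(scores)
    return {
        "responses": responses,
        "mean_similarity": mean_similarity,
        "consistent": mean_similarity >= threshold,
    }

# Hypothetical stand-in for a real LLM API call.
def toy_model(prompt):
    return "Paris is the capital of France."

# Multiple probes derived from a single underlying question.
probes = [
    "What is the capital of France?",
    "Which city serves as France's capital?",
    "Name the capital city of France.",
]

report = audit(toy_model, probes)
```

A model that answers all probe variants identically, as the toy model does, scores maximal similarity and is flagged as consistent; divergent answers would lower the mean similarity and surface as a potential inconsistency worth investigating.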