LLMEffiChecker: Understanding and Testing Efficiency Degradation of Large Language Models

In this paper, we make the first attempt to understand and test the robustness of computation efficiency in state-of-the-art LLMs. By analyzing the working mechanism and implementation of 20,543 publicly accessible LLMs, we observe a fundamental property of LLMs that can be manipulated adversarially to reduce computation efficiency significantly. Our key motivation is to generate test inputs that delay the generation of EOS long enough that the LLM must keep iterating until it hits the pre-configured threshold. We present LLMEffiChecker, which works in both white-box and black-box settings. In the white-box scenario, LLMEffiChecker uses a gradient-guided technique that searches for a minimal and unnoticeable perturbation at the character level, token level, and structure level. In the black-box scenario, LLMEffiChecker employs a causal inference-based approach to find critical tokens and applies the same three levels of imperceptible perturbation to them. In both settings, the perturbed inputs effectively delay the appearance of EOS and drive generation to a threshold that natural inputs do not reach. To demonstrate the effectiveness of LLMEffiChecker, we conduct a systematic evaluation on nine publicly available LLMs: Google T5, AllenAI WMT14, Helsinki-NLP translator, Facebook FairSeq, UNICAMP-DL translator, MarianMT, Google FLAN-T5, MBZUAI LaMini-GPT, and Salesforce CodeGen. Experimental results show that, by perturbing just one character or token in the input sentence, LLMEffiChecker increases LLMs' response latency and energy consumption on average by 325% to 3244% and 344% to 3616%, respectively.
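The property the abstract refers to can be illustrated with a toy decoding loop: generation stops either when EOS is emitted or when a pre-configured length threshold is reached, so the number of iterations depends on when EOS appears. The sketch below is only an illustration under stated assumptions. The names `greedy_decode`, `benign_model`, and `degraded_model`, the token ids, and the threshold of 200 are hypothetical and are not taken from the paper or from any real model.

```python
from typing import Callable, List

EOS = 0  # hypothetical end-of-sequence token id


def greedy_decode(next_token: Callable[[List[int]], int],
                  max_length: int) -> List[int]:
    """Generate tokens until EOS appears or max_length tokens are produced."""
    output: List[int] = []
    while len(output) < max_length:
        token = next_token(output)  # one decoder iteration
        output.append(token)
        if token == EOS:
            break  # normal termination
    return output


def benign_model(prefix: List[int]) -> int:
    # Toy stand-in for a normal input: three content tokens, then EOS.
    return EOS if len(prefix) >= 3 else 7


def degraded_model(prefix: List[int]) -> int:
    # Toy stand-in for a perturbed input: EOS is never selected.
    return 7


# Hypothetical threshold; real models set their own maximum output length.
benign_steps = len(greedy_decode(benign_model, max_length=200))
degraded_steps = len(greedy_decode(degraded_model, max_length=200))
print(benign_steps, degraded_steps)
```

In this toy setting the benign input terminates after a handful of iterations, while the input that never yields EOS runs until the threshold. Each extra iteration costs another decoder pass, which is why delaying EOS translates into higher latency and energy use.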


