As the Web transitions from static retrieval to generative interaction, the escalating environmental footprint of Large Language Models (LLMs) presents a critical sustainability challenge. Current paradigms indiscriminately apply computation-intensive strategies such as Chain-of-Thought (CoT) to billions of daily queries, causing LLM overthinking: redundant reasoning that amplifies carbon emissions and raises operational barriers. This inefficiency directly undermines UN Sustainable Development Goals 13 (Climate Action) and 10 (Reduced Inequalities) by hindering equitable AI access in resource-constrained regions. To address this, we introduce EcoThink, an energy-aware adaptive inference framework designed to reconcile high-performance AI with environmental responsibility. EcoThink employs a lightweight, distillation-based router to dynamically assess query complexity, skipping unnecessary reasoning for factoid retrieval while reserving deep computation for complex logic. Extensive evaluations across 9 diverse benchmarks demonstrate that EcoThink reduces inference energy by 40.4% on average (up to 81.9% for web knowledge retrieval) without statistically significant performance loss. By mitigating algorithmic waste, EcoThink offers a scalable path toward sustainable, inclusive, and energy-efficient generative AI agents.
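The routing idea described in the abstract (assess query complexity, answer simple factoid queries directly, reserve CoT for complex ones) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `toy_complexity_score` is a keyword-and-length heuristic standing in for the distillation-based router, and the cue list, the threshold, and the per-mode energy costs are hypothetical values, not taken from EcoThink.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical cue words; the actual router is a learned, distilled model.
REASONING_CUES = ("prove", "why", "how many", "calculate", "step", "compare", "derive")


def toy_complexity_score(query: str) -> float:
    """Heuristic stand-in for a learned router; returns a score in [0, 1]."""
    q = query.lower()
    cue_hits = sum(cue in q for cue in REASONING_CUES)
    length_term = min(len(q.split()) / 40.0, 1.0)
    return min(0.3 * cue_hits + 0.5 * length_term, 1.0)


@dataclass
class Route:
    mode: str     # "direct" (skip reasoning) or "cot" (full Chain-of-Thought)
    score: float  # complexity score that produced the decision


def route_query(
    query: str,
    score_fn: Callable[[str], float] = toy_complexity_score,
    threshold: float = 0.3,  # hypothetical cut-off
) -> Route:
    """Send low-complexity queries to direct answering, the rest to CoT."""
    score = score_fn(query)
    return Route("cot" if score >= threshold else "direct", score)


def estimated_saving(routes: Iterable[Route], direct_cost: float, cot_cost: float) -> float:
    """Fraction of energy saved relative to running CoT on every query.

    The two cost arguments are hypothetical per-query energy units.
    """
    routes = list(routes)
    spent = sum(direct_cost if r.mode == "direct" else cot_cost for r in routes)
    baseline = cot_cost * len(routes)
    return 1.0 - spent / baseline


if __name__ == "__main__":
    queries = [
        "What is the capital of France?",
        "Calculate how many primes are below 100 and prove your answer step by step.",
    ]
    decisions = [route_query(q) for q in queries]
    for q, d in zip(queries, decisions):
        print(f"{d.mode:6s} score={d.score:.2f}  {q}")
    print("estimated saving:", estimated_saving(decisions, direct_cost=1.0, cot_cost=5.0))
```

Passing a different `score_fn` is how a trained classifier would replace the heuristic in this sketch; the savings function only shows how skipped reasoning translates into a relative energy reduction under the assumed costs.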