The rising energy demands of machine learning (ML), including popular applications such as retrieval-augmented generation (RAG) systems, have raised significant concerns about environmental sustainability. While previous research has proposed green tactics for ML-enabled systems, their empirical evaluation in RAG systems remains largely unexplored. This study presents a controlled experiment investigating five practical techniques aimed at reducing energy consumption in RAG systems. Using a production-like RAG system developed at our collaboration partner, the Software Improvement Group, we evaluated the impact of these techniques on energy consumption, latency, and accuracy. Across 9 configurations and more than 200 hours of trials on the CRAG dataset, we find that techniques such as increasing the similarity retrieval threshold, reducing embedding size, applying vector indexing, and using a BM25S reranker can significantly reduce energy usage, by up to 60% in some cases. However, several techniques also led to unacceptable accuracy decreases, e.g., of up to 30% for the indexing strategies. Notably, finding an optimal retrieval threshold and reducing embedding size substantially reduced energy consumption and latency with no loss in accuracy, making these two techniques truly energy-efficient. We present the first comprehensive empirical study of energy-efficient design techniques for RAG systems, providing guidance for developers and researchers aiming to build sustainable RAG applications.
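To make two of the techniques named in the abstract concrete, the following is a minimal sketch of similarity-threshold retrieval combined with a reduced embedding size. It is not the system evaluated in the study: the function names (`cosine`, `truncate`, `retrieve`), the toy vectors, and the threshold values are all hypothetical, and reducing embedding size by simple truncation is an assumption that only suits embedding models trained to tolerate it.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def truncate(vec, dim):
    """Keep only the first `dim` components (assumed size-reduction method)."""
    return vec[:dim]


def retrieve(query, chunks, threshold, dim=None):
    """Return (score, text) pairs with similarity >= threshold, best first.

    A higher threshold passes fewer chunks to the generator, and a smaller
    `dim` makes each similarity computation cheaper.
    """
    if dim is not None:
        query = truncate(query, dim)
    hits = []
    for text, emb in chunks:
        if dim is not None:
            emb = truncate(emb, dim)
        score = cosine(query, emb)
        if score >= threshold:
            hits.append((score, text))
    return sorted(hits, reverse=True)


# Toy data, invented for illustration only.
chunks = [
    ("chunk A", [1.0, 0.0, 0.0, 0.0]),
    ("chunk B", [0.6, 0.8, 0.0, 0.0]),
    ("chunk C", [0.0, 0.0, 1.0, 0.0]),
]
query = [1.0, 0.0, 0.0, 0.0]

low = retrieve(query, chunks, threshold=0.5)   # permissive threshold
high = retrieve(query, chunks, threshold=0.9)  # stricter threshold
print(len(low), len(high))
```

In this toy setting the stricter threshold returns fewer chunks, which would shorten the prompt sent to the language model. Whether accuracy is preserved depends on the data and the model, which is the question the experiment addresses.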