As Large Language Models (LLMs) broaden their capabilities to manage thousands of API calls, they must handle complex data operations across vast datasets, imposing significant overhead on the underlying system. In this work, we introduce LLM-dCache to optimize data accesses by treating cache operations as callable API functions exposed to the tool-augmented agent. We grant LLMs the autonomy to manage cache decisions via prompting, integrating seamlessly with existing function-calling mechanisms. Tested on an industry-scale, massively parallel platform spanning hundreds of GPT endpoints and terabytes of imagery, our method improves Copilot times by an average of 1.24x across various LLMs and prompting techniques.
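To make the mechanism concrete, the following is a minimal sketch, not the authors' implementation, of how cache reads and writes could be exposed to a tool-augmented agent as ordinary callable functions alongside an OpenAI-style function-calling schema; the names cache_load, cache_update, and the in-memory dictionary store are illustrative assumptions.

```python
# Minimal sketch: cache operations exposed as callable tools that the LLM can
# invoke via prompting, like any other API function. All names are hypothetical.
from typing import Any, Dict, Optional

_CACHE: Dict[str, Any] = {}  # simple in-memory key -> data store (illustrative)

def cache_load(key: str) -> Optional[Any]:
    """Return cached data for `key`, or None on a cache miss."""
    return _CACHE.get(key)

def cache_update(key: str, data: Any) -> str:
    """Insert or overwrite the cache entry for `key`."""
    _CACHE[key] = data
    return f"cached {key}"

# Function-calling schemas shown to the agent together with its other tools;
# the LLM decides when to call them instead of re-fetching data from storage.
CACHE_TOOL_SCHEMAS = [
    {
        "name": "cache_load",
        "description": "Load previously fetched data for a dataset key.",
        "parameters": {
            "type": "object",
            "properties": {"key": {"type": "string"}},
            "required": ["key"],
        },
    },
    {
        "name": "cache_update",
        "description": "Store newly fetched data under a dataset key.",
        "parameters": {
            "type": "object",
            "properties": {
                "key": {"type": "string"},
                "data": {"type": "string"},
            },
            "required": ["key", "data"],
        },
    },
]
```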