Applications are moving away from monolithic designs to microservice and serverless architectures, where fleets of lightweight and independently deployable components run on public clouds. Autoscaling serves as the primary control mechanism for balancing resource utilization and quality of service, yet existing policies are either opaque learned models that require substantial per-deployment training or brittle hand-tuned rules that fail to generalize. We investigate whether large language models can act as universal few-shot resource allocators that adapt across rapidly evolving microservice deployments. We propose ORACL, Optimized Reasoning for Autoscaling via Chain of Thought with LLMs for Microservices, a framework that leverages prior knowledge and chain-of-thought reasoning to diagnose performance regressions and recommend resource allocations. ORACL transforms runtime telemetry, including pods, replicas, CPU and memory usage, latency, service-level objectives, and fault signals, into semantic natural-language state descriptions and invokes an LLM to produce an interpretable intermediate reasoning trace. This reasoning identifies likely root causes, prunes the action space, and issues safe allocation decisions under policy constraints. Experiments on representative open-source microservice workloads show that ORACL improves root-cause identification accuracy by 15 percent, accelerates training by up to 24x, and improves quality of service by 6 percent in short-term scenarios, without deployment-specific retraining.
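The pipeline described above can be sketched in two of its steps: rendering runtime telemetry into a natural-language state description for the LLM, and clamping the LLM's recommended allocation to policy constraints before acting. This is a minimal illustrative sketch only; the function names, telemetry fields, and constraint values are assumptions, not ORACL's actual API.

```python
# Hypothetical sketch of ORACL-style state encoding and safe-action clamping.
# All names and thresholds here are illustrative assumptions.

def describe_state(telemetry: dict) -> str:
    """Render runtime telemetry as a natural-language state description."""
    return (
        f"Service '{telemetry['service']}' runs {telemetry['replicas']} replicas. "
        f"CPU {telemetry['cpu_pct']:.0f}%, memory {telemetry['mem_pct']:.0f}%. "
        f"p95 latency {telemetry['p95_ms']} ms against an SLO of {telemetry['slo_ms']} ms. "
        f"Fault signals: {', '.join(telemetry['faults']) or 'none'}."
    )

def clamp_action(current: int, proposed: int, min_r: int = 1,
                 max_r: int = 20, max_step: int = 4) -> int:
    """Enforce policy constraints: bounded replica range and bounded step size."""
    step = max(-max_step, min(max_step, proposed - current))
    return max(min_r, min(max_r, current + step))

telemetry = {
    "service": "checkout", "replicas": 3, "cpu_pct": 91.0, "mem_pct": 62.0,
    "p95_ms": 480, "slo_ms": 250, "faults": ["cpu_throttling"],
}
prompt = describe_state(telemetry)      # sent to the LLM with a chain-of-thought template
safe_replicas = clamp_action(current=3, proposed=12)  # LLM proposes 12; clamped to 7
```

The clamp is the "safe allocation decisions under policy constraints" step: even if the LLM over- or under-provisions, the change per decision stays within operator-defined bounds.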