Model serving costs dominate AI systems, making compiler optimization essential for scalable deployment. Recent work shows that a large language model (LLM) can guide compiler search by reasoning over program structure and optimization history. However, using a single large model throughout the search is expensive, while smaller models are less reliable on their own. This paper therefore asks whether multi-LLM collaborative reasoning that relies primarily on small LLMs can match or exceed the performance of a single large model. To this end, we propose COLT, a lightweight collaborative multi-LLM framework for compiler optimization that enables coordinated reasoning across multiple models within a single Monte Carlo tree search (MCTS) process. A key contribution is the use of a single shared MCTS tree as the collaboration substrate across LLMs, enabling the reuse of transformation prefixes and cross-model value propagation. By endogenizing model selection within the lightweight MCTS optimization loop, we avoid both heavy internal reasoning mechanisms and conventional agentic machinery that relies on external planners, multiple concurrent LLMs, databases, external memory and versioning of intermediate results, and controllers. At every iteration, the acting LLM proposes a joint action: (compiler transformation, next model to query). We also introduce a model-aware tree policy that biases search toward smaller models while preserving exploration, and a course-alteration mechanism that escalates to the largest model when the search exhibits persistent regressions attributable to smaller models.
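The joint-action search, model-aware tree policy, and course-alteration mechanism described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the model names, relative costs, exploration constant, regression threshold, and all function names are assumptions introduced here for concreteness. It shows a UCT-style selection rule whose exploration bonus is scaled down for costlier models, backpropagation shared across models through one tree, and an escalation rule that switches to the largest model after persistent regressions.

```python
import math

# Assumed relative query costs per model; the bias term below favors cheap models.
MODEL_COSTS = {"small": 1.0, "medium": 4.0, "large": 16.0}
C_EXPLORE = 1.4       # standard UCT exploration constant (assumed value)
REGRESSION_LIMIT = 3  # consecutive regressions before escalating (assumed value)

class Node:
    """A node in the single shared MCTS tree; each edge is a joint action
    (compiler transformation, model to query next)."""
    def __init__(self, transform=None, model=None, parent=None):
        self.transform, self.model = transform, model
        self.parent, self.children = parent, []
        self.visits, self.value = 0, 0.0

def model_aware_uct(node, parent_visits):
    """Mean value plus an exploration bonus scaled by a per-model cost bias,
    so smaller models are preferred while exploration is preserved."""
    if node.visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = node.value / node.visits
    explore = C_EXPLORE * math.sqrt(math.log(parent_visits) / node.visits)
    cost_bias = 1.0 / math.sqrt(MODEL_COSTS[node.model])
    return exploit + cost_bias * explore

def select_child(node):
    """Tree policy: pick the child maximizing the model-aware UCT score."""
    return max(node.children, key=lambda c: model_aware_uct(c, node.visits))

def backpropagate(node, reward):
    """Cross-model value propagation: rewards flow up the shared tree
    regardless of which model proposed each transformation."""
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent

def next_model(recent_rewards, proposed_model):
    """Course alteration: override the acting LLM's model choice and
    escalate to the largest model after persistent regressions."""
    tail = recent_rewards[-REGRESSION_LIMIT:]
    if len(tail) == REGRESSION_LIMIT and all(r < 0 for r in tail):
        return "large"
    return proposed_model
```

Keeping model selection inside the tree policy, rather than in an external controller, is what lets a single MCTS loop coordinate the models: the same visit counts and values that guide transformation choice also arbitrate which model acts next.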