LiteCoOp: Lightweight Multi-LLM Shared-Tree Reasoning for Model-Serving Compiler Optimizations

LLM-guided compiler optimization has recently shown promise, but existing approaches rely on a single large LLM throughout the search, which makes them expensive and leaves no role for smaller models. We ask whether heterogeneous LLMs can collaborate during compiler optimization while keeping compilation cost below that of optimization guided by a single large LLM. Crucially, this must be achieved without the overhead of agentic frameworks, which would run counter to the goal of lower compilation cost. To reconcile these competing objectives, we introduce LiteCoOp, a lightweight framework that turns the optimization search tree itself into the mechanism for multi-LLM collaboration, enabling heterogeneous models to share progress without external agentic coordination. At each optimization step, LiteCoOp queries one LLM, which both proposes a compiler transformation and selects the LLM to query at the next step. These proposals are recorded in a shared Monte Carlo tree search (MCTS) tree, so the models are invoked serially yet remain informed by each other's decisions. The shared MCTS backpropagates rewards, allowing progress made by one model to influence later decisions by others. The MCTS tree thus serves as the collaborative reasoning mechanism itself, avoiding inter-model communication, heavy reasoning traces, and agentic infrastructure. We instantiate this idea with an LLM-aware UCT (upper confidence bounds applied to trees) rule that biases model selection toward smaller LLMs to reduce cost while preserving the compiler performance objective. Across diverse GPU (and CPU) benchmarks, LiteCoOp consistently outperforms single-model baselines, with the best results obtained when scaling collaboration to eight heterogeneous LLMs. On GPU (CPU), this eight-model configuration reduces total compilation time by 1.95x (1.74x), reduces API cost by 4.47x (4.32x), and invokes the largest model for only 23.1% (23.9%) of total calls, demonstrating that the collaboration scales.
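The search loop described above can be illustrated with a minimal sketch. The abstract does not give the LLM-aware UCT formula, so the linear cost penalty used here is an assumption, as are the model names, their relative costs, the transformation names, and the stubbed `propose` and `measure_reward` functions that stand in for a real LLM call and a real compile-and-benchmark step.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical model pool: name -> relative cost per call (assumed values).
MODEL_COST: Dict[str, float] = {"small": 1.0, "medium": 4.0, "large": 16.0}
# Hypothetical transformation names, for illustration only.
TRANSFORMS = ["tile", "unroll", "vectorize", "fuse"]


@dataclass
class Node:
    transform: Optional[str] = None   # transformation that produced this node
    model: str = "small"              # LLM chosen to be queried at this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0


def llm_aware_uct(child: Node, parent_visits: int,
                  c: float = 1.4, lam: float = 0.05) -> float:
    """Standard UCT minus an ASSUMED linear penalty on the child's model cost."""
    if child.visits == 0:
        return math.inf
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore - lam * MODEL_COST[child.model]


def select(root: Node, max_children: int = 3) -> Node:
    """Descend through fully expanded nodes; stop at one that can still grow."""
    node = root
    while len(node.children) >= max_children:
        parent = node
        node = max(parent.children,
                   key=lambda ch: llm_aware_uct(ch, parent.visits))
    return node


def path(node: Node) -> List[str]:
    """Transformations applied from the root down to this node."""
    steps: List[str] = []
    while node.parent is not None:
        steps.append(node.transform)
        node = node.parent
    return steps[::-1]


def propose(model: str, history: List[str], rng: random.Random) -> Tuple[str, str]:
    """Stub for one LLM call: returns (transformation, next model to query)."""
    return rng.choice(TRANSFORMS), rng.choice(list(MODEL_COST))


def measure_reward(history: List[str], rng: random.Random) -> float:
    """Stub for compiling and benchmarking; returns a reward in [0, 1)."""
    return rng.random()


def backpropagate(node: Optional[Node], reward: float) -> None:
    """Push the reward up the shared tree so every model sees it later."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent


def search(steps: int, seed: int = 0) -> Tuple[Node, Dict[str, int]]:
    rng = random.Random(seed)
    root = Node(model="large")
    calls = {name: 0 for name in MODEL_COST}
    for _ in range(steps):
        leaf = select(root)
        # Exactly one LLM is queried per step, and it names its successor.
        transform, next_model = propose(leaf.model, path(leaf), rng)
        calls[leaf.model] += 1
        child = Node(transform=transform, model=next_model, parent=leaf)
        leaf.children.append(child)
        backpropagate(child, measure_reward(path(child), rng))
    return root, calls
```

Calling `search(50)` returns the root of the shared tree and a per-model call count. In this sketch the only coupling between models is the tree statistics: a reward earned under one model's proposal changes the UCT scores that later steer every other model, and the cost term tilts ties toward cheaper models.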
