How should Large Language Model (LLM) practitioners select the right model for a task without wasting money? We introduce BELLA (Budget-Efficient LLM Selection via Automated skill-profiling), a framework that recommends optimal LLMs for tasks through interpretable, skill-based model selection. Standard benchmarks report aggregate metrics that obscure which specific capabilities a task requires and whether a cheaper model would suffice. BELLA addresses this gap through three stages: (1) decomposing LLM outputs to extract the granular skills a task requires via critic-based profiling, (2) clustering those skills into structured capability matrices, and (3) applying multi-objective optimization to select models that maximize performance while respecting budget constraints. BELLA accompanies each recommendation with a natural-language rationale, providing the transparency that current black-box routing systems lack. We describe the framework's architecture, situate it within the landscape of LLM routing and evaluation, and discuss its application to financial reasoning as a representative domain with diverse skill requirements and substantial cost variation across models. Our framework enables practitioners to make principled cost-performance trade-offs when deploying LLMs.
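To make stage (3) concrete, the budget-constrained selection can be sketched as follows. This is a minimal illustration only: the abstract specifies multi-objective optimization but not a particular algorithm, so the linear weighted-sum scoring, the model names, and the skill scores and costs below are all hypothetical assumptions, not details from the paper.

```python
# Illustrative sketch of budget-constrained model selection (stage 3).
# All model names, skill scores, and costs are hypothetical.

def select_model(models, required_skills, budget):
    """Pick the model maximizing weighted skill coverage within budget.

    models: list of dicts with 'name', 'cost_per_1k', and 'skills'
            (a mapping skill -> proficiency score in [0, 1]).
    required_skills: mapping skill -> importance weight for the task.
    budget: maximum acceptable cost per 1k tokens.
    """
    best, best_score = None, float("-inf")
    for m in models:
        if m["cost_per_1k"] > budget:
            continue  # respect the budget constraint
        # Weighted-sum match between the model's skill profile and the
        # task's capability requirements (a simple scalarization of the
        # multi-objective problem, assumed here for illustration).
        score = sum(w * m["skills"].get(s, 0.0)
                    for s, w in required_skills.items())
        if score > best_score:
            best, best_score = m, score
    return best

models = [
    {"name": "large-model", "cost_per_1k": 0.06,
     "skills": {"arithmetic": 0.95, "table-parsing": 0.90}},
    {"name": "small-model", "cost_per_1k": 0.01,
     "skills": {"arithmetic": 0.80, "table-parsing": 0.85}},
]
task = {"arithmetic": 0.5, "table-parsing": 0.5}

# Under a tight budget the cheaper model is selected; with a looser
# budget the higher-scoring large model wins.
print(select_model(models, task, budget=0.02)["name"])  # small-model
print(select_model(models, task, budget=0.10)["name"])  # large-model
```

A real implementation would trade off performance and cost jointly (e.g. via a Pareto front) rather than treating budget as a hard filter, but the sketch shows the shape of the decision.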