How should Large Language Model (LLM) practitioners select the right model for a task without wasting money? We introduce BELLA (Budget-Efficient LLM Selection via Automated skill-profiling), a framework that recommends optimal LLMs for tasks through interpretable, skill-based model selection. Standard benchmarks report aggregate metrics that obscure which specific capabilities a task requires and whether a cheaper model would suffice. BELLA addresses this gap through three stages: (1) decomposing LLM outputs to extract the granular skills a task requires via critic-based profiling, (2) clustering those skills into structured capability matrices, and (3) applying multi-objective optimization to select models that maximize performance while respecting budget constraints. BELLA accompanies each recommendation with a natural-language rationale, providing the transparency that current black-box routing systems lack. We describe the framework's architecture, situate it within the landscape of LLM routing and evaluation, and discuss its application to financial reasoning as a representative domain with diverse skill requirements and substantial cost variation across models. Our framework enables practitioners to make principled cost-performance trade-offs when deploying LLMs.
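To make stage (3) concrete, the budget-constrained selection can be sketched as follows. This is a minimal illustration only: the abstract specifies multi-objective optimization but not a particular algorithm, so the linear weighted-sum scoring, the model names, and the skill scores and costs below are all hypothetical assumptions, not details from the paper.

```python
# Illustrative sketch of budget-constrained model selection (stage 3).
# All model names, skill scores, and costs are hypothetical.

def select_model(models, required_skills, budget):
    """Pick the model maximizing weighted skill coverage within budget.

    models: list of dicts with 'name', 'cost_per_1k', and 'skills'
            (a mapping skill -> proficiency score in [0, 1]).
    required_skills: mapping skill -> importance weight for the task.
    budget: maximum acceptable cost per 1k tokens.
    """
    best, best_score = None, float("-inf")
    for m in models:
        if m["cost_per_1k"] > budget:
            continue  # respect the budget constraint
        # Weighted-sum match between the model's skill profile and the
        # task's capability requirements (a simple scalarization of the
        # multi-objective problem, assumed here for illustration).
        score = sum(w * m["skills"].get(s, 0.0)
                    for s, w in required_skills.items())
        if score > best_score:
            best, best_score = m, score
    return best

models = [
    {"name": "large-model", "cost_per_1k": 0.06,
     "skills": {"arithmetic": 0.95, "table-parsing": 0.90}},
    {"name": "small-model", "cost_per_1k": 0.01,
     "skills": {"arithmetic": 0.80, "table-parsing": 0.85}},
]
task = {"arithmetic": 0.5, "table-parsing": 0.5}

# Under a tight budget the cheaper model is selected; with a looser
# budget the higher-scoring large model wins.
print(select_model(models, task, budget=0.02)["name"])  # small-model
print(select_model(models, task, budget=0.10)["name"])  # large-model
```

A real implementation would trade off performance and cost jointly (e.g. via a Pareto front) rather than treating budget as a hard filter, but the sketch shows the shape of the decision.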