Large language models (LLMs) have demonstrated strong results on a range of NLP tasks. Typically, outputs are obtained via autoregressive sampling from the LLM's underlying distribution. We show that this inference strategy can be suboptimal for a range of tasks and their associated evaluation metrics. As a remedy, we propose metric-aware LLM inference: a decision-theoretic approach that optimizes for custom metrics at inference time. We report improvements over baselines on academic benchmarks and publicly available models.
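To illustrate what optimizing for a metric at inference time can look like, the sketch below implements one generic decision-theoretic rule: draw several samples, treat them as an empirical stand-in for the model's distribution, and return the candidate with the highest average metric score against those samples. This is an assumption made for illustration, not a procedure stated in the abstract; the function names `token_f1` and `select_by_expected_metric`, the choice of token-level F1, and the hard-coded sample strings are all hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two whitespace-tokenized strings (example metric)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Two empty strings match perfectly; one empty string matches nothing.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the strings are identical, else 0.0."""
    return float(prediction == reference)


def select_by_expected_metric(
    samples: Sequence[str],
    metric: Callable[[str, str], float],
    candidates: Optional[Sequence[str]] = None,
) -> str:
    """Return the candidate with the highest mean metric score against the samples.

    The samples stand in for draws from the model's output distribution
    (an assumption of this sketch); each sample is used as a pseudo-reference.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    if candidates is None:
        # Default candidate set: the distinct samples, in first-seen order.
        candidates = list(dict.fromkeys(samples))

    def expected_score(candidate: str) -> float:
        return sum(metric(candidate, s) for s in samples) / len(samples)

    return max(candidates, key=expected_score)


# Hypothetical samples, standing in for repeated draws from an LLM.
samples = [
    "blue whale",
    "the blue whale",
    "a blue whale",
    "giant squid",
    "giant squid",
]

best_for_f1 = select_by_expected_metric(samples, token_f1)
best_for_em = select_by_expected_metric(samples, exact_match)
```

With these samples, the most frequent string is "giant squid", and that is what the exact-match rule selects. Under token-level F1, the three "blue whale" variants share most of their tokens, so "blue whale" has the higher expected score. The same set of samples therefore yields different outputs depending on the target metric, which is the sense in which plain sampling can be suboptimal for a given metric.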