Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, which focused primarily on task-specific models, may not extend to the current era of general-purpose large language models (LLMs). Moreover, the recent influx of decoding strategies has further complicated this landscape. This paper provides a comprehensive and multifaceted analysis of decoding methods in the context of LLMs, evaluating their performance, robustness to hyperparameter changes, and decoding speed across a wide range of tasks, models, and deployment environments. Our findings reveal that decoding performance is notably task-dependent and is influenced by factors such as alignment, model size, and quantization. Intriguingly, our sensitivity analysis reveals that certain methods achieve superior performance only at the cost of extensive hyperparameter tuning, highlighting the trade-off between attaining optimal results and the practicality of implementation in varying contexts.
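To make the hyperparameter sensitivity discussed above concrete, the sketch below implements one representative stochastic decoding step: temperature scaling followed by nucleus (top-p) filtering. This is an illustrative, self-contained example, not the paper's evaluation code; the function names (`top_p_filter`, `sample_token`) and the toy logits are hypothetical. Both `temperature` and `top_p` are exactly the kind of knobs whose tuning cost the analysis weighs against raw performance.

```python
import math
import random

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p (nucleus sampling), then renormalize to sum to 1."""
    # Sort token indices by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}

def sample_token(logits, temperature=1.0, top_p=0.9, rng=random):
    """One decoding step: temperature-scale logits, softmax,
    top-p filter, then sample a token index from the nucleus."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    nucleus = top_p_filter(probs, top_p)
    # Inverse-CDF sampling over the renormalized nucleus.
    r = rng.random()
    cum = 0.0
    for i, q in nucleus.items():
        cum += q
        if r <= cum:
            return i
    return next(iter(nucleus))           # guard against float rounding
```

Lowering `temperature` or `top_p` collapses the nucleus toward greedy decoding, while raising them increases diversity; the best setting varies by task, which is one face of the task dependence noted above.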