Language models have recently been shown to be capable of performing regression tasks in which numeric predictions are represented as decoded strings. In this work, we provide theoretical grounding for this capability and further investigate the utility of causal auto-regressive sequence models when applied to arbitrary feature representations. We find that, despite being trained in the usual way (for next-token prediction via cross-entropy loss), decoding-based regression is as performant as traditional approaches on tabular regression tasks, while being flexible enough to capture arbitrary distributions, such as in the task of density estimation.