We introduce Similarity-Distance-Magnitude (SDM) language models (LMs), sequence prediction models fine-tuned to maximize the proportion of generations that fall in the well-calibrated, high-probability region partitioned by a final-layer SDM activation layer used for binary classification of instruction-following. We demonstrate that existing pre-trained decoder-only Transformer LMs can be readily converted into SDM LMs via supervised fine-tuning, using the final-layer SDM activation layer to estimate a change-of-base for a supervised next-token loss over a contrastive input encoding scheme, with additional hard negative examples generated online during training. This reduces abstentions (i.e., improves statistical efficiency) compared to strong supervised baselines.
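To make the change-of-base idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of a next-token cross-entropy loss whose log base is re-estimated per example from an SDM-style calibration signal; the names `sdm_base` and `sdm_next_token_loss`, and the assumption of one scalar base per sequence, are illustrative only.

```python
# Hedged sketch: a next-token loss whose softmax base is set per example
# by a quantity assumed to come from the final-layer SDM activation layer.
import torch
import torch.nn.functional as F

def sdm_next_token_loss(logits: torch.Tensor,
                        targets: torch.Tensor,
                        sdm_base: torch.Tensor) -> torch.Tensor:
    """
    logits:   [batch, seq_len, vocab] raw LM outputs
    targets:  [batch, seq_len] next-token ids
    sdm_base: [batch] per-example base b >= 1 (assumption: a scalar summary
              of the sequence's calibrated confidence from the SDM layer)
    """
    # Change of base: softmax with base b is softmax(z * ln b), since b**z = exp(z * ln b).
    scale = torch.log(sdm_base).clamp(min=1e-6)      # [batch]
    scaled = logits * scale.view(-1, 1, 1)           # broadcast over positions and vocab
    # cross_entropy expects [batch, vocab, seq_len] for sequence targets
    return F.cross_entropy(scaled.transpose(1, 2), targets)
```

Under this reading, sequences the SDM layer judges to be in the well-calibrated, high-probability region receive a larger base (sharper target distribution), while uncertain sequences are down-weighted; hard negative examples generated online would enter the same loss as additional contrastively encoded inputs.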