We present a small study of how prompt-token classification loss weighting (PLW) affects the performance of 7B-parameter LLaMA models fine-tuned on instruction tasks. We recreated Stanford's Alpaca experiment with both LLaMA 1 and LLaMA 2, using multiple instruction datasets. We found that the performance of models fine-tuned on our short-completion dataset has a negative quadratic relationship with PLW, while the performance of models fine-tuned on long-completion datasets was unaffected by PLW.
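To make the role of PLW concrete, the following is a minimal sketch of a token-level cross-entropy loss in which prompt tokens are scaled by a weight while completion tokens keep weight 1. It is illustrative only and is not taken from the study: the function names `log_softmax` and `plw_loss`, the argument layout, the restriction of the weight to [0, 1], and the choice to normalize by the sum of the weights are all assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def plw_loss(logits, targets, is_prompt, plw):
    """Weighted next-token cross-entropy (hypothetical helper).

    logits    -- one list of vocabulary scores per position
    targets   -- the target token id at each position
    is_prompt -- True where the target token belongs to the prompt
    plw       -- weight applied to prompt tokens, assumed to lie in [0, 1]
    """
    if not (len(logits) == len(targets) == len(is_prompt)):
        raise ValueError("logits, targets and is_prompt must have equal length")
    if not 0.0 <= plw <= 1.0:
        raise ValueError("plw is assumed to lie in [0, 1]")

    weighted_sum = 0.0
    weight_total = 0.0
    for scores, target, prompt in zip(logits, targets, is_prompt):
        weight = plw if prompt else 1.0  # completion tokens always count fully
        nll = -log_softmax(scores)[target]  # per-token negative log-likelihood
        weighted_sum += weight * nll
        weight_total += weight

    if weight_total == 0.0:
        raise ValueError("no tokens carry non-zero weight")
    # Assumption: normalize by total weight; other normalizations are possible.
    return weighted_sum / weight_total


# Toy sequence: two prompt tokens followed by one completion token.
example_logits = [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
example_targets = [0, 0, 0]
example_is_prompt = [True, True, False]

loss_masked = plw_loss(example_logits, example_targets, example_is_prompt, 0.0)
loss_full = plw_loss(example_logits, example_targets, example_is_prompt, 1.0)
```

Under these assumptions, `plw = 0.0` reduces to a loss computed on completion tokens only (prompt masking), and `plw = 1.0` treats prompt and completion tokens identically; intermediate values interpolate between the two.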