We investigate whether pre-training exclusively on dialogue data yields formally and functionally apt small language models. Building on this pre-trained llamalogue model, we apply a variety of fine-tuning strategies to encourage "more communicative" text generations. Although our models underperform on most standard BabyLM benchmarks, they excel at dialogue continuation prediction in a minimal pair setting. While PPO fine-tuning has mixed, and at times detrimental, effects on our models, DPO fine-tuning further improves their performance on our custom dialogue benchmark.