Language model training and inference ignore a fundamental linguistic fact: multiple sequences of text written by the same person are dependent on one another. Prior work has shown that addressing this form of \textit{ecological fallacy} can greatly improve the performance of multiple smaller ($\sim$124M-parameter) GPT-based models. In this work, we ask whether addressing the ecological fallacy by modeling the author's language context with a specific LM task (called HuLM) can provide similar benefits for a larger-scale model, an 8B Llama model. To this end, we explore variants that process an author's language in the context of their other temporally ordered texts. We study the effect of pre-training with this author context using the HuLM objective, as well as the effect of fine-tuning with author context (\textit{HuFT: Human-aware Fine-Tuning}). Empirical comparisons show that addressing the ecological fallacy during fine-tuning alone, using QLoRA, improves the performance of the larger 8B model over standard fine-tuning. Additionally, QLoRA-based continued HuLM pre-training yields a human-aware model that generalizes, improving performance on eight downstream tasks when only a linear task classifier is trained. These results indicate the utility and importance of modeling language in the context of its original generators, the authors.