The standard practice for training large language models involves packing multiple documents together to optimize computational efficiency. However, the impact of this process on models' capabilities remains largely unexplored. To address this gap, we investigate how different document-packing strategies influence the latent multi-hop reasoning abilities of LLMs. Our findings indicate that packing can improve model performance compared to training on individual documents, albeit at the cost of additional compute. To further understand the underlying mechanisms, we conduct an ablation study, identifying key factors that explain the advantages of packing. Ultimately, our research deepens the understanding of LLM training dynamics and provides practical insights for optimizing model development.
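To make the packing setup concrete, the following is a minimal sketch of a common greedy packing scheme: tokenized documents are concatenated with a separator token and the stream is chunked into fixed-length training sequences. All names (`EOS`, `MAX_LEN`, `pack_documents`) and values are illustrative assumptions, not details from this work.

```python
# Minimal sketch of greedy document packing, assuming documents are
# already tokenized into lists of token IDs.

EOS = 0        # hypothetical end-of-document separator token
MAX_LEN = 8    # toy context length; real LLMs use e.g. 2048 or more

def pack_documents(docs, max_len=MAX_LEN, eos=EOS):
    """Concatenate tokenized documents (separated by EOS) and split
    the resulting stream into fixed-length training sequences."""
    stream = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos)
    # Keep only full-length sequences; drop the trailing remainder.
    n_full = len(stream) // max_len
    return [stream[i * max_len:(i + 1) * max_len] for i in range(n_full)]

docs = [[1, 2, 3], [4, 5, 6, 7, 8], [9, 10]]
packed = pack_documents(docs)
# One full sequence of length 8; the 5 leftover tokens are discarded.
```

Variants of this scheme differ in how document boundaries are handled, e.g. whether attention is masked across the separator or allowed to span packed documents.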