The advent of Large Language Models (LLMs) marks a notable breakthrough in Natural Language Processing (NLP), driving substantial progress in both text comprehension and generation. Despite these advances, LLMs are often limited in their ability to extrapolate beyond the context length they were trained on. Understanding and extending the context length of LLMs is therefore crucial to improving their performance across NLP applications. In this survey, we examine why context length extension matters and what better techniques could change for NLP applications. We study the inherent challenges of extending context length and present an organized overview of the strategies researchers currently employ. We also discuss the difficulties of evaluating context extension techniques and highlight the open challenges in this domain. Finally, we explore whether the research community has reached a consensus on evaluation standards and identify areas where further agreement is needed. This survey aims to serve as a resource for researchers, guiding them through the nuances of context length extension techniques and fostering discussion of future advances in this evolving field.