Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey

Chen Ling, Xujiang Zhao, Jiaying Lu, Chengyuan Deng, Can Zheng, Junxiang Wang, Tanmoy Chowdhury, Yun Li, Hejie Cui, Xuchao Zhang, Tianjiao Zhao, Amit Panalkar, Dhagash Mehta, Stefano Pasquali, Wei Cheng, Haoyu Wang, Yanchi Liu, Zhengzhang Chen, Haifeng Chen, Chris White, Quanquan Gu, Jian Pei, Carl Yang, Liang Zhao

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP), providing a highly useful, task-agnostic foundation for a wide range of applications. However, directly applying LLMs to sophisticated problems in specific domains runs into many hurdles, caused by the heterogeneity of domain data, the sophistication of domain knowledge, the uniqueness of domain objectives, and the diversity of constraints (e.g., social norms, cultural conformity, religious beliefs, and ethical standards in domain applications). Domain specialization techniques are key to making LLMs disruptive in many applications. To overcome these hurdles, research and practice on the domain specialization of LLMs have increased notably in recent years. This emerging field, with its substantial potential for impact, calls for a comprehensive and systematic review to summarize and guide ongoing work. In this article, we present a comprehensive survey of domain specialization techniques for LLMs, an emerging direction critical for LLM applications. First, we propose a systematic taxonomy that categorizes LLM domain-specialization techniques according to the accessibility of the LLM, and we summarize the framework for each subcategory as well as the relations and differences among them. Second, we present an extensive taxonomy of critical application domains that can benefit dramatically from specialized LLMs, discussing their practical significance and open challenges. Last, we offer our insights into the current research status and future trends in this area.
