While large language models (LLMs) have demonstrated considerable capabilities across various natural language tasks, they often fall short of the performance achieved by domain-specific state-of-the-art models. One potential approach to enhancing the domain-specific capabilities of LLMs is to fine-tune them on corresponding datasets. However, this method can be both resource- and time-intensive, and it is not applicable to closed-source commercial LLMs. In this paper, we propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA), a method designed to augment the domain-specific capabilities of LLMs by leveraging insights from the response preferences of expert models, without requiring fine-tuning. Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision-making tasks. Moreover, an LLM with PANDA even outperforms the expert model it learns from on 4 tasks of ScienceWorld. This finding highlights the potential of exploring tuning-free approaches to achieve weak-to-strong generalization.