Text classification is a fundamental task in natural language processing (NLP), and large language models (LLMs) have demonstrated the capability to perform it across diverse domains. However, LLM performance depends heavily on the quality of the input prompt. Recent studies have also shown that LLMs achieve remarkable results on code-related tasks. To leverage these capabilities for text classification, we propose the Code Completion Prompt (CoCoP) method, which reformulates the text classification problem as a code completion task. By exploiting LLMs' code-completion capability, CoCoP significantly improves text classification performance across diverse datasets; for instance, it raises accuracy on the SST2 dataset by more than 20%. Moreover, when CoCoP is integrated with LLMs specifically designed for code-related tasks (code models), such as CodeLLaMA, it achieves performance better than or comparable to few-shot learning techniques while using only one-tenth of the model size. The source code of our proposed method will be made publicly available upon acceptance of the paper.
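To illustrate the core idea, the following is a minimal, hypothetical sketch of how a classification instance might be rendered as partial code for an LLM to complete. The paper's actual template is not reproduced here; the function name, label strings, and prompt layout are assumptions for illustration only.

```python
# Hypothetical sketch of a CoCoP-style prompt: few-shot demonstrations
# are rendered as Python assignments, and the final `label = "` is left
# open so the LLM completes it with the predicted class string.

def build_code_completion_prompt(demos, query, labels):
    """Render demonstrations and a query as partial Python code.

    demos:  list of (text, label) pairs used as in-context examples.
    query:  the unlabeled text to classify.
    labels: the allowed label strings (illustrative, not the paper's set).
    """
    lines = [f'# Possible labels: {", ".join(labels)}']
    for text, label in demos:
        lines.append(f'text = "{text}"')
        lines.append(f'label = "{label}"')
        lines.append("")
    lines.append(f'text = "{query}"')
    lines.append('label = "')  # left unfinished for the model to complete
    return "\n".join(lines)

prompt = build_code_completion_prompt(
    demos=[("A delightful, moving film.", "positive"),
           ("Dull and overlong.", "negative")],
    query="An instant classic.",
    labels=["positive", "negative"],
)
print(prompt)
```

Because the prompt looks like source code, a code-oriented model such as CodeLLaMA can treat filling in the final assignment as an ordinary completion, which is the intuition behind applying code models to classification.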