Short Text Classification (STC) is crucial for processing and understanding the brief but information-dense content prevalent on contemporary digital platforms. STC models struggle to capture semantic and syntactic intricacies, a weakness that is apparent in traditional pre-trained language models. Although Graph Convolutional Networks (GCNs) improve performance by integrating external knowledge bases, these methods are limited by the quality and coverage of the knowledge they draw on. Recently, Large Language Models (LLMs) and Chain-of-Thought (CoT) prompting have significantly improved performance on complex reasoning tasks. However, some studies have highlighted their limitations on fundamental NLP tasks. This study therefore uses CoT to investigate the capabilities of LLMs on STC. We introduce the Quartet Logic: A Four-Step Reasoning (QLFR) framework. The framework primarily incorporates Syntactic and Semantic Enrichment CoT, decomposing the STC task into four distinct steps: (i) essential concept identification, (ii) common-sense knowledge retrieval, (iii) text rewriting, and (iv) classification. This decomposition elicits the inherent knowledge and abilities of LLMs to address the challenges of STC. Surprisingly, we found that QLFR can also improve the performance of smaller models. We therefore developed a CoT-Driven Multi-task learning (QLFR-CML) method to facilitate knowledge transfer from LLMs to smaller models. Extensive experiments on six short-text benchmarks validate the efficacy of the proposed methods. Notably, QLFR achieves state-of-the-art performance on all datasets, with particularly significant improvements on the Ohsumed and TagMyNews datasets.
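The four-step decomposition can be pictured as a chain of prompts in which each step's output feeds the next. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `qlfr_classify`, the `LLM` callable type, and all prompt wordings are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def qlfr_classify(text: str, labels: List[str], llm: LLM) -> Dict[str, str]:
    """Run four chained prompts and return every intermediate output.

    The prompt texts below are illustrative assumptions, not the paper's prompts.
    """
    # Step (i): essential concept identification
    concepts = llm(f"List the essential concepts in this short text:\n{text}")

    # Step (ii): common-sense knowledge retrieval, conditioned on the concepts
    knowledge = llm(
        f"Give brief common-sense background for these concepts:\n{concepts}"
    )

    # Step (iii): text rewriting, using the original text plus retrieved knowledge
    rewritten = llm(
        "Rewrite the text so that it is clearer and more complete.\n"
        f"Text: {text}\nBackground: {knowledge}"
    )

    # Step (iv): classification of the rewritten text
    label = llm(
        f"Classify the text into one of: {', '.join(labels)}.\n"
        f"Text: {rewritten}\nAnswer with the label only."
    ).strip()

    return {
        "concepts": concepts,
        "knowledge": knowledge,
        "rewritten": rewritten,
        "label": label,
    }
```

Any model client can be wrapped as the `llm` argument, as long as it takes a prompt string and returns a string. Returning the intermediate outputs along with the label keeps each reasoning step inspectable.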