In recent years, text classification methods based on neural networks and pre-trained models have attracted increasing attention and demonstrated excellent performance. However, these methods still face limitations in practical applications: (1) They typically focus only on the matching similarity between sentences, yet implicit, high-value regularities exist both within sentences of the same class and across different classes, and this information is crucial for classification. (2) Existing approaches such as pre-trained language models and graph-based methods often consume substantial memory for training and text-graph construction. (3) Although some low-resource methods achieve good performance, they often suffer from excessively long processing times. To address these challenges, we propose LFTC, a low-resource and fast text classification model. Our approach first constructs a compressor list for each class to fully mine the regularity information within intra-class data. It then removes redundant information irrelevant to the target classification to reduce processing time. Finally, it computes the similarity distance between text pairs for classification. We evaluate LFTC on 9 publicly available benchmark datasets; the results demonstrate significant improvements in both performance and processing time, especially under limited computational and data resources.
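The pipeline described above can be sketched minimally. The snippet below is an illustrative assumption, not the paper's implementation: a generic gzip-based normalized compression distance (NCD) stands in for LFTC's per-class compressor lists, and the function names and toy labels are invented for the example.

```python
import gzip

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: how much extra space y needs
    once x is known. Smaller means more shared regularity."""
    cx = len(gzip.compress(x.encode("utf-8")))
    cy = len(gzip.compress(y.encode("utf-8")))
    cxy = len(gzip.compress((x + " " + y).encode("utf-8")))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(text: str, train: dict[str, list[str]]) -> str:
    """Assign `text` to the class whose training examples lie at the
    smallest average compression distance from it."""
    def avg_dist(examples: list[str]) -> float:
        return sum(ncd(text, ex) for ex in examples) / len(examples)
    return min(train, key=lambda label: avg_dist(train[label]))

# Toy per-class data; real use would compress class corpora jointly.
train = {
    "sports": ["the team won the championship game",
               "the striker scored a late goal in the final"],
    "tech":   ["the new gpu accelerates neural network training",
               "the compiler optimizes code for the processor"],
}
print(classify("the goalkeeper saved the penalty kick", train))
```

Because the only model is an off-the-shelf compressor, this kind of classifier needs no training phase or GPU, which matches the low-resource setting the abstract targets.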