In this study, we compare four methods for multi-label text classification on an imbalanced business dataset: fine-tuned BERT, Binary Relevance, Classifier Chains, and Label Powerset. Fine-tuned BERT outperforms the other three methods by a significant margin, achieving high accuracy, F1 score, precision, and recall. Binary Relevance also performs well on this dataset, while Classifier Chains and Label Powerset perform relatively poorly. These findings highlight the effectiveness of fine-tuned BERT for multi-label text classification and suggest that it may be a useful tool for businesses seeking to analyze complex, multifaceted texts.
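Binary Relevance, Classifier Chains, and Label Powerset are commonly described as problem-transformation methods: each recasts a multi-label target as one or more single-label problems. The sketch below illustrates only that general idea in plain Python. The toy label matrix, label names, feature values, and function names are hypothetical and are not taken from the study's dataset or implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: 4 documents, 3 labels (not from the study).
LABELS = ["billing", "shipping", "complaint"]
X: List[List[float]] = [[0.2], [0.9], [0.1], [0.5]]  # one made-up feature each
Y: List[List[int]] = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
]


def binary_relevance_targets(Y: List[List[int]]) -> List[List[int]]:
    """Binary Relevance: one independent binary target vector per label."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]


def chain_training_inputs(
    X: List[List[float]], Y: List[List[int]], j: int
) -> List[List[float]]:
    """Classifier Chains: the model for label j sees the original features
    plus the values of labels 0..j-1 (true labels at training time)."""
    return [list(x) + list(row[:j]) for x, row in zip(X, Y)]


def label_powerset_targets(
    Y: List[List[int]],
) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """Label Powerset: each distinct label combination becomes one class id."""
    mapping: Dict[Tuple[int, ...], int] = {}
    targets: List[int] = []
    for row in Y:
        key = tuple(row)
        if key not in mapping:
            mapping[key] = len(mapping)  # assign ids in order of first appearance
        targets.append(mapping[key])
    return targets, mapping


if __name__ == "__main__":
    for name, column in zip(LABELS, binary_relevance_targets(Y)):
        print("binary relevance target for", name, "->", column)
    print("chain inputs for label 2 ->", chain_training_inputs(X, Y, 2))
    print("label powerset classes ->", label_powerset_targets(Y)[0])
```

In this framing, Label Powerset produces one class per observed label combination, so rare combinations yield classes with very few training examples. That is one plausible reason such a method could struggle on an imbalanced dataset, although the abstract itself does not state the cause.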