Sentiment analysis, widely critiqued for capturing merely the overall tone of a corpus, falls short of accurately reflecting the latent structures and political stances within texts. This study introduces topic metrics, dummy variables converted from extracted topics, as both an alternative and a complement to sentiment metrics in stance classification. Using three datasets identified by Bestvater and Monroe (2023), this study demonstrates BERTopic's proficiency in extracting coherent topics and the effectiveness of topic metrics in stance classification. The experimental results show that BERTopic improves coherence scores by 17.07% to 54.20% compared with traditional approaches such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF), which were prevalent in earlier political science research. Additionally, our results indicate that topic metrics outperform sentiment metrics in stance classification, increasing performance by as much as 18.95%. Our findings suggest that topic metrics are especially effective for context-rich texts and for corpora in which stance and sentiment are only weakly correlated. The combination of sentiment and topic metrics achieves optimal performance in most scenarios and can further address both the limitations of relying solely on sentiment and the low coherence scores of topic metrics.
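To make the notion of topic metrics concrete, the following is a minimal sketch of how per-document topic assignments might be turned into dummy variables and placed alongside a sentiment score. It is an illustration under stated assumptions, not the study's actual pipeline: the function names `topic_dummies` and `combine_features`, the toy topic ids, and the sentiment values are all hypothetical. The only convention borrowed from BERTopic is that documents it cannot assign to a topic are labeled `-1`.

```python
from typing import List, Sequence, Tuple


def topic_dummies(
    topic_ids: Sequence[int], drop_outliers: bool = True
) -> Tuple[List[int], List[List[int]]]:
    """One-hot encode per-document topic assignments (hypothetical helper).

    Returns the sorted topic ids used as columns and one 0/1 row per document.
    With drop_outliers=True, the outlier label -1 gets no column, so outlier
    documents receive an all-zero row.
    """
    columns = sorted(
        {t for t in topic_ids if not (drop_outliers and t == -1)}
    )
    rows = [[1 if t == col else 0 for col in columns] for t in topic_ids]
    return columns, rows


def combine_features(
    sentiment: Sequence[float], dummies: Sequence[Sequence[int]]
) -> List[List[float]]:
    """Prepend each document's sentiment score to its topic dummies."""
    return [[s, *row] for s, row in zip(sentiment, dummies)]


# Toy inputs (assumed for illustration): topic ids as a topic model might
# return them, and one sentiment score per document.
topic_ids = [0, 2, -1, 1, 2]
sentiment = [0.8, -0.4, 0.1, -0.9, 0.3]

columns, dummies = topic_dummies(topic_ids)
features = combine_features(sentiment, dummies)
```

The resulting `features` rows could then be passed to any standard classifier, with the sentiment column alone, the dummy columns alone, or both together corresponding to the three feature sets compared in the abstract.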