In recent years, with the rapid development of large language models, several models such as GPT-4o have demonstrated extraordinary capabilities, surpassing human performance in various language tasks. As a result, many researchers have begun exploring their potential applications in public opinion analysis. This study proposes a novel large-language-model-based method for predicting the heat level of public opinion events. First, we preprocessed and classified 62,836 Chinese hot events collected between July 2022 and December 2023. Then, based on each event's online dissemination heat index, we used the MiniBatchKMeans algorithm to cluster the events automatically into four heat levels (ranging from low heat to very high heat). Next, we randomly selected 250 events from each heat level, totalling 1,000 events, to build the evaluation dataset. In the evaluation, we measured the accuracy of several large language models in predicting event heat levels in two scenarios: without reference cases and with similar cases provided as references. GPT-4o and DeepseekV2 performed best in the latter scenario, achieving prediction accuracies of 41.4% and 41.5%, respectively. Although overall prediction accuracy remains relatively low, for low-heat (Level 1) events the prediction accuracies of these two models reached 73.6% and 70.4%, respectively. In addition, prediction accuracy declined from Level 1 to Level 4, which correlates with the uneven distribution of data across heat levels in the actual dataset. This suggests that, with a more robust dataset, public opinion event heat level prediction based on large language models has significant research potential.
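The balanced sampling and per-level accuracy reporting described in the abstract can be illustrated with a minimal Python sketch. It assumes the heat levels have already been assigned (for example by clustering the heat index), and it is not the authors' implementation: the function names `sample_balanced` and `accuracy_by_level`, the `"level"` dictionary key, and the fixed seed are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def sample_balanced(events, per_level, seed=0):
    """Randomly pick `per_level` events from each heat level.

    `events` is a list of dicts with a hypothetical "level" key (1 to 4).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_level = defaultdict(list)
    for event in events:
        by_level[event["level"]].append(event)
    sample = []
    for level in sorted(by_level):
        # random.sample raises ValueError if a level has too few events
        sample.extend(rng.sample(by_level[level], per_level))
    return sample


def accuracy_by_level(gold, predicted):
    """Return (overall accuracy, {level: accuracy}) for paired label lists."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for g, p in zip(gold, predicted):
        total[g] += 1
        correct[g] += int(g == p)
    per_level = {level: correct[level] / total[level] for level in sorted(total)}
    overall = sum(correct.values()) / len(gold)
    return overall, per_level
```

Because the evaluation set holds the same number of events at every level (250 each), the overall accuracy computed this way equals the unweighted mean of the four per-level accuracies, so a high Level 1 score can coexist with a low overall figure.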