The growing prominence of cryptocurrencies has triggered widespread public engagement and increased speculative activity, particularly on social media platforms. This study introduces a novel classification framework for identifying predictive statements in cryptocurrency-related tweets, focusing on five popular cryptocurrencies: Cardano, Matic, Binance, Ripple, and Fantom. The classification process is divided into two stages: Task 1 involves binary classification to distinguish between Predictive and Non-Predictive statements. Tweets identified as Predictive proceed to Task 2, where they are further categorized as Incremental, Decremental, or Neutral. To build a robust dataset, we combined manual and GPT-based annotation methods and utilized SenticNet to extract emotion features corresponding to each prediction category. To address class imbalance, GPT-generated paraphrasing was employed for data augmentation. We evaluated a wide range of machine learning, deep learning, and transformer-based models across both tasks. The results show that GPT-based balancing significantly enhanced model performance, with transformer models achieving the highest F1-score in Task 1, while traditional machine learning models performed best in Task 2. Furthermore, our emotion analysis revealed distinct emotional patterns associated with each prediction category across the different cryptocurrencies.
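The two-stage routing described above can be sketched as a simple cascade: Task 1 labels every tweet, and only tweets labeled Predictive are passed on to Task 2. This is a minimal illustration, not the authors' implementation. The function names (`classify_cascade`, `toy_task1`, `toy_task2`) and the keyword rules are hypothetical stand-ins for the trained machine learning, deep learning, or transformer models evaluated in the study.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stage signatures: each stage is any callable mapping a tweet to a label.
Stage1 = Callable[[str], str]  # returns "Predictive" or "Non-Predictive"
Stage2 = Callable[[str], str]  # returns "Incremental", "Decremental", or "Neutral"


def classify_cascade(
    tweets: List[str], task1: Stage1, task2: Stage2
) -> List[Tuple[str, str, Optional[str]]]:
    """Run Task 1 on every tweet; only tweets labeled Predictive go to Task 2."""
    results = []
    for tweet in tweets:
        stage1_label = task1(tweet)
        # Non-predictive tweets stop here and receive no direction label.
        stage2_label = task2(tweet) if stage1_label == "Predictive" else None
        results.append((tweet, stage1_label, stage2_label))
    return results


# Hypothetical keyword rules standing in for trained classifiers (illustration only).
def toy_task1(tweet: str) -> str:
    cues = ("will", "going to", "predict", "expect")
    return "Predictive" if any(c in tweet.lower() for c in cues) else "Non-Predictive"


def toy_task2(tweet: str) -> str:
    text = tweet.lower()
    if any(w in text for w in ("moon", "pump", "rise")):
        return "Incremental"
    if any(w in text for w in ("dump", "crash", "drop")):
        return "Decremental"
    return "Neutral"


if __name__ == "__main__":
    sample = [
        "Cardano will moon next week",
        "Just bought some Ripple today",
        "I expect Fantom to crash soon",
    ]
    for row in classify_cascade(sample, toy_task1, toy_task2):
        print(row)
```

Because each stage is just a callable, either toy rule can be swapped for a real model's prediction function without changing the routing logic. The cascade also means Task 2 errors can only occur on tweets that Task 1 already accepted as Predictive.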