There has been growing interest in applying NLP techniques in the financial domain; however, resources are extremely limited. This paper introduces StockEmotions, a new dataset for detecting emotions in the stock market that consists of 10,000 English comments collected from StockTwits, a financial social media platform. Inspired by behavioral finance, the dataset proposes 12 fine-grained emotion classes that span the roller coaster of investor emotion. Unlike existing financial sentiment datasets, StockEmotions presents granular features such as investor sentiment classes, fine-grained emotions, emojis, and time series data. To demonstrate the usability of the dataset, we perform a dataset analysis and conduct experimental downstream tasks. For financial sentiment/emotion classification tasks, DistilBERT outperforms other baselines, and for multivariate time series forecasting, a Temporal Attention LSTM model combining price index, text, and emotion features achieves better performance than models using a single feature.