Anticipating how an audience will react to a text is important in many areas of society, including politics, research, and commercial industries. Sentiment analysis (SA) is a natural language processing (NLP) technique that uses lexical, statistical, and deep learning methods to determine whether texts of varying length express positive, negative, or neutral sentiment. Recurrent networks are widely used in the machine learning community for problems involving sequential data. However, models based on Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) have a large number of parameters, which makes them computationally expensive. This drawback is even more significant when the available data are limited. Such models also require significant over-parameterization and regularization to achieve optimal performance. Tensorized models represent a potential solution. In this paper, we classify the sentiment of social media posts. We compare traditional recurrent models with their tensorized versions and show that the tensorized models reach performance comparable to that of the traditional models while using fewer resources for training.
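To make the parameter-count argument concrete, the following minimal sketch compares the number of trainable parameters in a standard LSTM and GRU layer with the number needed when a dense weight matrix is stored in factorized form. The abstract does not say which tensor decomposition is used, so the tensor-train (TT) format here is an assumption chosen for illustration. The function names, the layer sizes, the mode factorizations, and the TT ranks are all hypothetical, and a single bias vector per gate is assumed (some libraries use two).

```python
from math import prod


def lstm_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one LSTM layer: 4 gates, each with input weights,
    recurrent weights, and one bias vector (assumed convention)."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def gru_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one GRU layer: same structure as above with 3 gates."""
    return 3 * (hidden_size * (input_size + hidden_size) + hidden_size)


def tt_matrix_params(in_factors, out_factors, ranks) -> int:
    """Parameters of a matrix stored in tensor-train format.

    The matrix has shape prod(in_factors) x prod(out_factors). Core k has
    shape (ranks[k], in_factors[k], out_factors[k], ranks[k + 1]), and the
    boundary ranks must equal 1.
    """
    assert len(in_factors) == len(out_factors) == len(ranks) - 1
    assert ranks[0] == 1 and ranks[-1] == 1
    return sum(
        ranks[k] * in_factors[k] * out_factors[k] * ranks[k + 1]
        for k in range(len(in_factors))
    )


# Hypothetical sizes: 256-dimensional inputs and a 256-unit hidden state.
input_size, hidden_size = 256, 256

# Input-to-hidden weights of all 4 LSTM gates stacked: a 256 x 1024 matrix.
in_factors = [4, 8, 8]     # 4 * 8 * 8  = 256
out_factors = [8, 8, 16]   # 8 * 8 * 16 = 1024
ranks = [1, 4, 4, 1]       # assumed TT ranks

dense_count = prod(in_factors) * prod(out_factors)
tt_count = tt_matrix_params(in_factors, out_factors, ranks)

print("LSTM layer:", lstm_params(input_size, hidden_size))   # 525312
print("GRU layer:", gru_params(input_size, hidden_size))     # 393984
print("Dense input-to-hidden matrix:", dense_count)          # 262144
print("TT input-to-hidden matrix:", tt_count)                # 1664
```

Under these assumed ranks, the factorized matrix needs far fewer parameters than the dense one. The sketch only counts parameters; whether accuracy is preserved depends on the data and the chosen ranks, which is what the comparison in the paper addresses.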