Document representation is at the core of many NLP tasks in machine understanding. A general representation learned in an unsupervised manner preserves generality and can be used for a variety of applications. In practice, sentiment analysis (SA) is a challenging task that is regarded as deeply semantic and is often used to assess general representations. Existing methods for unsupervised document representation learning fall into two families: sequential methods, which explicitly take word order into account, and non-sequential methods, which do not. However, each family suffers from its own weaknesses. In this paper, we propose a model that overcomes the difficulties encountered by both families. Experiments show that our model outperforms state-of-the-art methods by a large margin on popular SA datasets and on a fine-grained aspect-based SA task.