Vector-based word representations help countless Natural Language Processing (NLP) tasks capture both semantic and syntactic regularities of the language. In this paper, we present the characteristics of existing word embedding approaches and analyze them with regards to many classification tasks. We categorize the methods into two main groups - Traditional approaches mostly use matrix factorization to produce word representations, and they are not able to capture the semantic and syntactic regularities of the language very well. Neural-Network based approaches, on the other hand, can capture sophisticated regularities of the language and preserve the word relationships in the generated word representations. We report experimental results on multiple classification tasks and highlight the scenarios where one approach performs better than the rest.
翻译:基于向量的词表示有助于大量自然语言处理任务捕捉语言的语义与句法规律。本文阐述了现有词嵌入方法的特征,并针对多种分类任务对其进行了分析。我们将这些方法归为两大类——传统方法主要使用矩阵分解生成词表示,难以充分捕捉语言的语义与句法规律;而基于神经网络的方法则能捕捉语言的复杂规律,并在生成的词表示中保留词间关系。我们报告了在多项分类任务上的实验结果,并重点分析了不同方法表现较优的场景。