This work lists and describes the main recent strategies for building fixed-length, dense and distributed representations for words, based on the distributional hypothesis. These representations are now commonly called word embeddings and, in addition to encoding surprisingly good syntactic and semantic information, have been proven useful as extra features in many downstream NLP tasks.
翻译:本文列举并描述了基于分布假说的词构建固定长度、稠密且分布式表示的主要近期策略。这些表示现在通常被称为词嵌入,除了编码出令人惊讶地好的句法和语义信息外,已被证明在许多下游自然语言处理任务中作为额外特征十分有用。