As a fundamental task in natural language processing, word embedding maps each word to a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the dimension of the vector space increases, which can lead to a vast model size. Storing and processing word vectors is resource-demanding, especially for mobile edge-device applications. This paper explores dimension reduction for word embeddings. To balance computational cost and performance, we propose an efficient and effective weakly-supervised feature selection method named WordFS. It has two variants, each utilizing novel criteria for feature selection. Experiments on various tasks (e.g., word and sentence similarity, and binary and multi-class classification) indicate that the proposed WordFS model outperforms other dimension reduction methods at lower computational costs. We have released the code for reproducibility along with the paper.
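To make the idea of dimension reduction via feature selection concrete, the sketch below keeps a subset of the original embedding dimensions using a generic variance-based criterion. This is only an illustrative baseline under assumed data, not the WordFS criteria, which the paper itself defines; the embedding matrix here is random.

```python
import numpy as np

# Hypothetical embedding matrix: 5 words, each an 8-dimensional vector.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 8))

# Generic feature-selection baseline (NOT the WordFS criteria):
# keep the k embedding dimensions with the highest variance across
# the vocabulary and drop the rest, so each word vector shrinks
# from 8 to k dimensions with no retraining.
k = 3
variances = embeddings.var(axis=0)
keep = np.sort(np.argsort(variances)[::-1][:k])  # indices of retained dims
reduced = embeddings[:, keep]

print(reduced.shape)  # (5, 3)
```

Unlike projection methods such as PCA, feature selection simply discards coordinates, so the reduced vectors remain directly interpretable as a subset of the original dimensions and the reduction step costs only a single pass over the vocabulary.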