The distributed representations currently in use are dense and uninterpretable, so the interpretations drawn from them are themselves relative, overcomplete, and hard to interpret. We propose a method that transforms these word vectors into reduced syntactic representations. The resulting representations are compact and interpretable, allowing better visualization and comparison of the word vectors, and we subsequently demonstrate that the interpretations drawn from them are in line with human judgment. The syntactic representations are then used to create hierarchical word vectors through an incremental learning approach similar to the hierarchical aspect of human learning. Because these representations are derived from pre-trained vectors, the generation process and the learning approach are computationally efficient. Most importantly, we find that the syntactic representations provide a plausible interpretation of the vectors and that the resulting hierarchical vectors outperform the original vectors in benchmark tests.