Thanks to their ease of use and high accuracy, Word2Vec (W2V) word embeddings have been highly successful for the semantic representation of words, sentences, and whole documents, as well as for semantic similarity estimation. However, they are extracted directly from the surface representation of text, which does not adequately reflect human thought processes, and they perform poorly on highly ambiguous words. We therefore propose Semantic Concept Embeddings (CE) based on the MultiNet Semantic Network (SN) formalism, an approach that addresses both shortcomings. An evaluation on a marketing target group distribution task showed that the accuracy of the predicted target groups can be increased by combining traditional word embeddings with semantic CEs.
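
The abstract does not state how word embeddings and CEs are combined. As one possible illustration only, the sketch below mixes the cosine similarity of two word vectors with the cosine similarity of two concept vectors using a weight. The function names, the `weight` parameter, and the toy vectors are all assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(word_u, word_v, concept_u, concept_v, weight=0.5):
    """Hypothetical combination: weighted mean of word-level and concept-level similarity.

    `weight` is an assumed mixing parameter; the paper's actual scheme may differ.
    """
    return weight * cosine(word_u, word_v) + (1 - weight) * cosine(concept_u, concept_v)


# Toy vectors (made up): one ambiguous surface word used in two different senses.
# The word vectors coincide, while the concept vectors keep the senses apart.
word_a, word_b = [1.0, 0.0], [1.0, 0.0]
concept_a, concept_b = [1.0, 0.0], [0.0, 1.0]

score = combined_similarity(word_a, word_b, concept_a, concept_b)
print(score)
```

In this toy setting the word-level similarity alone would treat the two usages as identical, whereas the concept-level term lowers the combined score. This is the kind of effect the abstract attributes to CEs for ambiguous words.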