Social world knowledge is a key ingredient in effective communication and information processing by humans and machines alike. Many existing knowledge bases represent factual world knowledge, yet no resource is designed to capture the social aspects of world knowledge. We believe that this work takes an important step toward the formulation and construction of such a resource. We introduce SocialVec, a general framework for eliciting low-dimensional entity embeddings from the social contexts in which the entities occur in social networks. In this framework, entities correspond to highly popular accounts that attract general interest. We assume that entities that individual users tend to co-follow are socially related, and we use this definition of social context to learn the entity embeddings. Just as word embeddings facilitate tasks that involve text semantics, we expect the learned social entity embeddings to benefit a range of socially oriented tasks. In this work, we elicit the social embeddings of roughly 200K entities from a sample of 1.3M Twitter users and the accounts that they follow. We apply and evaluate the resulting embeddings on two tasks of social importance. First, we assess the political bias of news sources in terms of entity similarity in the social embedding space. Second, we predict the personal traits of individual Twitter users based on the social embeddings of the entities that they follow. In both cases, our approach performs better than or on par with task-specific baselines. We further show that existing entity embedding schemes, which are fact-based, fail to capture social aspects of knowledge. We make the learned social entity embeddings available to the research community to support further exploration of social world knowledge and its applications.
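The co-follow assumption can be illustrated with a minimal sketch. This is not the SocialVec training procedure: instead of learned low-dimensional embeddings, it uses raw binary follower vectors as a count-based stand-in, and it scores a news source by its relative similarity to two reference accounts. All account names, the toy data, and the helper functions `follower_sets`, `cofollow_similarity`, and `relative_similarity` are hypothetical and introduced only for illustration.

```python
from collections import defaultdict
from math import sqrt


def follower_sets(follows):
    """Invert a user -> followed-entities map into entity -> set of followers."""
    followers = defaultdict(set)
    for user, entities in follows.items():
        for entity in entities:
            followers[entity].add(user)
    return dict(followers)


def cofollow_similarity(followers, a, b):
    """Cosine similarity between the binary follower vectors of entities a and b.

    Entities that many of the same users follow score close to 1;
    entities with no shared followers score 0.
    """
    fa = followers.get(a, set())
    fb = followers.get(b, set())
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / sqrt(len(fa) * len(fb))


def relative_similarity(followers, source, anchor_a, anchor_b):
    """Positive if `source` is closer to `anchor_a`, negative if closer to `anchor_b`."""
    return (cofollow_similarity(followers, source, anchor_a)
            - cofollow_similarity(followers, source, anchor_b))


# Toy data (hypothetical): each user maps to the popular accounts they follow.
follows = {
    "u1": {"news_a", "politician_x", "team_1"},
    "u2": {"news_a", "politician_x"},
    "u3": {"news_b", "politician_y", "team_1"},
    "u4": {"news_b", "politician_y"},
}

followers = follower_sets(follows)
score_a = relative_similarity(followers, "news_a", "politician_x", "politician_y")
score_b = relative_similarity(followers, "news_b", "politician_x", "politician_y")
```

In this toy sample, `news_a` shares all of its followers with `politician_x` and none with `politician_y`, so `score_a` is positive and `score_b` is negative. A learned embedding would replace the sparse follower vectors with dense low-dimensional ones, but the similarity-to-reference-entities idea stays the same.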