Millions of users are active on social media. To help users better showcase themselves and network with others, we explore the automatic generation of social media self-introductions, where a self-introduction is a short sentence outlining a user's personal interests. While most prior work profiles users with tags (e.g., age), we investigate sentence-level self-introductions, which offer a more natural and engaging way for users to get to know each other. Here we exploit a user's tweeting history to generate their self-introduction. The task is non-trivial because the history may be lengthy and noisy and may reflect a variety of personal interests. To address this challenge, we propose a novel unified topic-guided encoder-decoder (UTGED) framework. It models latent topics to reflect salient user interests; the resulting topic mixture guides the encoding of a user's history, and topic words control the decoding of their self-introduction. For experiments, we collect a large-scale Twitter dataset, and extensive results show that UTGED outperforms advanced encoder-decoder models without topic modeling.