Automatic mainstream hashtag recommendation aims to provide users with concise, popular topical hashtags before they publish a post. The task poses two main challenges: comprehending newly posted tweets that respond to new topics, and accurately identifying mainstream hashtags, which requires more than semantic correctness. Previous retrieval-based methods, which rely on a fixed predefined list of mainstream hashtags, excel at producing mainstream hashtags but fail to understand the constant flow of up-to-date information. Conversely, generation-based methods demonstrate a superior ability to comprehend newly posted tweets, but their capacity to identify mainstream hashtags is limited without additional features. Inspired by the recent success of retrieval-augmented techniques, in this work we adopt this framework to combine the advantages of both approaches. Meanwhile, with the help of the generator component, we rethink how to further improve the quality of the retriever component at low cost. We therefore propose the RetrIeval-augmented Generative Mainstream HashTag Recommender (RIGHT), which consists of three components: 1) a retriever that seeks relevant hashtags from the entire tweet-hashtag set; 2) a selector that enhances mainstream identification by introducing global signals; and 3) a generator that incorporates input tweets and selected hashtags to directly generate the desired hashtags. Experimental results show that our method achieves significant improvements over state-of-the-art baselines. Moreover, RIGHT can be easily integrated into large language models, improving the performance of ChatGPT by more than 10%.
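To make the retriever, selector, and generator stages concrete, the following is a minimal Python sketch of how such a pipeline could be wired together. It is not the RIGHT implementation: the abstract does not specify the retrieval model, the form of the global signals, or the generator's input format. Here the retriever is assumed to use Jaccard token overlap, the selector is assumed to use corpus-wide hashtag frequency as its global signal, and the generator is replaced by a function that only builds the text a sequence-to-sequence model might receive. The corpus and all function names are hypothetical.

```python
from collections import Counter

# Toy tweet-hashtag corpus, made up for illustration only.
CORPUS = [
    ("new phone camera review is out", ["#tech", "#review"]),
    ("the phone battery lasts two days", ["#tech"]),
    ("best pasta recipe for dinner", ["#food"]),
    ("camera tips for night photos", ["#photography", "#tech"]),
]


def tokenize(text):
    """Lowercase whitespace tokenization into a set of tokens."""
    return set(text.lower().split())


def retrieve(tweet, corpus, top_k=2):
    """Retriever (assumed: Jaccard overlap). Returns the hashtags attached
    to the top_k most similar corpus tweets, in rank order, deduplicated."""
    query = tokenize(tweet)
    scored = []
    for text, tags in corpus:
        doc = tokenize(text)
        union = query | doc
        score = len(query & doc) / len(union) if union else 0.0
        scored.append((score, tags))
    # Sort by score only, so ties keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = []
    for score, tags in scored[:top_k]:
        if score > 0:
            for tag in tags:
                if tag not in candidates:
                    candidates.append(tag)
    return candidates


def select(candidates, corpus, keep=1):
    """Selector (assumed global signal: corpus-wide hashtag frequency).
    Keeps the `keep` candidates that occur most often in the corpus."""
    global_counts = Counter(tag for _, tags in corpus for tag in tags)
    ranked = sorted(candidates, key=lambda tag: global_counts[tag], reverse=True)
    return ranked[:keep]


def build_generator_input(tweet, selected):
    """Stand-in for the generator: concatenates the tweet and the selected
    hashtags into one string that a generative model could condition on."""
    return f"tweet: {tweet} | retrieved hashtags: {' '.join(selected)}"


if __name__ == "__main__":
    tweet = "phone camera review"
    candidates = retrieve(tweet, CORPUS)
    selected = select(candidates, CORPUS)
    print(build_generator_input(tweet, selected))
```

In this sketch the selector is what separates a merely relevant hashtag from a mainstream one: the retriever may surface several semantically related tags, and the frequency count favors the one that is widely used across the corpus.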