URLs serve as bridges between social media platforms and the broader web, linking user-generated content to external information resources. On Twitter (X), approximately one in five tweets contains at least one URL, underscoring the central role of URLs in information dissemination. While prior studies have examined the motivations of authors who share URLs, such author-centered intentions are difficult to observe in practice. To enable broader downstream use, this work investigates reader-centered interpretations, i.e., how users perceive the intentions behind hyperlinks included in posts. We develop a taxonomy of intentions for including hyperlinks in social media posts through a hybrid approach: a bottom-up, data-driven process based on large-scale crowdsourced annotations, followed by refinement with large language model assistance to generate descriptive category names and precise definitions. The final taxonomy comprises 6 top-level categories and 26 fine-grained intention classes, capturing diverse communicative purposes. Applying this taxonomy, we annotate and analyze 1,000 user posts, revealing that advertising, arguing, and sharing are the most prevalent intentions. The resulting taxonomy provides a foundation for intent-aware information retrieval and NLP applications, enabling more accurate retrieval, recommendation, and understanding of social media content.