Argument mining and stance detection are central to understanding how opinions are formed and contested in online discourse. However, most publicly available resources focus on mainstream platforms such as Twitter and Reddit, leaving conversational structure on alt-tech platforms comparatively under-studied. We introduce TruthStance, a large-scale dataset of Truth Social conversation threads spanning 2023-2025, consisting of 24,378 posts and 523,360 comments with reply-tree structure preserved. We provide a human-annotated benchmark of 1,500 instances covering argument mining and claim-based stance detection, report inter-annotator agreement, and use the benchmark to evaluate large language model (LLM) prompting strategies. Using the best-performing configuration, we release additional LLM-generated labels for 24,352 posts (argument presence) and 107,873 comments (stance toward the parent), enabling analysis of stance and argumentation patterns across conversation depth, topics, and users. All code and data are released publicly.