Detecting what content communities value is a foundational challenge for social computing systems, from feed curation and content ranking to moderation tools and personalized recommendation. Yet existing approaches remain fragmented across methodological paradigms, and it is unclear which methods best capture community-specific notions of value. We introduce VASTU (Value-Aligned Social Toolkit for Online Content Curation), a benchmark and evaluation framework for systematically comparing approaches to detecting community-valued content. VASTU includes a dataset of 75,000 comments from 15 diverse Reddit communities, annotated with community approval labels and rich linguistic features. Using VASTU, we evaluate feature-based models, transformers, and prompted and fine-tuned language models under global versus community-specific training regimes. We find that community-specific models consistently outperform global approaches, with fine-tuned transformers achieving the strongest performance (0.72 AUROC). Notably, fine-tuned small language models (SLMs; 0.65 AUROC) substantially outperform prompted large language models (LLMs; 0.60 AUROC) despite being 100 times smaller. Counterintuitively, chain-of-thought prompting provides no benefit, and reasoning models perform the worst (0.53 AUROC), suggesting that this task requires learning community norms rather than test-time reasoning. By releasing VASTU, we provide a standardized benchmark to advance research on value-aligned sociotechnical systems.