PERSPECTRA: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments

Pluralism, the capacity to engage with diverse perspectives without collapsing them into a single viewpoint, is critical for developing large language models that faithfully reflect human heterogeneity. Yet this capacity has not been carefully examined in the LLM research community and remains absent from most alignment studies. Debate-oriented sources provide a natural entry point for pluralism research. Previous work builds on online debate sources but remains constrained by costly human validation. Other debate-rich platforms such as Reddit and Kialo also offer promising material: Reddit provides linguistic diversity and scale but lacks clear argumentative structure, while Kialo supplies explicit pro/con graphs but is overly concise and detached from natural discourse. We introduce PERSPECTRA, a pluralist benchmark that integrates the structural clarity of Kialo debate graphs with the linguistic diversity of real Reddit discussions. Using a controlled retrieval-and-expansion pipeline, we construct 3,810 enriched arguments spanning 762 pro/con stances on 100 controversial topics. Each opinion is expanded into multiple naturalistic variants, enabling robust evaluation of pluralism. We instantiate three tasks with PERSPECTRA: opinion counting (identifying distinct viewpoints), opinion matching (aligning supporting stances and discourse to source opinions), and polarity check (inferring aggregate stance in mixed discourse). Experiments with state-of-the-art open-source and proprietary LLMs reveal systematic failures, such as overestimating the number of viewpoints and misclassifying concessive structures, underscoring the difficulty of pluralism-aware understanding and reasoning. By combining diversity with structure, PERSPECTRA establishes the first scalable, configurable benchmark for evaluating how well models represent, distinguish, and reason over multiple perspectives.
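As a rough illustration of how the opinion counting task could expose the over-counting failure described above, the following sketch compares predicted and gold viewpoint counts. The `CountingItem` fields, the `counting_report` helper, and the toy values are assumptions made for illustration; they are not the benchmark's actual schema, metrics, or results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CountingItem:
    """Hypothetical record for one opinion-counting example."""
    topic: str
    gold_count: int       # distinct viewpoints actually present in the passage
    predicted_count: int  # number of viewpoints the model reported


def counting_report(items):
    """Summarise exact matches and the direction of counting errors."""
    errors = [it.predicted_count - it.gold_count for it in items]
    n = len(errors)
    return {
        "exact_match": sum(e == 0 for e in errors) / n,
        "over_rate": sum(e > 0 for e in errors) / n,   # model sees too many viewpoints
        "under_rate": sum(e < 0 for e in errors) / n,  # model merges distinct viewpoints
        "mean_signed_error": mean(errors),
    }


# Toy data, invented purely to exercise the function.
items = [
    CountingItem("topic-a", gold_count=3, predicted_count=5),
    CountingItem("topic-b", gold_count=2, predicted_count=2),
    CountingItem("topic-c", gold_count=4, predicted_count=3),
    CountingItem("topic-d", gold_count=3, predicted_count=4),
]
report = counting_report(items)
print(report)
```

Reporting the signed error alongside exact match separates a model that splits one viewpoint into several from one that merges distinct viewpoints, which a single accuracy number would hide.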
