Real-world knowledge representation often requires capturing subjective, continuous attributes -- such as political positions -- that conflict with pairwise validation, the widely accepted gold standard for human evaluation. We address this challenge through a dual-scale validation framework applied to political stance prediction in argumentative discourse, combining pointwise and pairwise human annotation. Using 22 language models, we construct a large-scale knowledge base of political position predictions for 23,228 arguments drawn from 30 debates that appeared on the UK political television programme \textit{Question Time}. Pointwise evaluation shows moderate human-model agreement (Krippendorff's $\alpha = 0.578$), reflecting intrinsic subjectivity, while pairwise validation reveals substantially stronger alignment between human- and model-derived rankings ($\alpha = 0.86$ for the best model). This work contributes: (i) a practical validation methodology for subjective continuous knowledge that balances scalability with reliability; (ii) a validated structured argumentation knowledge base enabling graph-based reasoning and retrieval-augmented generation in political domains; and (iii) evidence that ordinal structure can be extracted from pointwise language model predictions on inherently subjective real-world discourse, advancing knowledge representation capabilities for domains where traditional symbolic or categorical approaches are insufficient.
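As a minimal sketch of the agreement statistic reported above, the following implements the standard interval-metric Krippendorff's $\alpha$ ($\alpha = 1 - D_o/D_e$, observed over expected disagreement) for continuous stance scores. This is an illustration of the general coefficient, not necessarily the exact computation used in the paper; the function name and the toy data are hypothetical.

```python
from itertools import permutations

def krippendorff_alpha_interval(units):
    """Interval-metric Krippendorff's alpha.

    units: list of lists; each inner list holds the stance scores that
    all coders (human or model) assigned to one argument.
    """
    # Keep only "pairable" units, i.e. those scored by at least two coders.
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)
    # Observed disagreement: squared score differences within each unit,
    # each unit weighted by 1 / (m_u - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n
    # Expected disagreement: squared differences over all pooled scores,
    # as if scores had been assigned to units at random.
    pooled = [v for u in units for v in u]
    d_e = sum((a - b) ** 2 for a, b in permutations(pooled, 2)) / (n * (n - 1))
    return 1.0 - d_o / d_e
```

With perfectly agreeing coders the observed disagreement is zero and the function returns 1.0; values near 0 indicate chance-level agreement.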