Building NLP systems for subjective tasks requires ensuring that they align with contrasting human values. We propose the MultiCalibrated Subjective Task Learner framework (MC-STL), which clusters annotations into identifiable human value clusters via one of three approaches (similarity of annotator rationales, expert value taxonomies, or raters' sociocultural descriptors) and calibrates predictions for each value cluster by learning cluster-specific embeddings. We demonstrate MC-STL in several subjective learning settings, including ordinal, binary, and preference-learning prediction, and evaluate it on multiple datasets covering toxic chatbot conversations, offensive social media posts, and human preference alignment. The results show that MC-STL consistently outperforms baselines that ignore the latent value structure of the annotations, delivering gains in discrimination, value-specific calibration, and disagreement-aware metrics.
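The cluster-then-calibrate recipe described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the rationale embeddings and labels are synthetic, the cluster count is arbitrary, and Platt scaling stands in for whatever calibrator MC-STL actually learns.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in data: rationale embeddings for 12 annotators.
rationale_embs = rng.normal(size=(12, 8))

# Step 1: group annotators into value clusters by rationale similarity.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_of_annotator = kmeans.fit_predict(rationale_embs)

# Toy model scores and per-annotation binary labels on 200 items,
# each labeled by one annotator.
scores = rng.uniform(size=(200, 1))
labels = (scores[:, 0] + rng.normal(scale=0.2, size=200) > 0.5).astype(int)
annotator_of_label = rng.integers(0, 12, size=200)

# Step 2: fit a separate calibrator per value cluster (Platt scaling here),
# using only the annotations produced by that cluster's members.
calibrators = {}
for c in range(3):
    members = np.where(cluster_of_annotator == c)[0]
    mask = np.isin(annotator_of_label, members)
    calibrators[c] = LogisticRegression().fit(scores[mask], labels[mask])

# Cluster-conditioned probability that cluster 0 would label item 0 positive.
p = calibrators[0].predict_proba(scores[:1])[0, 1]
```

The point of the sketch is the factorization: one shared score, but a distinct calibration map per value cluster, so disagreeing annotator groups each get well-calibrated predictions instead of being averaged away.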