Annotating speaker attributes from text is inherently ambiguous, particularly in multilingual settings, where demographic and social cues are implicit and culturally variable. We propose a collaborative re-annotation framework in which humans and large language models (LLMs) work together to stabilize multilingual speaker-attribute labels under practical resource constraints. Starting from a noisy corpus, we use LLMs to surface recurring annotation rationales through iterative interaction with experts. We then apply disagreement-focused sampling to select instances for targeted re-annotation. Using this framework, we construct WhoSaidIt, a multilingual dataset covering nine speaker-attribute labels. We quantify the divergence between original and revised annotations, benchmark recent LLMs, and analyze how explicit rationales affect model behavior. Our results reveal substantial cross-lingual differences in annotation decisions and highlight both the strengths and the limitations of LLMs in speaker-attribute classification.