Large language models offer opportunities to simulate multi-party deliberation, but realistic modeling remains limited by a lack of speaker-attributed data. Transcripts produced via automatic speech recognition (ASR) assign anonymous speaker labels (e.g., Speaker_1), preventing models from learning consistent, individual-level behavior. This work introduces a reproducible pipeline that transforms public Zoom recordings into speaker-attributed transcripts enriched with metadata such as persona profiles and pragmatic action tags (e.g., [propose_motion]). We release three local government deliberation datasets: Appellate Court hearings, School Board meetings, and Municipal Council sessions. Fine-tuning LLMs to model specific participants on this "action-aware" data yields a 67% reduction in perplexity and nearly doubles classifier-based metrics for speaker fidelity and realism. Turing-style human evaluations show our simulations are often indistinguishable from real deliberations, providing a practical and scalable method for complex, realistic civic simulations.
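As a rough illustration of what a speaker-attributed, action-aware record could look like, the sketch below attaches a persona profile and a pragmatic action tag to a single utterance and renders it as a tagged training line. All field names, the persona fields, and the rendering format are assumptions for illustration, not the released dataset schema:

```python
# Hypothetical sketch of one speaker-attributed, action-aware transcript record.
# Field names and structure are illustrative assumptions, not the dataset's schema.
record = {
    "speaker": "Councilmember Rivera",  # resolved from an ASR label like "Speaker_1"
    "persona": {
        "role": "Municipal Council member",
        "style": "procedural, consensus-seeking",
    },
    "action": "propose_motion",  # pragmatic action tag
    "utterance": "I move that we adopt the amended budget as presented.",
}

def to_training_line(rec):
    """Render a record as a single tagged line for fine-tuning."""
    return f"{rec['speaker']} [{rec['action']}]: {rec['utterance']}"

print(to_training_line(record))
# Councilmember Rivera [propose_motion]: I move that we adopt the amended budget as presented.
```

In this hypothetical rendering, the action tag precedes the utterance so a fine-tuned model conditions on both who is speaking and what deliberative move they are making.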