Real-Time Group Dynamics with LLM Facilitation: Evidence from a Charity Allocation Task

As large language models (LLMs) evolve from single-user assistants into active participants in civic and workplace deliberation, evaluating their effects on collective decision making becomes a governance challenge. We present two empirical studies (N=879) of real-time, text-based group deliberation in an incentive-compatible charity allocation task with real financial stakes ($7,200 USD). Groups of three allocate a donation budget under varying LLM facilitation conditions: Study 1 (N=204) compares three frontier models; Study 2 (N=675) compares facilitator strategies against a no-facilitation baseline. In neither study did LLM facilitation significantly improve group consensus, yet participants consistently preferred facilitated discussion. We additionally identify two governance-relevant risks. First, algorithmic steering: facilitators shifted select charity-level allocations by up to 5.5 percentage points, directly affecting the final charitable payout, even when aggregate agreement metrics remained unchanged. Second, an illusion of inclusion: participants cited inclusivity as their primary reason for preferring LLM facilitators, yet neither survey-based nor transcript-based measures of participation equity improved. Notably, participants reported greater trust in the process under the same conditions in which facilitators exerted directional influence on outcomes. Together, these findings show that in AI-mediated group deliberation, perceived procedural improvement can coexist with measurable steering and unchanged participation inequality, motivating evaluation practices that treat collective outcomes, interaction dynamics, and participant perceptions as distinct governance targets.
