Prior work has explored multi-turn interaction and feedback for LLM writing, but evaluations still largely center on prompts and localized feedback, leaving persistent public reception in online communities underexamined. We test whether broadcast community discussion improves stand-up comedy writing in a controlled multi-agent sandbox: in the discussion condition, critic and audience threads are recorded, filtered, stored as social memory, and later retrieved to condition subsequent generations, whereas the baseline omits discussion. Across 50 rounds (250 paired monologues) judged by five expert annotators using A/B preference and a 15-item rubric, the discussion condition wins 75.6% of A/B comparisons and improves Craft/Clarity (Δ = 0.440) and Social Response (Δ = 0.422) rubric scores, with occasional increases in aggressive humor.