Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Although artificial intelligence (AI) has become deeply integrated into many stages of the research workflow, academic rebuttal remains a significant and underexplored challenge. Rebuttal is not a simple technical debate but a complex process of strategic communication under severe information asymmetry. Current approaches struggle because they largely imitate surface-level linguistic patterns and miss the perspective-taking that effective persuasion requires. In this paper, we introduce RebuttalAgent, the first framework to ground academic rebuttal in Theory of Mind (ToM). It is operationalized through a ToM-Strategy-Response (TSR) pipeline that models the reviewer's mental state, formulates a persuasion strategy, and generates a strategy-grounded response. To train our agent, we construct RebuttalBench, a large-scale dataset synthesized via a novel critique-and-refine approach. Training proceeds in two stages: a supervised fine-tuning phase that equips the agent with ToM-based analysis and strategic planning capabilities, followed by a reinforcement learning phase that leverages a self-reward mechanism for scalable self-improvement. For reliable and efficient automated evaluation, we further develop Rebuttal-RM, a specialized evaluator trained on over 100K samples of multi-source rebuttal data, whose scoring consistency with human preferences surpasses that of the powerful judge GPT-4.1. Extensive experiments show that RebuttalAgent significantly outperforms the base model by an average of 18.3% on automated metrics, while also outperforming advanced proprietary models in both automated and human evaluations. Disclaimer: the generated rebuttal content is for reference only, to inspire authors and assist in drafting. It is not intended to replace the author's own critical analysis and response.
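As a rough illustration of the three-stage TSR flow described above, the following minimal Python sketch chains a reviewer-modeling step, a strategy step, and a response step. It is not the authors' implementation: every name here (`ReviewerModel`, `RebuttalDraft`, `run_tsr`, and the `toy_*` stage functions) is hypothetical, and the toy stages use simple string rules where the paper's agent would use a trained language model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ReviewerModel:
    """Hypothetical container for the inferred reviewer mental state."""
    concerns: list[str]
    stance: str


@dataclass
class RebuttalDraft:
    """Hypothetical record of all three TSR stage outputs."""
    reviewer: ReviewerModel
    strategy: str
    response: str


def run_tsr(
    comment: str,
    infer_mind: Callable[[str], ReviewerModel],
    plan_strategy: Callable[[str, ReviewerModel], str],
    write_response: Callable[[str, ReviewerModel, str], str],
) -> RebuttalDraft:
    """Run the ToM -> Strategy -> Response stages in order."""
    reviewer = infer_mind(comment)                           # ToM stage
    strategy = plan_strategy(comment, reviewer)              # Strategy stage
    response = write_response(comment, reviewer, strategy)   # Response stage
    return RebuttalDraft(reviewer, strategy, response)


# Toy stand-ins for the three stages (assumptions, for illustration only).
def toy_infer_mind(comment: str) -> ReviewerModel:
    stance = "skeptical" if "not convinced" in comment.lower() else "neutral"
    return ReviewerModel(concerns=[comment.strip()], stance=stance)


def toy_plan_strategy(comment: str, reviewer: ReviewerModel) -> str:
    if reviewer.stance == "skeptical":
        return "acknowledge the concern, then present new evidence"
    return "clarify directly"


def toy_write_response(comment: str, reviewer: ReviewerModel, strategy: str) -> str:
    return f"[{strategy}] Regarding: {reviewer.concerns[0]}"


if __name__ == "__main__":
    draft = run_tsr(
        "I am not convinced by the ablation.",
        toy_infer_mind,
        toy_plan_strategy,
        toy_write_response,
    )
    print(draft.strategy)
    print(draft.response)
```

Passing the stages in as callables keeps the ordering explicit: the response step can only see a strategy that was itself conditioned on the inferred reviewer state, which is the dependency the TSR pipeline describes.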
