RebuttalAgent: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Although artificial intelligence (AI) has become deeply integrated into many stages of the research workflow and has achieved remarkable advances, academic rebuttal remains a significant and underexplored challenge. Rebuttal is not a simple technical debate but a complex process of strategic communication under severe information asymmetry. Consequently, current approaches struggle because they largely imitate surface-level linguistic patterns and miss the perspective-taking that effective persuasion requires. In this paper, we introduce RebuttalAgent, the first framework to ground academic rebuttal in Theory of Mind (ToM), operationalized through a ToM-Strategy-Response (TSR) pipeline that models the reviewer's mental state, formulates a persuasion strategy, and generates an evidence-based response. To train our agent, we construct RebuttalBench, a large-scale dataset synthesized via a novel critique-and-refine approach. Training proceeds in two stages: a supervised fine-tuning phase that equips the agent with ToM-based analysis and strategic planning capabilities, followed by a reinforcement learning phase that leverages a self-reward mechanism for scalable self-improvement. For reliable and efficient automated evaluation, we further develop Rebuttal-RM, a specialized evaluator trained on over 100K samples of multi-source rebuttal data, whose scoring consistency with human preferences surpasses that of the strong judge GPT-4.1. Extensive experiments show that RebuttalAgent significantly outperforms the base model by an average of 18.3% on automated metrics, and also outperforms advanced proprietary models in both automated and human evaluations.
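To make the three-stage flow concrete, the following is a minimal Python sketch of how reviewer modeling, strategy formulation, and response generation could be chained. It is an illustration under stated assumptions, not the paper's implementation: the names `tsr_rebuttal`, `TSRResult`, `Generate`, and `echo_model`, as well as the prompt wording, are hypothetical, and the language model is replaced by a stub.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any text-in, text-out model call (not an API from the paper).
Generate = Callable[[str], str]


@dataclass
class TSRResult:
    reviewer_profile: str  # stage 1: inferred reviewer mental state (ToM)
    strategy: str          # stage 2: persuasion strategy
    response: str          # stage 3: evidence-based response


def tsr_rebuttal(review_comment: str, paper_context: str, generate: Generate) -> TSRResult:
    """Chain the three stages so that each one conditions on the previous output."""
    # Stage 1: model the reviewer's mental state from the comment.
    profile = generate(
        "[ToM] Infer the reviewer's concerns and intent.\n"
        f"Comment: {review_comment}"
    )
    # Stage 2: plan a persuasion strategy given the inferred profile.
    strategy = generate(
        "[Strategy] Plan how to persuade this reviewer.\n"
        f"Profile: {profile}\nComment: {review_comment}"
    )
    # Stage 3: write the reply, grounded in evidence from the paper.
    response = generate(
        "[Response] Write an evidence-based reply.\n"
        f"Strategy: {strategy}\nEvidence: {paper_context}\nComment: {review_comment}"
    )
    return TSRResult(profile, strategy, response)


def echo_model(prompt: str) -> str:
    """Stand-in for a language model: returns a tag naming the stage it was asked for."""
    stage = prompt.split("]")[0].lstrip("[")
    return f"<{stage} output>"


result = tsr_rebuttal(
    review_comment="The ablation study is missing.",
    paper_context="Section 5 reports an ablation.",
    generate=echo_model,
)
```

Passing the generator in as a plain callable keeps the sketch independent of any particular model or serving library; in practice each stage would call the trained agent rather than the stub.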
