Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind

Although artificial intelligence (AI) has become deeply integrated into many stages of the research workflow and has driven remarkable advances, academic rebuttal remains a significant and underexplored challenge. Rebuttal is not a simple technical debate but a complex process of strategic communication under severe information asymmetry. Consequently, current approaches struggle: they largely imitate surface-level linguistic patterns and miss the perspective-taking that effective persuasion requires. In this paper, we introduce RebuttalAgent, the first framework to ground academic rebuttal in Theory of Mind (ToM), operationalized through a ToM-Strategy-Response (TSR) pipeline that models the reviewer's mental state, formulates a persuasion strategy, and generates a strategy-grounded response. To train our agent, we construct RebuttalBench, a large-scale dataset synthesized via a novel critique-and-refine approach. Training proceeds in two stages: a supervised fine-tuning phase that equips the agent with ToM-based analysis and strategic planning capabilities, followed by a reinforcement learning phase that leverages a self-reward mechanism for scalable self-improvement. For reliable and efficient automated evaluation, we further develop Rebuttal-RM, a specialized evaluator trained on over 100K samples of multi-source rebuttal data, whose scoring consistency with human preferences surpasses that of the powerful judge GPT-4.1. Extensive experiments show that RebuttalAgent significantly outperforms the base model by an average of 18.3% on automated metrics, while also outperforming advanced proprietary models in both automated and human evaluations. Disclaimer: the generated rebuttal content is for reference only, to inspire authors and assist in drafting. It is not intended to replace the author's own critical analysis and response.
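To make the three-stage TSR flow concrete, the following is a minimal Python sketch of how the stages could be chained, with each stage's output feeding the next. It is an illustration under stated assumptions, not the RebuttalAgent implementation: `run_tsr`, `TSRResult`, the prompt wording, and the stand-in `echo_model` are all hypothetical names introduced here, and a real system would substitute a trained language model for `generate`.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


@dataclass
class TSRResult:
    tom_analysis: str  # stage 1: inferred reviewer mental state
    strategy: str      # stage 2: persuasion plan derived from the analysis
    response: str      # stage 3: rebuttal text grounded in the strategy


def run_tsr(review_comment: str, paper_context: str, generate: Generate) -> TSRResult:
    """Chain the ToM, Strategy, and Response stages (illustrative prompts only)."""
    tom = generate(
        f"[ToM] Infer the reviewer's concerns and intent.\nComment: {review_comment}"
    )
    strategy = generate(
        f"[Strategy] Plan how to address the reviewer.\nReviewer analysis: {tom}"
    )
    response = generate(
        "[Response] Write the rebuttal following the strategy.\n"
        f"Strategy: {strategy}\nContext: {paper_context}\nComment: {review_comment}"
    )
    return TSRResult(tom_analysis=tom, strategy=strategy, response=response)


def echo_model(prompt: str) -> str:
    """Stand-in for a language model: returns the prompt's stage tag plus ' output'."""
    return prompt.split("]")[0] + "] output"


result = run_tsr("The baselines seem weak.", "We compare against three baselines.", echo_model)
print(result.tom_analysis)
print(result.strategy)
print(result.response)
```

Passing the model in as a plain callable keeps the sketch independent of any particular inference API; only the ordering of the stages and the data passed between them reflect the pipeline described above.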
