Author response (rebuttal) writing is a critical stage of scientific peer review that demands substantial author effort. In practice, authors possess domain expertise, author-only information, and response strategies (concrete forms of author expertise and intent), and they seek NLP assistance that integrates these signals into author response generation (ARG). Yet this author-in-the-loop paradigm lacks a formal NLP formulation and systematic study: no dataset provides fine-grained author signals, existing ARG work lacks author inputs and controls, and no evaluation measures how well responses reflect author signals or how effectively they address reviewer concerns. To fill these gaps, we introduce (i) Re3Align, the first large-scale dataset of aligned review-response-revision triplets, in which revisions serve as a proxy for author signals; (ii) REspGen, an author-in-the-loop ARG framework supporting flexible author input, multi-attribute control, and evaluation-guided refinement; and (iii) REspEval, a comprehensive evaluation suite with 20+ metrics spanning input utilization, controllability, response quality, and discourse. Experiments with state-of-the-art LLMs demonstrate the benefits of author input and evaluation-guided refinement, show how input specificity affects response quality, and reveal trade-offs between controllability and quality. We release our dataset as well as our generation and evaluation tools.