Self-reflection enables language agents to iteratively refine solutions, yet it often produces repetitive outputs that limit reasoning performance. Recent studies have attempted to address this limitation through various approaches, among which increasing reflective diversity has shown promise. Our empirical analysis reveals a strong positive correlation between reflective diversity and task success, further motivating the need for diverse reflection signals. We introduce ParamMem, a parametric memory module that encodes cross-sample reflection patterns into model parameters, enabling diverse reflection generation through temperature-controlled sampling. Building on this module, we propose ParamAgent, a reflection-based agent framework that integrates parametric memory with episodic and cross-sample memory. Extensive experiments on code generation, mathematical reasoning, and multi-hop question answering demonstrate consistent improvements over state-of-the-art baselines. Further analysis reveals that ParamMem is sample-efficient, enables weak-to-strong transfer across model scales, and supports self-improvement without reliance on a stronger external model, highlighting the potential of ParamMem as an effective component for enhancing language agents.
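The temperature-controlled sampling mentioned above refers to the standard mechanism of scaling logits before the softmax: higher temperatures flatten the output distribution, so sampled reflections vary more. A minimal sketch of that mechanism (the function names and logits are illustrative, not from the paper):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/T before the softmax; higher T flattens the
    # distribution, increasing the diversity of sampled outputs.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    # Shannon entropy in nats; a flatter distribution has higher entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.5, 0.1]
low_t = softmax_with_temperature(logits, 0.5)   # sharper, more repetitive
high_t = softmax_with_temperature(logits, 1.5)  # flatter, more diverse
```

Sampling reflections from the higher-temperature distribution is what yields the diverse reflection signals the abstract describes; the entropy of `high_t` exceeds that of `low_t`.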