We propose a framework that amortizes the cost of inference-time reasoning by converting transient critiques into retrievable guidelines, through a file-based memory system and agent-controlled tool calls. We evaluate this method on the Rubric Feedback Bench, a novel dataset for rubric-based learning. Experiments demonstrate that our augmented LLMs rapidly match the performance of test-time refinement pipelines while drastically reducing inference cost.
翻译:我们提出一种框架,通过基于文件的记忆系统和智能体控制的工具调用,将瞬时性评析转化为可检索的指导原则,从而分摊推理时思考过程的计算成本。我们在Rubric Feedback Bench(一种基于评分标准学习的新型数据集)上对该方法进行评估。实验表明,增强后的大型语言模型能快速达到测试时精炼流程的性能水平,同时显著降低推理成本。