Recent advances in Large Language Models (LLMs) have demonstrated strong capabilities in reasoning, reflection, and tool use, opening new ways to automate complex engineering workflows. However, in sequential recommendation (SR), tuning models on new datasets still relies heavily on manual trial and error by experienced machine learning engineers. To bridge this gap, we propose \textbf{VirtualMLE}, an LLM-agent framework that leverages the cognitive capabilities of LLMs to organize recommender optimization into a closed loop of execution, reflection, and memory update. After each trial, the agent explicitly analyzes the observed outcomes and stores concise heuristic feedback in a hierarchical memory system. We evaluate VirtualMLE on three Amazon SR benchmarks with two representative backbones, SASRec and HSTU. VirtualMLE reaches competitive recommendation quality with substantially fewer trials. Furthermore, we observe that cognition summaries distilled from previous datasets can significantly accelerate the search on unseen datasets, demonstrating the potential of transferring tuning heuristics across datasets. Overall, our results provide evidence that LLM agents equipped with reflection and memory can serve as practical virtual engineers that automate and amortize heuristic learning in SR optimization. Our code is available.
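The closed loop of execution, reflection, and memory update described above can be illustrated with a minimal Python sketch. It is not the VirtualMLE implementation: every name here (`Memory`, `run_trial`, `propose`, `reflect`, `tune`) is hypothetical, the LLM calls are replaced by simple rule-based stand-ins, the two-level memory layout is only one possible reading of "hierarchical memory", and the toy objective stands in for training and evaluating a real SR backbone.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical two-level memory: raw trial records plus short heuristic notes."""
    trials: list = field(default_factory=list)      # (config, score) per trial
    heuristics: list = field(default_factory=list)  # concise textual feedback


def run_trial(config):
    """Stand-in for training and evaluating a recommender; returns a toy score."""
    # Toy objective that peaks at lr=1e-3, dropout=0.2. Not a real metric.
    return 1.0 - abs(config["lr"] - 1e-3) * 100 - abs(config["dropout"] - 0.2)


def propose(memory, rng):
    """Stand-in for the agent proposing a config: perturb the best config so far."""
    if not memory.trials:
        return {"lr": 1e-2, "dropout": 0.5}
    best_config, _ = max(memory.trials, key=lambda t: t[1])
    return {
        "lr": max(1e-5, best_config["lr"] * rng.choice([0.5, 1.0, 2.0])),
        "dropout": min(0.9, max(0.0, best_config["dropout"] + rng.choice([-0.1, 0.0, 0.1]))),
    }


def reflect(config, score, memory):
    """Stand-in for LLM reflection: compare against the previous best, emit a note."""
    prev_best = max((s for _, s in memory.trials), default=float("-inf"))
    verdict = "improved" if score > prev_best else "did not improve"
    return f"lr={config['lr']:.0e}, dropout={config['dropout']:.1f} {verdict} the score"


def tune(num_trials=10, seed=0):
    """Run the execute -> reflect -> update-memory loop for a fixed trial budget."""
    rng = random.Random(seed)
    memory = Memory()
    for _ in range(num_trials):
        config = propose(memory, rng)           # execution: pick and run a trial
        score = run_trial(config)
        note = reflect(config, score, memory)   # reflection: analyze the outcome
        memory.trials.append((config, score))   # memory update: raw record
        memory.heuristics.append(note)          # memory update: heuristic note
    return memory
```

In this sketch, the `heuristics` list accumulated on one dataset could be passed as the starting memory for another, which is the kind of cross-dataset reuse the abstract refers to; how VirtualMLE actually distills and transfers its cognition summaries is not specified here.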