Offline black-box optimization (BBO) aims to find optimal designs based solely on an offline dataset of designs and their labels. Such scenarios frequently arise in domains like DNA sequence design and robotics, where only a few labeled data points are available. Traditional methods typically rely on task-specific proxy or generative models, overlooking the in-context learning capabilities of pre-trained large language models (LLMs). Recent efforts have adapted autoregressive LLMs to BBO by framing task descriptions and offline datasets as natural language prompts, enabling direct design generation. However, these designs often contain bidirectional dependencies, which left-to-right models struggle to capture. In this paper, we explore diffusion LLMs for BBO, leveraging their bidirectional modeling and iterative refinement capabilities. These capabilities motivate our in-context denoising module: we condition the diffusion LLM on the task description and the offline dataset, both formatted in natural language, and prompt it to denoise masked designs into improved candidates. To guide generation toward high-performing designs, we introduce masked diffusion tree search, which casts the denoising process as a step-wise Monte Carlo Tree Search that dynamically balances exploration and exploitation. Each node represents a partially masked design, each denoising step is an action, and candidates are evaluated via expected improvement under a Gaussian process trained on the offline dataset. Our method, dLLM, achieves state-of-the-art results in few-shot settings on Design-Bench.
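The evaluation step described above, scoring candidate designs by expected improvement (EI) under a Gaussian process fit to the offline dataset, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy binary designs, the RBF kernel, and the greedy `unmask_greedily` helper (a stand-in for the full tree search, which would also track visit counts to balance exploration) are all assumptions made for the example.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, length_scale=1.0):
    """RBF kernel matrix between row-vector design sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

# Hypothetical offline dataset: 8 binary designs of length 4 with noisy scores.
X = rng.integers(0, 2, size=(8, 4)).astype(float)
y = X.sum(axis=1) + 0.1 * rng.standard_normal(8)

K_inv = np.linalg.inv(rbf(X, X) + 1e-6 * np.eye(len(X)))  # jitter for stability
best_y = y.max()  # incumbent used by expected improvement

def gp_posterior(x):
    """GP posterior mean and std at a single design x (zero-mean prior)."""
    k = rbf(X, np.atleast_2d(x))[:, 0]
    mu = k @ K_inv @ y
    var = max(1.0 - k @ K_inv @ k, 1e-18)  # k(x, x) = 1 for this RBF kernel
    return mu, math.sqrt(var)

def expected_improvement(x):
    """EI(x) = (mu - y*) * Phi(z) + sigma * phi(z), with z = (mu - y*) / sigma."""
    mu, sigma = gp_posterior(x)
    z = (mu - best_y) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best_y) * Phi + sigma * phi

def unmask_greedily(design, mask):
    """One 'denoising step' per masked position: fill it with the token value
    whose completion has the highest EI (exploitation-only surrogate for the
    step-wise tree search described in the abstract)."""
    design = design.copy()
    for i in np.flatnonzero(mask):
        options = []
        for v in (0.0, 1.0):
            cand = design.copy()
            cand[i] = v
            options.append(cand)
        design = max(options, key=expected_improvement)
    return design

masked = np.zeros(4)                        # placeholder values for masked tokens
candidate = unmask_greedily(masked, np.ones(4, dtype=bool))
print(candidate, expected_improvement(candidate))
```

In the full method, each partially unmasked design would be a search-tree node and EI would score only completed leaves; the greedy loop here collapses that tree to a single root-to-leaf path.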