Computational methods for solving entropy-regularized reward optimization -- a class of problems widely used for fine-tuning generative models -- have advanced rapidly. Among these, Adjoint Matching (AM, Domingo-Enrich et al., 2025) has proven highly effective in continuous state spaces with differentiable rewards. Transferring these practical successes to discrete generative modeling, however, remains particularly challenging and largely unexplored, mainly due to the drastic shift in the generative model class to discrete state spaces, which are nowhere differentiable. In this work, we propose Discrete Adjoint Matching (DAM) -- a discrete variant of AM for fine-tuning discrete generative models characterized by Continuous-Time Markov Chains, such as diffusion-based large language models. The core of DAM is the introduction of the discrete adjoint -- an estimator of the optimal solution to the original problem, but formulated on discrete domains -- from which standard matching frameworks can be applied. It is derived from a purely statistical standpoint, in contrast to the control-theoretic viewpoint of AM, thereby opening up new algorithmic opportunities for general adjoint-based estimators. We showcase DAM's effectiveness on synthetic and mathematical reasoning tasks.
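To make the problem class concrete: on a finite state space, the entropy-regularized reward objective max_q E_q[r(x)] - λ·KL(q || p_base) has the closed-form optimum q*(x) ∝ p_base(x)·exp(r(x)/λ), i.e. the reward-tilted base distribution. The sketch below illustrates only this general fact, not the DAM algorithm itself; all names (`p_base`, `r`, `lam`) are illustrative.

```python
import numpy as np

# Toy illustration (not the paper's algorithm): the optimum of
#   max_q  E_q[r(x)] - lam * KL(q || p_base)
# over distributions q on a finite state space is the tilted
# distribution q*(x) proportional to p_base(x) * exp(r(x) / lam).

rng = np.random.default_rng(0)
p_base = rng.dirichlet(np.ones(5))   # base generative model over 5 states
r = rng.normal(size=5)               # reward for each discrete state
lam = 0.5                            # regularization strength

# Closed-form optimum: exponentially tilt the base model by the reward.
q_star = p_base * np.exp(r / lam)
q_star /= q_star.sum()

def objective(q):
    """Entropy-regularized reward: E_q[r] - lam * KL(q || p_base)."""
    kl = np.sum(q * np.log(q / p_base))
    return q @ r - lam * kl

# The tilted distribution scores at least as high as the base model.
assert objective(q_star) >= objective(p_base)
```

The continuous-state version of this same objective is what AM targets; DAM's discrete adjoint plays the role of an estimator of this optimum on discrete domains, which the model is then trained to match.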