Sampling from unnormalized probability densities is a central challenge in computational science. Boltzmann generators are generative models that enable independent sampling from the Boltzmann distribution of physical systems at a given temperature. However, their practical success depends on data-efficient training, as both simulation data and target energy evaluations are costly. To this end, we propose off-policy log-dispersion regularization (LDR), a novel regularization framework that builds on a generalization of the log-variance objective. We apply LDR in the off-policy setting in combination with standard data-based training objectives, without requiring additional on-policy samples. LDR acts as a shape regularizer of the energy landscape by leveraging additional information in the form of target energy labels. The proposed regularization framework is broadly applicable, supporting unbiased or biased simulation datasets as well as purely variational training without access to target samples. Across all benchmarks, LDR improves both final performance and data efficiency, with sample efficiency gains of up to one order of magnitude.
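The abstract states that LDR builds on the log-variance objective, evaluated off-policy using target energy labels. As a rough illustration of that starting point (not the authors' actual LDR formulation, which generalizes it), the standard log-variance loss penalizes the variance of the log-ratio between the model density and the unnormalized Boltzmann target over a batch of samples; the unknown normalizing constant drops out because variance is shift-invariant. A minimal NumPy sketch, with all function and variable names hypothetical:

```python
import numpy as np

def log_variance_loss(model_logprob, energy, beta=1.0):
    """Sketch of a log-variance style objective (hypothetical names).

    Over a batch of (possibly off-policy) samples x_i, penalize
    Var[ log q_theta(x_i) - log p(x_i) ], where the target satisfies
    log p(x) = -beta * E(x) + const; the constant cancels in the variance.
    """
    log_ratio = model_logprob + beta * energy  # log q - log p, up to a constant
    return np.var(log_ratio)

# Toy check: a model matching the target up to a constant incurs zero loss.
energies = np.array([1.0, 2.0, 3.0])
logq_perfect = -energies + 5.0  # log q = -beta*E + const, beta = 1
print(log_variance_loss(logq_perfect, energies))
```

Because only energy labels and model log-densities enter the loss, it can be evaluated on a fixed dataset without drawing fresh on-policy samples, which is the property the off-policy setting above relies on.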