We introduce the Binless Multidimensional Thermodynamic Integration (BMTI) method for nonparametric, robust, and data-efficient density estimation. BMTI estimates the logarithm of the density by first computing log-density differences between neighbouring data points and then integrating these differences, weighted by their associated uncertainties, within a maximum-likelihood formulation. This procedure can be seen as a multidimensional extension of thermodynamic integration, a technique developed in statistical physics. The method leverages the manifold hypothesis, estimating quantities within the intrinsic data manifold without defining an explicit coordinate map. It does not rely on any binning or space partitioning, but rather on a neighbourhood graph constructed through an adaptive bandwidth selection procedure. BMTI mitigates the limitations commonly associated with traditional nonparametric density estimators, reconstructing smooth density profiles even in high-dimensional embedding spaces. The method is tested on a variety of complex synthetic high-dimensional datasets, where it outperforms traditional estimators, and is benchmarked on realistic datasets from the chemical physics literature.
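The integration step described above, combining uncertainty-weighted log-density differences over a neighbourhood graph, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the likelihood, so the sketch assumes independent Gaussian errors on each difference, in which case maximum likelihood reduces to weighted least squares on the graph. The function name `integrate_log_density_differences` and its arguments are hypothetical, and the differences and variances are taken as given rather than estimated from data.

```python
import numpy as np

def integrate_log_density_differences(n_points, edges, deltas, variances):
    """Recover log-densities (up to an additive constant) from pairwise differences.

    Assumption (not from the source): each deltas[k] is an independent Gaussian
    estimate of f[j] - f[i] for edges[k] = (i, j), with variance variances[k].
    Under that assumption, maximum likelihood is weighted least squares.
    """
    edges = np.asarray(edges, dtype=int)
    deltas = np.asarray(deltas, dtype=float)
    # Scaling each row by 1/sigma turns the weighted problem into ordinary least squares.
    inv_sigma = 1.0 / np.sqrt(np.asarray(variances, dtype=float))
    n_edges = len(edges)

    # One row per edge, plus one extra row that fixes the additive constant.
    A = np.zeros((n_edges + 1, n_points))
    b = np.zeros(n_edges + 1)
    rows = np.arange(n_edges)
    A[rows, edges[:, 0]] = -inv_sigma
    A[rows, edges[:, 1]] = inv_sigma
    b[:n_edges] = inv_sigma * deltas

    # Gauge choice: the log-densities sum to zero. Differences alone cannot
    # determine the constant, so some convention like this one is required.
    A[n_edges, :] = 1.0

    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return f

# Toy usage: four points on a connected graph, with exact differences.
f_true = np.array([0.0, 1.0, 3.0, 2.0])
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
deltas = [f_true[j] - f_true[i] for i, j in edges]
variances = [0.1, 0.2, 0.3, 0.4, 0.5]
f_est = integrate_log_density_differences(len(f_true), edges, deltas, variances)
print(f_est)  # matches f_true up to an additive constant
```

The graph must be connected for the solution to be unique. In a density-estimation setting the additive constant would normally be fixed by normalisation rather than by the zero-sum convention used here.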