Conditional density estimation (CDE), which recovers the full conditional distribution of a response given tabular covariates, is essential in settings with heteroscedasticity, multimodality, or asymmetric uncertainty. Recent tabular foundation models such as TabPFN and TabICL naturally produce predictive distributions. Their point-prediction performance is well studied, but their effectiveness as general-purpose CDE methods has not been systematically evaluated. We benchmark three tabular foundation model variants against a diverse set of parametric, tree-based, and neural CDE baselines on 39 real-world datasets. Training sizes range from 50 to 20,000, and we report six metrics covering density accuracy, calibration, and computation time. At every sample size, foundation models achieve the best CDE loss, log-likelihood, and CRPS on the large majority of datasets. Calibration is competitive at small sample sizes. At larger sample sizes, however, it lags behind task-specific neural baselines for some metrics and datasets, which suggests that post-hoc recalibration may be a valuable complement. In a photometric redshift case study using SDSS DR18, TabPFN using only 50,000 training galaxies outperforms all baselines trained on the full 500,000-galaxy dataset. Taken together, these results establish tabular foundation models as strong off-the-shelf conditional density estimators.
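The abstract names CDE loss, log-likelihood, CRPS, and calibration as evaluation criteria but does not define them. The sketch below shows one common way to compute them. It is not necessarily the exact definitions or implementation used in the benchmark. The sketch assumes that each model's predictive density has been evaluated on a shared, uniformly spaced grid of response values. The function name `cde_metrics` and the nearest-grid-point lookup of the observed response are illustrative simplifications. Calibration is summarized here by probability integral transform (PIT) values, which should be roughly uniform for a calibrated model.

```python
import numpy as np


def cde_metrics(dens, grid, y_test):
    """Hypothetical helper: score predictive densities evaluated on a grid.

    dens:   (n, m) array, estimated density f_hat(y | x_i) at each grid point
    grid:   (m,) uniformly spaced response values (assumption: uniform spacing)
    y_test: (n,) observed responses
    """
    dens = np.asarray(dens, dtype=float)
    grid = np.asarray(grid, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    dy = grid[1] - grid[0]

    # Renormalize each row so it integrates to 1 on the grid (Riemann sum).
    dens = dens / (dens.sum(axis=1, keepdims=True) * dy)

    # Density at the observed response, using the nearest grid point
    # (a simplification; interpolation would be more accurate).
    idx = np.clip(np.round((y_test - grid[0]) / dy).astype(int), 0, len(grid) - 1)
    rows = np.arange(len(y_test))
    f_obs = dens[rows, idx]

    # CDE loss: mean of integral of f_hat^2 minus twice mean of f_hat(y_i | x_i).
    cde_loss = np.mean(np.sum(dens**2, axis=1) * dy) - 2.0 * np.mean(f_obs)

    # Negative log-likelihood, floored to avoid log(0).
    nll = -np.mean(np.log(np.maximum(f_obs, 1e-300)))

    # Predictive CDF on the grid; CRPS = integral of (F(y) - 1{y >= y_i})^2 dy.
    cdf = np.cumsum(dens, axis=1) * dy
    step = (grid[None, :] >= y_test[:, None]).astype(float)
    crps = np.mean(np.sum((cdf - step) ** 2, axis=1) * dy)

    # PIT values F(y_i | x_i); approximately Uniform(0, 1) under calibration.
    pit = cdf[rows, idx]
    return {"cde_loss": cde_loss, "nll": nll, "crps": crps, "pit": pit}
```

As a sanity check, a standard normal predictive density scored at y = 0 should give an NLL near 0.919 (that is, 0.5 ln 2π), a CDE loss near -0.516, a CRPS near 0.234, and a PIT near 0.5. A density centered away from the truth should score worse on all three losses. Lower is better for every loss here, so "best log-likelihood" in the abstract corresponds to the lowest NLL.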