Monte Carlo methods, Variational Inference, and their combinations play a pivotal role in sampling from intractable probability distributions. However, current studies lack a unified evaluation framework: they rely on disparate performance measures and compare only a limited set of methods across diverse tasks. This complicates the assessment of progress and hinders practitioners' decision-making. In response to these challenges, our work introduces a benchmark that evaluates sampling methods using a standardized task suite and a broad range of performance criteria. Moreover, we study existing metrics for quantifying mode collapse and introduce novel metrics for this purpose. Our findings provide insights into the strengths and weaknesses of existing sampling methods and serve as a valuable reference for future developments. The code is publicly available here.