Estimating the expectation of a real-valued function of a random variable from sample data is a critical aspect of statistical analysis, with far-reaching implications in various applications. Current methodologies typically assume (semi-)parametric distributions such as Gaussian or mixed Gaussian, leading to significant estimation uncertainty if these assumptions do not hold. We propose a flow-based model, integrated with stratified sampling, that leverages a parametrized neural network to offer greater flexibility in modeling unknown data distributions, thereby mitigating this limitation. Our model shows a marked reduction in estimation uncertainty across multiple datasets, including high-dimensional (30 and 128) ones, outperforming crude Monte Carlo estimators and Gaussian mixture models. Reproducible code is available at https://github.com/rnoxy/flowstrat.
翻译:从样本数据中估计随机变量实值函数的期望是统计分析的关键环节,在各类应用中具有深远影响。现有方法通常假设(半)参数化分布(如高斯分布或混合高斯分布),若这些假设不成立则会导致显著的估计不确定性。我们提出一种结合分层抽样的流式模型,该模型利用参数化神经网络为未知数据分布建模提供更强的灵活性,从而缓解这一局限。我们的模型在多个数据集(包括高维(30维与128维)数据集)上均显示出估计不确定性的显著降低,其性能优于原始蒙特卡洛估计器与高斯混合模型。可复现代码详见 https://github.com/rnoxy/flowstrat。