Data scarcity limits inference in many scientific and policy domains. Survey data are essential for decision-making, but sparse samples often cannot support estimates at fine spatial granularity. We evaluate normalizing flows, a class of generative models that learn complex data distributions and can be conditioned on exogenous contextual features, in controlled data scarcity scenarios. Across eight household survey datasets spanning six low- and middle-income countries in the humanitarian domain, we show that context-conditioned generative models can refine sub-national survey distributions under severe data scarcity, and that performance increases systematically with the richness of the conditioning information. These findings support a general principle for survey data augmentation: generative models can improve sub-national estimates when the sparse sample retains sufficient support and the contextual covariates encode relevant local heterogeneity. By learning full conditional distributions rather than point estimates, the approach provides fine-grained evidence for humanitarian decision-making and resource allocation.