A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot

Generative modeling in machine learning aims to synthesize new data samples that are statistically similar to those observed during training. While conventional generative models such as GANs and diffusion models typically assume access to large and diverse datasets, many real-world applications (e.g., in medicine, satellite imaging, and artistic domains) operate under limited data availability and strict constraints. In this survey, we examine Generative Modeling under Data Constraint (GM-DC), which includes limited-data, few-shot, and zero-shot settings. We present a unified perspective on the key challenges in GM-DC, including overfitting, frequency bias, and incompatible knowledge transfer, and discuss how these issues impact model performance. To systematically analyze this growing field, we introduce two novel taxonomies: one categorizing GM-DC tasks (e.g., unconditional vs. conditional generation, cross-domain adaptation, and subject-driven modeling), and another organizing methodological approaches (e.g., transfer learning, data augmentation, meta-learning, and frequency-aware modeling). Our study reviews over 230 papers, offering a comprehensive view across generative model types and constraint scenarios. We further analyze the interactions among tasks, approaches, and methods using a Sankey diagram and highlight promising directions for future work, including adaptation of foundation models, holistic evaluation frameworks, and data-centric strategies for sample selection. This survey provides a timely and practical roadmap for researchers and practitioners aiming to advance generative modeling under limited data. Project website: https://sutd-visual-computing-group.github.io/gmdc-survey/.
