Sparse data is fundamental to scientific simulations in biology and physics, from single-cell gene expression to particle calorimetry, where exact zeros encode physical absence rather than weak signal. However, existing diffusion models lack the physical rigor to faithfully represent this sparsity. This work introduces Sparse Data Diffusion (SDD), a generative method that explicitly models exact zeros via Sparsity Bits, unifying efficient ML generation with physically grounded sparsity handling. Empirical validation in particle physics and single-cell biology demonstrates that SDD achieves higher fidelity than baseline methods in capturing sparse patterns critical for scientific analysis, advancing scalable and physically faithful simulation.
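To make the idea of modeling exact zeros with Sparsity Bits concrete, the following is a minimal sketch, not the authors' implementation. It assumes each entry is paired with one extra channel that is +1 where the entry is nonzero and -1 where it is exactly zero, and that after generation this channel is thresholded to force masked entries back to exact zero. The function names `encode_with_sparsity_bits` and `decode_with_sparsity_bits`, the ±1 encoding, the threshold of 0, and the Gaussian noise standing in for an imperfect generative model are all illustrative assumptions.

```python
import numpy as np


def encode_with_sparsity_bits(x):
    """Pair each value with a bit channel: +1.0 if nonzero, -1.0 if exactly zero.

    Assumption: this +/-1 encoding is illustrative, not taken from the paper.
    """
    x = np.asarray(x, dtype=np.float64)
    bits = np.where(x != 0.0, 1.0, -1.0)
    return np.stack([x, bits], axis=-1)  # shape (..., 2)


def decode_with_sparsity_bits(z, threshold=0.0):
    """Threshold the (possibly noisy) bit channel and zero out masked entries."""
    values, bits = z[..., 0], z[..., 1]
    # Entries whose bit falls at or below the threshold become exact zeros.
    return np.where(bits > threshold, values, 0.0)


# Toy sparse vector: exact zeros mark physical absence.
x = np.array([0.0, 3.2, 0.0, 0.0, 1.5])
z = encode_with_sparsity_bits(x)

# Hypothetical stand-in for a generative model's imperfect output:
# small Gaussian noise on both the value and the bit channel.
rng = np.random.default_rng(0)
z_noisy = z + 0.1 * rng.standard_normal(z.shape)

x_hat = decode_with_sparsity_bits(z_noisy)
print("zero pattern preserved:", np.array_equal(x_hat == 0.0, x == 0.0))
```

In this sketch, a model that outputs only the value channel would return small nonzero numbers where the data were zero, whereas thresholding the bit channel restores exact zeros while leaving the nonzero values untouched.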