We investigate the statistical and computational limits of latent \textbf{Di}ffusion \textbf{T}ransformers (\textbf{DiT}s) under the low-dimensional linear latent space assumption. Statistically, we study the universal approximation and sample complexity of the DiT score function, as well as the distribution recovery property of the initial data. Specifically, under mild data assumptions, we derive an approximation error bound for the score network of latent DiTs that is sub-linear in the latent space dimension. We also derive the corresponding sample complexity bound and show that the data distribution generated from the estimated score function converges to a neighborhood of the original distribution. Computationally, we characterize the hardness of both forward inference and backward computation of latent DiTs under the Strong Exponential Time Hypothesis (SETH). For forward inference, we identify criteria for the efficiency of all possible latent DiT inference algorithms and showcase our theory by pushing efficiency to almost-linear time inference. For backward computation, we exploit the low-rank structure in the gradient computation of DiT training to obtain algorithmic speedup. Specifically, we show that this speedup achieves almost-linear time latent DiT training by casting the DiT gradient as a chain of low-rank approximations with bounded error. Under the low-dimensional assumption, we show that both the convergence rate and the computational efficiency are dominated by the dimension of the subspace, suggesting that latent DiTs can bypass the challenges associated with the high dimensionality of the initial data.
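For concreteness, one standard way to formalize the low-dimensional linear latent space assumption (our illustrative notation, not necessarily the paper's exact statement) is
\[
x = B h, \qquad B \in \mathbb{R}^{D \times d_0}, \quad B^\top B = I_{d_0}, \quad d_0 \ll D,
\]
so that the initial data $x \in \mathbb{R}^D$ is supported on a $d_0$-dimensional linear subspace spanned by the columns of $B$. The score network $s_\theta$ is then fit with the usual score matching objective,
\[
\min_{s_\theta} \; \mathbb{E}_{t} \, \mathbb{E}_{x_t \sim p_t} \big\| s_\theta(x_t, t) - \nabla_{x_t} \log p_t(x_t) \big\|_2^2,
\]
and under the subspace assumption the resulting approximation and sample complexity rates scale with $d_0$ rather than with the ambient dimension $D$.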
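To illustrate the forward-inference result: when the query and key entries are suitably bounded, the attention matrix $\exp(QK^\top)$ admits a low-rank approximation, so attention can be evaluated without ever materializing the $n \times n$ matrix. Below is a minimal NumPy sketch of this mechanism via a truncated-Taylor feature map; it illustrates the general technique, not the paper's exact construction (which uses a more economical polynomial approximation).

```python
import numpy as np
from math import factorial

def taylor_features(X, degree):
    """Map each row x to the concatenation of x^{(tensor k)} / sqrt(k!)
    for k = 0..degree, so that <phi(q), phi(k)> = sum_k (q . k)^k / k!,
    the degree-`degree` Taylor approximation of exp(q . k)."""
    n, _ = X.shape
    feats = [np.ones((n, 1))]               # k = 0 term
    cur = np.ones((n, 1))
    for k in range(1, degree + 1):
        # Row-wise Kronecker product builds the k-th tensor power of each row.
        cur = np.einsum('ni,nj->nij', cur, X).reshape(n, -1)
        feats.append(cur / np.sqrt(factorial(k)))
    return np.concatenate(feats, axis=1)

def lowrank_attention(Q, K, V, degree=8):
    """Approximate softmax attention without forming the n x n matrix:
    exp(Q K^T) ~ U1 @ U2.T, then associate as U1 @ (U2.T @ V)."""
    U1, U2 = taylor_features(Q, degree), taylor_features(K, degree)
    num = U1 @ (U2.T @ V)                    # cost linear in sequence length
    den = U1 @ (U2.T @ np.ones(K.shape[0]))  # softmax row normalizers
    return num / den[:, None]

# Sanity check against exact attention on bounded inputs (small entries,
# the regime where the low-rank approximation provably works).
rng = np.random.default_rng(0)
n, d = 256, 3
Q, K, V = (0.3 * rng.standard_normal((n, d)) for _ in range(3))
A = np.exp(Q @ K.T)
exact = (A / A.sum(axis=1, keepdims=True)) @ V
print(np.max(np.abs(exact - lowrank_attention(Q, K, V))))  # small truncation error
```

The point is associativity: once $\exp(QK^\top) \approx U_1 U_2^\top$ with rank independent of the sequence length, evaluating $U_1 (U_2^\top V)$ costs time linear in the sequence length, which is what "almost-linear time inference" refers to.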
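The backward result uses the same mechanism one level deeper: each chain-rule factor in the DiT gradient is replaced by a bounded-error low-rank surrogate, and the chained product of surrogates is evaluated entirely in factored form. A schematic sketch under this reading (the function name and the factored representation are ours; deriving the actual decomposition of the DiT gradient into such a chain is the technical content of the paper):

```python
import numpy as np

def chain_lowrank_apply(factors, X):
    """Apply the product M_L @ ... @ M_1 to X, where each M_i is an
    (n x n) matrix stored as a low-rank pair (U_i, V_i), M_i = U_i @ V_i.T.
    Associating every product as U @ (V.T @ X) keeps each step at
    O(n * r * d) cost, so the whole chain avoids any O(n^2) work."""
    for U, V in factors:  # factors listed in application order: M_1 first
        X = U @ (V.T @ X)
    return X
```

If each surrogate $M_i$ approximates its exact counterpart to error $\epsilon_i$ in operator norm and the factors have bounded norms, the error of the chained product grows controllably in the $\epsilon_i$ (roughly additively when the norms are at most one), which is the "bounded error" claim in the abstract.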