Real-world data generation often involves complex inter-dependencies among instances, violating the i.i.d. assumption of standard learning paradigms and making it challenging to uncover the geometric structures needed to learn the desired instance representations. To this end, we introduce an energy-constrained diffusion model that encodes a batch of instances from a dataset into evolutionary states, which progressively incorporate information from other instances through their interactions. The diffusion process is constrained by descent criteria with respect to a principled energy function that characterizes the global consistency of instance representations over latent structures. We provide rigorous theory that implies closed-form optimal estimates for the pairwise diffusion strength between arbitrary instance pairs. This gives rise to a new class of neural encoders, dubbed DIFFormer (diffusion-based Transformers), with two instantiations: a simple version with linear complexity for prohibitively large numbers of instances, and an advanced version for learning complex structures. Experiments highlight the wide applicability of our model as a general-purpose encoder backbone with superior performance in various tasks, such as node classification on large graphs, semi-supervised image/text classification, and spatial-temporal dynamics prediction.
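To make the idea of a linear-complexity diffusion step concrete, the following is a minimal sketch under stated assumptions, not the model's actual formulation. It assumes an explicit update of the form z_i ← (1 − τ) z_i + τ Σ_j s_ij z_j, where the pairwise diffusion strength s_ij is proportional to 1 + q_i · q_j and q_i is the unit-normalized state of instance i. That similarity form, the step size `tau`, the absence of learned projections, and the function names are all illustrative assumptions; the abstract does not specify them. Because the similarity is linear in q_j, the sums over j can be computed once and shared by all instances, so the cost grows linearly with the number of instances.

```python
import math


def _normalize(v):
    # Scale to unit length; a zero vector is left unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def diffusion_step(z, tau=0.5):
    """One diffusion step in O(N * d^2) time (hypothetical sketch).

    Assumed update: z_i <- (1 - tau) * z_i + tau * sum_j s_ij * z_j,
    with s_ij = (1 + q_i . q_j) / sum_l (1 + q_i . q_l).
    """
    n, d = len(z), len(z[0])
    q = [_normalize(row) for row in z]
    # Quantities shared by every instance, each computed once.
    v_sum = [sum(z[j][b] for j in range(n)) for b in range(d)]
    k_sum = [sum(q[j][a] for j in range(n)) for a in range(d)]
    kv = [[sum(q[j][a] * z[j][b] for j in range(n)) for b in range(d)]
          for a in range(d)]
    out = []
    for i in range(n):
        # Normalizer: sum_j (1 + q_i . q_j) = n + q_i . k_sum, always > 0.
        denom = n + sum(q[i][a] * k_sum[a] for a in range(d))
        agg = [(v_sum[b] + sum(q[i][a] * kv[a][b] for a in range(d))) / denom
               for b in range(d)]
        out.append([(1 - tau) * z[i][b] + tau * agg[b] for b in range(d)])
    return out


def diffusion_step_dense(z, tau=0.5):
    """Same update with explicit pairwise weights, O(N^2 * d); for checking."""
    n, d = len(z), len(z[0])
    q = [_normalize(row) for row in z]
    out = []
    for i in range(n):
        w = [1.0 + sum(q[i][a] * q[j][a] for a in range(d)) for j in range(n)]
        total = sum(w)
        agg = [sum(w[j] * z[j][b] for j in range(n)) / total for b in range(d)]
        out.append([(1 - tau) * z[i][b] + tau * agg[b] for b in range(d)])
    return out


if __name__ == "__main__":
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(diffusion_step(states, tau=0.5))
```

The dense variant is included only as a reference: both functions compute the same update, but the first never materializes the N × N matrix of pairwise strengths. A learned encoder would presumably apply trainable projections before computing the strengths; this sketch omits them for brevity.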