We explore the connection between deep learning and information theory through the paradigm of diffusion models. A diffusion model converts noise into structured data by reinstating, imperfectly, the information that was erased when the data was diffused to noise. This information is stored in a neural network during training. We quantify it by introducing a measure called neural entropy, which is related to the total entropy produced by diffusion. Neural entropy is a function not only of the data distribution but also of the diffusion process itself. Measurements of neural entropy on a few simple image diffusion models reveal that they are extremely efficient at compressing large ensembles of structured data.
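To make the mechanism concrete, the following is the standard score-based diffusion setup that this description refers to; these are the usual forward and reverse-time SDEs from the diffusion literature, given here as background rather than as the paper's own definitions. The forward process erases structure by noising the data, and the reverse process reconstructs it using the score $\nabla_x \log p_t(x)$:

\[
\mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t \qquad \text{(forward: data $\to$ noise)}
\]
\[
\mathrm{d}x_t = \bigl[\, f(x_t, t) - g(t)^2\, \nabla_{x}\log p_t(x_t) \,\bigr]\,\mathrm{d}t + g(t)\,\mathrm{d}\bar{W}_t \qquad \text{(reverse: noise $\to$ data)}
\]

A network $s_\theta(x,t) \approx \nabla_x \log p_t(x)$ is trained to approximate the score, and this learned approximation is where the reinstated information resides; because the approximation is imperfect, so is the reconstruction.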