Mixture distributions are a workhorse model for multimodal data in information theory, signal processing, and machine learning. Yet even when each component density is simple, the differential entropy of the mixture is notoriously hard to compute because the logarithm is coupled with a sum over components. This paper develops a deterministic, closed-form toolkit for bounding and accurately approximating mixture entropy directly from component parameters. Our starting point is an information-theoretic channel viewpoint: the latent mixture label plays the role of the channel input, and the observation is the output. This viewpoint separates mixture entropy into the average within-component uncertainty plus an overlap term that quantifies how much the observation reveals about the hidden label. We then bound and approximate this overlap term using pairwise overlap integrals between component densities, yielding explicit expressions whenever these overlaps admit a closed form. A simple, family-dependent offset corrects the systematic bias of the Jensen overlap bound and is calibrated to be exact in the two limiting regimes of complete overlap and near-perfect separation. A final clipping step guarantees that the estimate always respects universal information-theoretic bounds. Closed-form specializations are provided for Gaussian, factorized Laplacian, uniform, and hybrid mixtures, and numerical experiments validate the resulting bounds and approximations across component separation, dimension, number of components, and correlated covariance structures.
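To make the channel viewpoint concrete, the following display spells out the decomposition and the Jensen overlap bound in one plausible notation (the symbols $Z$, $w_i$, $p_i$, and $c_{ij}$ are our own shorthand; the paper's notation may differ). Writing the mixture as $p = \sum_i w_i p_i$ with latent label $Z$,
\[
h(p) \;=\; h(X \mid Z) + I(Z;X) \;=\; \sum_{i} w_i\, h(p_i) \;+\; I(Z;X),
\qquad 0 \;\le\; I(Z;X) \;\le\; H(w),
\]
where $H(w) = -\sum_i w_i \ln w_i$ is the label entropy. Applying Jensen's inequality to each cross term $\mathbb{E}_{p_i}[-\ln p(X)] \ge -\ln \mathbb{E}_{p_i}[p(X)]$ yields the overlap lower bound
\[
h(p) \;\ge\; -\sum_{i} w_i \ln \sum_{j} w_j\, c_{ij},
\qquad c_{ij} \;:=\; \int p_i(x)\, p_j(x)\, \mathrm{d}x,
\]
and for Gaussian components the overlap integral has the closed form $c_{ij} = \mathcal{N}(\mu_i - \mu_j;\, 0,\, \Sigma_i + \Sigma_j)$. Clipping any corrected estimate to the interval $\big[\sum_i w_i h(p_i),\; \sum_i w_i h(p_i) + H(w)\big]$ enforces the universal bounds the abstract refers to.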
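A minimal sketch of the full pipeline for Gaussian mixtures, under the assumptions introduced above: the Jensen overlap bound built from closed-form pairwise overlaps, a constant Gaussian offset $(d/2)\ln(e/2)$ (the weighted Shannon-minus-Rényi-2 entropy gap, which for Gaussians is covariance-independent and exact in both limiting regimes), and clipping to the universal bounds. This is a reconstruction from the abstract, not the paper's implementation; all function names and the exact calibration are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

# NOTE: illustrative reconstruction from the abstract; the notation and the
# offset calibration are assumptions, not the paper's exact method.


def gaussian_overlap(mu_i, sig_i, mu_j, sig_j):
    """Pairwise overlap integral c_ij = int N(x; mu_i, sig_i) N(x; mu_j, sig_j) dx.

    For Gaussians this has the closed form N(mu_i - mu_j; 0, sig_i + sig_j).
    """
    d = len(mu_i)
    return multivariate_normal.pdf(mu_i - mu_j, mean=np.zeros(d), cov=sig_i + sig_j)


def gaussian_mixture_entropy_estimate(weights, mus, sigmas):
    """Deterministic estimate of the differential entropy of a Gaussian mixture.

    Steps mirroring the abstract: Jensen overlap bound from pairwise overlap
    integrals, a family-dependent offset, and clipping to the universal
    bounds h(X|Z) <= h(X) <= h(X|Z) + H(Z). All quantities are in nats.
    """
    weights = np.asarray(weights, dtype=float)
    K = len(weights)
    d = len(mus[0])

    # Average within-component entropy h(X|Z) = sum_i w_i h(p_i).
    h_comp = np.array([0.5 * np.log(np.linalg.det(2 * np.pi * np.e * s)) for s in sigmas])
    h_within = weights @ h_comp

    # Jensen lower bound: h(X) >= -sum_i w_i log sum_j w_j c_ij.
    C = np.array([[gaussian_overlap(mus[i], sigmas[i], mus[j], sigmas[j])
                   for j in range(K)] for i in range(K)])
    h_jensen = -weights @ np.log(C @ weights)

    # Family-dependent offset: the gap between Shannon and Renyi-2 component
    # entropies, which for Gaussians equals (d/2) ln(e/2) regardless of the
    # covariance, making the estimate exact both at complete overlap and at
    # near-perfect separation.
    offset = 0.5 * d * np.log(np.e / 2.0)

    # Clip to the universal information-theoretic bounds.
    H_Z = -weights @ np.log(weights)
    return float(np.clip(h_jensen + offset, h_within, h_within + H_Z))


# Example: two well-separated 2-D Gaussians with equal weights. The estimate
# approaches the separated-regime value sum_i w_i h(p_i) + H(w)
# = ln(2*pi*e) + ln 2 ~= 3.531 nats.
w = [0.5, 0.5]
mus = [np.array([0.0, 0.0]), np.array([6.0, 0.0])]
sigmas = [np.eye(2), np.eye(2)]
print(gaussian_mixture_entropy_estimate(w, mus, sigmas))
```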