Estimating the dimensionality of the latent representation needed for prediction -- the task-relevant dimension -- is a difficult, largely unsolved problem with broad scientific applications. We cast it as an Information Bottleneck question: what embedding bottleneck dimension is sufficient to compress predictor and predicted views while preserving their mutual information (MI). This repurposes neural MI estimators for dimensionality estimation. We show that standard neural estimators with separable/bilinear critics systematically inflate the inferred dimension, and we address this by introducing a hybrid critic that retains an explicit dimensional bottleneck while allowing flexible nonlinear cross-view interactions, thereby preserving the latent geometry. We further propose a one-shot protocol that reads off the effective dimension from a single over-parameterized hybrid model, without sweeping over bottleneck sizes. We validate the approach on synthetic problems with known task-relevant dimension. We extend the approach to intrinsic dimensionality by constructing paired views of a single dataset, enabling comparison with classical geometric dimension estimators. In noisy regimes where those estimators degrade, our approach remains reliable. Finally, we demonstrate the utility of the method on multiple physics datasets.
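As a rough illustration of the hybrid-critic idea described above (names, architecture, and weights are hypothetical sketches, not the paper's implementation): each view is passed through an explicit d-dimensional bottleneck, and a small nonlinear joint network — rather than an inner product — scores the embedded pair, so cross-view interactions are not restricted to bilinear form. The score is plugged into an InfoNCE lower bound on mutual information.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(d_in, d_hid):
    # random weights for a two-layer tanh MLP with scalar output
    return (rng.normal(size=(d_in, d_hid)) / np.sqrt(d_in), np.zeros(d_hid),
            rng.normal(size=(d_hid, 1)) / np.sqrt(d_hid), np.zeros(1))

def mlp(params, z):
    W1, b1, W2, b2 = params
    return np.tanh(z @ W1 + b1) @ W2 + b2  # (n, 1)

# Hypothetical hybrid critic: each view is compressed through an explicit
# d-dimensional bottleneck; the pair of embeddings is then scored by a
# nonlinear joint network (flexible cross-view interactions).
d_x, d_y, d_bottleneck = 8, 8, 2
Ex = rng.normal(size=(d_x, d_bottleneck)) / np.sqrt(d_x)  # encoder, view x
Ey = rng.normal(size=(d_y, d_bottleneck)) / np.sqrt(d_y)  # encoder, view y
joint = init_mlp(2 * d_bottleneck, 16)

def critic(x, y):
    zx, zy = np.tanh(x @ Ex), np.tanh(y @ Ey)  # bottleneck embeddings
    return mlp(joint, np.concatenate([zx, zy], axis=-1)).squeeze(-1)

def infonce(x, y):
    # InfoNCE lower bound on I(X;Y): positive pairs on the diagonal,
    # in-batch samples as negatives. Always bounded above by log(n).
    n = x.shape[0]
    scores = np.stack([critic(np.tile(x[i], (n, 1)), y) for i in range(n)])
    m = scores.max(axis=1, keepdims=True)                     # stable logsumexp
    lse = (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).squeeze(1)
    return float((np.diag(scores) - lse).mean() + np.log(n))

# Two correlated views of synthetic data (illustrative only).
X = rng.normal(size=(64, d_x))
Y = X @ rng.normal(size=(d_x, d_y)) + 0.1 * rng.normal(size=(64, d_y))
est = infonce(X, Y)
```

With untrained weights the bound is far from tight; in the method proper the critic would be trained to maximize it, and the effective dimension read off from how the bound saturates with the bottleneck size.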