This paper studies the information gap between mixture detection and label recovery in binomial logistic mixtures. Standard likelihood-based criteria such as the Bayesian information criterion (BIC) can detect the presence of two components, but this does not guarantee that the corresponding labels are recoverable. We show that this gap is intrinsic to binomial logistic mixtures with a fixed number of trials: observed-data evidence for mixture structure and per-observation information for label recovery have different local orders in the component separation, and only the former accumulates with the sample size. As a result, there exists a detectable-but-unrecoverable regime in which BIC selects two components while the posterior labels remain essentially uninformative. To address this issue, we propose two feasibility-aware inference procedures: a recoverability-aware BIC with a posterior-entropy penalty and an entropy-regularized estimator that mitigates the tendency of the maximum likelihood estimator to produce overly separated components and overly concentrated posterior responsibilities. Numerical experiments confirm the predicted gap and demonstrate that the proposed methods avoid misleading component selections and improve the calibration of posterior label probabilities.
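The gap described in the abstract can be illustrated with a small population-level calculation. The sketch below is not the paper's procedure: it assumes an intercept-only model with equal mixing weights, component logits at -delta/2 and +delta/2, and m = 10 trials, and the function name `label_entropy_and_kl` is hypothetical. It computes the expected posterior entropy of the labels (a per-observation quantity) and the Kullback-Leibler divergence from the mixture to the moment-matched single binomial (whose contribution to the log-likelihood ratio grows roughly like n times the divergence). The comparison with log n is only a rough heuristic for the BIC penalty gap between a three-parameter and a one-parameter model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binom_pmf(y, m, p):
    return math.comb(m, y) * p**y * (1.0 - p) ** (m - y)


def label_entropy_and_kl(delta, m=10, weight=0.5):
    """Return (expected label entropy, KL to the best single binomial).

    Assumed setup (illustrative only): two components with logits
    -delta/2 and +delta/2, mixing weight `weight`, and m trials each.
    """
    p1, p2 = sigmoid(-delta / 2.0), sigmoid(delta / 2.0)
    # The single binomial closest in KL matches the mixture mean.
    p_bar = weight * p1 + (1.0 - weight) * p2
    entropy, kl = 0.0, 0.0
    for y in range(m + 1):
        f1 = weight * binom_pmf(y, m, p1)
        f2 = (1.0 - weight) * binom_pmf(y, m, p2)
        mix = f1 + f2
        r = f1 / mix  # posterior responsibility of component 1 given y
        h = -sum(q * math.log(q) for q in (r, 1.0 - r) if q > 0.0)
        entropy += mix * h
        kl += mix * math.log(mix / binom_pmf(y, m, p_bar))
    return entropy, kl


if __name__ == "__main__":
    entropy, kl = label_entropy_and_kl(delta=0.4)
    print(f"label entropy {entropy:.3f} (maximum {math.log(2):.3f})")
    for n in (1_000, 100_000, 10_000_000):
        # Heuristic: evidence ~ n * KL, BIC penalty gap ~ log n.
        print(n, f"n*KL = {n * kl:.1f}", f"log n = {math.log(n):.1f}")
```

Under these assumptions, the label entropy does not depend on n, while the n times KL term keeps growing, which is the qualitative behavior the abstract calls detectable but unrecoverable.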