The zero-inflated logistic regression model accommodates binary responses with excess zeros, which often arise from a latent mixture of susceptible and insusceptible subpopulations or from asymmetric misclassification of the response. The model has two components: a regression for the binary response and a latent binary indicator for the zero-inflation state. In applied settings, it is common to use the same design matrix for both components when no prior knowledge is available. However, this shared-design specification does not guarantee identifiability of the regression parameters, as prior work has established. This paper investigates the theoretical properties of the zero-inflated logistic regression model under the shared-design setting, together with computational methods for applying it. First, to motivate the use of the zero-inflated model, we prove that ignoring the zero-inflation mechanism can flip the sign of the pseudo-true coefficient value relative to the true value. We then establish sufficient conditions for the existence of the maximum likelihood estimate. As a main result, we establish that the model under the shared-design setting is identifiable up to exchange symmetry of the parameters of the two components and that the expected log-likelihood has a unique maximizer on the resulting quotient space. We examine posterior bimodality using a Pólya-Gamma Gibbs sampler with replica exchange. Finally, we propose a simple relabeling rule to select a single ordered parameter pair, and we evaluate its performance through simulation studies and an application to self-reported diabetes data.
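The exchange symmetry mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common parametrization that the abstract does not spell out: both components use a logistic link on the same design matrix, so that P(Y = 1 | x) = sigmoid(x'beta) * sigmoid(x'gamma). The names `zil_loglik`, `beta`, and `gamma`, and the simulated data, are hypothetical and chosen only for illustration; the paper's own parametrization may differ.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def zil_loglik(beta, gamma, X, y):
    """Log-likelihood of a zero-inflated logistic model with a shared design X.

    Assumed parametrization (not taken from the paper):
        P(Y = 1 | x) = sigmoid(x @ beta) * sigmoid(x @ gamma)
    where one factor is the response regression and the other is the
    probability of not being in the zero-inflation state.
    """
    p1 = sigmoid(X @ beta) * sigmoid(X @ gamma)
    p1 = np.clip(p1, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return float(np.sum(y * np.log(p1) + (1 - y) * np.log1p(-p1)))


# Simulated data, purely illustrative.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, 1.0])
gamma = np.array([-0.3, 2.0])
y = rng.binomial(1, sigmoid(X @ beta) * sigmoid(X @ gamma))

# Swapping the two parameter vectors leaves the likelihood unchanged,
# so the data alone cannot tell (beta, gamma) from (gamma, beta).
ll_original = zil_loglik(beta, gamma, X, y)
ll_swapped = zil_loglik(gamma, beta, X, y)
print(np.isclose(ll_original, ll_swapped))
```

Under this assumed form, the two orderings of the parameter pair are observationally equivalent, which is why a likelihood or posterior can have two symmetric modes and why a relabeling rule is needed to report a single ordered pair.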