For a set of binary response variables, conditional mean models characterize the expected value of one response variable given the others and are widely applied in longitudinal and network data analyses. The quadratic exponential binary distribution is a natural choice in this context. However, maximum likelihood estimation for this distribution is computationally demanding because of its intractable normalizing constant, while the pseudo-likelihood, though computationally convenient, tends to severely underestimate the standard errors. In this work, we investigate valid estimation methods for the quadratic exponential binary distribution and its regression counterpart. We show that, when generalized estimating equations are applied to the pseudo-likelihood, the independence working correlation yields consistent estimates, whereas dependent structures, such as compound symmetric or autoregressive correlations, may introduce non-ignorable biases. We derive the theoretical properties and support them with simulation studies. For illustration, we apply the proposed approach to data on the carcinogenic toxicity of chemicals and to constitutional court opinion writing data.
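As a rough illustration of the estimation idea described above, the following sketch fits a pseudo-likelihood with an independence working correlation and reports both naive and cluster-robust (sandwich) standard errors. It is not the authors' implementation. It assumes a simplified exchangeable quadratic exponential model with one main-effect parameter `alpha` and one common pairwise parameter `gamma`, so that the conditional mean satisfies logit P(y_j = 1 | others) = alpha + gamma * (sum of the other responses). The function names `sample_qeb` and `fit_pseudo_gee_independence` are hypothetical, and exact sampling by enumerating all 2^d states is only feasible for small d.

```python
import itertools
import numpy as np


def sample_qeb(n, d, alpha, gamma, rng):
    """Draw n vectors from an exchangeable quadratic exponential binary model.

    Assumed form: P(y) proportional to exp(alpha * sum_j y_j + gamma * sum_{j<k} y_j y_k).
    All 2^d states are enumerated, so keep d small.
    """
    states = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)
    s = states.sum(axis=1)
    # For binary y, sum_{j<k} y_j y_k equals s * (s - 1) / 2.
    log_w = alpha * s + gamma * s * (s - 1) / 2
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    idx = rng.choice(len(states), size=n, p=p)
    return states[idx]


def fit_pseudo_gee_independence(Y, n_iter=50, tol=1e-10):
    """Pseudo-likelihood fit via estimating equations with independence working correlation.

    Returns (beta, naive_se, robust_se), where beta = (alpha, gamma).
    """
    n, d = Y.shape
    # Covariate for response j of subject i: the sum of that subject's other responses.
    others = Y.sum(axis=1, keepdims=True) - Y
    X = np.stack([np.ones_like(Y), others], axis=2)  # shape (n, d, 2)
    Xf = X.reshape(n * d, 2)
    yf = Y.reshape(n * d)

    beta = np.zeros(2)
    for _ in range(n_iter):  # Newton-Raphson on the pooled logistic score
        mu = 1.0 / (1.0 + np.exp(-Xf @ beta))
        score = Xf.T @ (yf - mu)
        A = Xf.T @ (Xf * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(A, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    mu = 1.0 / (1.0 + np.exp(-Xf @ beta))
    A = Xf.T @ (Xf * (mu * (1.0 - mu))[:, None])
    A_inv = np.linalg.inv(A)

    # Per-subject score contributions, summed over that subject's d responses.
    resid = (yf - mu).reshape(n, d)
    U = np.einsum("ijk,ij->ik", X, resid)  # shape (n, 2)
    B = U.T @ U

    naive_se = np.sqrt(np.diag(A_inv))               # treats all n*d rows as independent
    robust_se = np.sqrt(np.diag(A_inv @ B @ A_inv))  # sandwich, clustered by subject
    return beta, naive_se, robust_se


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = sample_qeb(n=2000, d=4, alpha=-0.5, gamma=0.4, rng=rng)
    beta, naive_se, robust_se = fit_pseudo_gee_independence(Y)
    print("estimate:", beta)
    print("naive SE:", naive_se)
    print("robust SE:", robust_se)
```

Comparing `naive_se` with `robust_se` on simulated data gives a quick check of how much the pseudo-likelihood standard errors differ from the sandwich ones under this assumed model.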