Case-control sampling is a commonly used retrospective sampling design for alleviating the imbalance of binary data. When a logistic regression model is fitted to case-control data, the slope parameter can be consistently estimated, but the intercept parameter is not identifiable and the marginal case proportion is not estimable. We consider the situation in which, besides the case-control data from the main study (called the internal study), summary-level information from related external studies is also available. We propose an empirical-likelihood-based approach to inference for the logistic model that combines the internal case-control data with the external information. We show that the intercept parameter becomes identifiable with the help of the external information, so that all the regression parameters, as well as the marginal case proportion, can be estimated consistently. The proposed method also accounts for possible variability in the external studies. The resulting estimators are shown to be asymptotically normally distributed, and their asymptotic variance-covariance matrix can be consistently estimated from the case-control data. The optimal way to utilize external information is discussed. Simulation studies are conducted to verify the theoretical findings, and a real data set is analyzed for illustration.
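The identifiability issue can be illustrated with a small simulation. The sketch below is not the empirical likelihood estimator proposed here; it only uses the standard offset relation for logistic regression under case-control sampling, in which the fitted intercept equals the population intercept plus log(n1/n0) - log(pi/(1 - pi)), where n1 and n0 are the numbers of sampled cases and controls and pi is the marginal case proportion. All names and values (`fit_logistic`, `pi_ext`, `alpha_true`, `beta_true`, the sample sizes) are assumptions made for illustration, and `pi_ext` stands in for an externally supplied summary that is treated as known without error.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression; X must already contain an intercept column."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - p)                       # score vector
        hess = (X * (p * (1 - p))[:, None]).T @ X  # observed information
        theta = theta + np.linalg.solve(hess, grad)
    return theta

rng = np.random.default_rng(0)

# Hypothetical population with a rare outcome.
alpha_true, beta_true = -3.0, 1.0
N = 200_000
x = rng.normal(size=N)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(alpha_true + beta_true * x))))

# Assumed external summary: the marginal case proportion, taken as known.
pi_ext = y.mean()

# Internal case-control sample with equal numbers of cases and controls.
n1 = n0 = 2000
cases = rng.choice(np.flatnonzero(y == 1), size=n1, replace=False)
controls = rng.choice(np.flatnonzero(y == 0), size=n0, replace=False)
idx = np.concatenate([cases, controls])
X_cc = np.column_stack([np.ones(idx.size), x[idx]])
y_cc = y[idx].astype(float)

# Naive fit: the slope is fine, the intercept is shifted by the sampling offset.
alpha_star, beta_hat = fit_logistic(X_cc, y_cc)

# Offset correction, possible only because pi_ext is available.
alpha_hat = alpha_star - np.log(n1 / n0) + np.log(pi_ext / (1.0 - pi_ext))

print("slope estimate:", beta_hat)
print("naive intercept:", alpha_star)
print("corrected intercept:", alpha_hat)
```

In this sketch the naive intercept reflects the artificial 1:1 case-control ratio rather than the population, while the slope needs no adjustment. The plug-in correction ignores any uncertainty in the external summary, which is one of the aspects the proposed method is designed to account for.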