The case-control sampling design is a key strategy for mitigating the class imbalance commonly observed in binary data. We consider the estimation of a non-parametric logistic model from case-control data supplemented by external summary information, which ensures that the model is identifiable. We propose a two-step estimation procedure. In the first step, the external information is used to estimate the marginal case proportion. In the second step, the estimated proportion is used to construct a weighted objective function for training the model parameters. A deep neural network architecture is employed for function approximation. We further derive a non-asymptotic error bound for the proposed estimator. From this bound we obtain the convergence rate, which is shown to attain the optimal rate for non-parametric regression estimation. Simulation studies are conducted to assess the theoretical findings. A real data example is analyzed for illustration.
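The two-step procedure can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes, purely for illustration, that the external summary is the population mean of a single covariate, so that the marginal case proportion solves `external_mean = pi * m1 + (1 - pi) * m0`, where `m1` and `m0` are the case and control sample means. The function names, the inverse-sampling-probability form of the weights, and the use of a plain array `f_values` in place of a deep neural network output are all assumptions.

```python
import numpy as np


def estimate_case_proportion(x_cases, x_controls, external_mean):
    """Step 1 (assumed form): solve external_mean = pi*m1 + (1-pi)*m0 for pi."""
    m1 = float(np.mean(x_cases))      # covariate mean among sampled cases
    m0 = float(np.mean(x_controls))   # covariate mean among sampled controls
    if np.isclose(m1, m0):
        raise ValueError("case and control means coincide; pi is not identified")
    pi_hat = (external_mean - m0) / (m1 - m0)
    # Keep the estimate strictly inside (0, 1) so the weights stay finite.
    return float(np.clip(pi_hat, 1e-6, 1.0 - 1e-6))


def case_control_weights(y, pi_hat):
    """Reweight each class from its sample share to its estimated population share."""
    y = np.asarray(y)
    n = y.size
    n1 = int(y.sum())
    n0 = n - n1
    w_case = pi_hat / (n1 / n)
    w_control = (1.0 - pi_hat) / (n0 / n)
    return np.where(y == 1, w_case, w_control)


def weighted_logistic_loss(f_values, y, weights):
    """Step 2: weighted negative log-likelihood; f_values are fitted log-odds."""
    f = np.asarray(f_values, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.logaddexp(0, f) evaluates log(1 + exp(f)) in a numerically stable way.
    return float(np.mean(weights * (np.logaddexp(0.0, f) - y * f)))
```

In a full pipeline under these assumptions, `f_values` would be the output of the network on the sampled covariates, and `weighted_logistic_loss` would be minimized over the network parameters with the weights held fixed at their step-one values.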