Efficient and Debiased Learning of Average Hazard Under Non-Proportional Hazards

The hazard ratio from the Cox proportional hazards model is a ubiquitous summary of treatment effect. However, when hazards are non-proportional, the hazard ratio can lose a stable causal interpretation and become study-dependent because it effectively averages time-varying effects with weights determined by follow-up and censoring. We consider the average hazard (AH) as an alternative causal estimand: a population-level person-time event rate that remains well-defined and interpretable without assuming proportional hazards. Although AH can be estimated nonparametrically and regression-style adjustments have been proposed, existing approaches do not provide a general framework for flexible, high-dimensional nuisance estimation with valid $\sqrt{n}$ inference. We address this gap by developing a semiparametric, doubly robust framework for covariate-adjusted AH. We establish pathwise differentiability of AH in the nonparametric model, derive its efficient influence function, and construct cross-fitted, debiased estimators that leverage machine learning for nuisance estimation while retaining asymptotically normal, $\sqrt{n}$-consistent inference under mild product-rate conditions. Simulations demonstrate that the proposed estimator achieves small bias and near-nominal confidence-interval coverage across proportional and non-proportional hazards settings, including crossing-hazards regimes where Cox-based summaries can be unstable. We illustrate practical utility in comparative effectiveness research by comparing immunotherapy regimens for advanced melanoma using SEER-Medicare linked data.
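To make the estimand concrete: the average hazard over a window [0, tau] is commonly written as AH(tau) = (1 - S(tau)) / integral_0^tau S(t) dt, that is, the cumulative event probability divided by the restricted mean survival time. The sketch below computes the simple, unadjusted Kaplan-Meier plug-in version of that ratio for a single arm. It is not the covariate-adjusted, cross-fitted estimator proposed here. The function name `km_average_hazard` and the truncation time `tau` are illustrative assumptions that do not appear in the abstract.

```python
import numpy as np

def km_average_hazard(time, event, tau):
    """Kaplan-Meier plug-in estimate of AH(tau) = (1 - S(tau)) / int_0^tau S(t) dt.

    Hypothetical helper for illustration only; assumes tau > 0 and
    event coded as 1 (event observed) or 0 (censored).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    # Distinct observed event times up to tau, in increasing order.
    event_times = np.unique(time[(event == 1) & (time <= tau)])
    surv = 1.0    # current value of the KM step function
    rmst = 0.0    # running integral of S(t) over [0, tau]
    prev_t = 0.0
    for t in event_times:
        rmst += surv * (t - prev_t)      # S is constant on [prev_t, t)
        at_risk = np.sum(time >= t)      # subjects still under observation at t
        deaths = np.sum((time == t) & (event == 1))
        surv *= 1.0 - deaths / at_risk   # KM product-limit update
        prev_t = t
    rmst += surv * (tau - prev_t)        # final flat piece up to tau
    return (1.0 - surv) / rmst
```

With no censoring, this ratio reduces to the familiar events per unit of person-time: four uncensored event times 1, 2, 3, 4 with tau = 4 give 4 events over 10 units of person-time, so `km_average_hazard([1, 2, 3, 4], [1, 1, 1, 1], 4.0)` returns 0.4 up to floating-point error.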
