Rethinking Individual Risk and Aggregation in Survival Analysis: A Latent Mechanism Framework

Survival analysis provides a well-established framework for modeling time-to-event data, in which hazard and survival functions are formally defined as population-level quantities. In applied work, however, these quantities are often interpreted as individual-level risk, even though no clear generative account links individual risk mechanisms to observed survival data. This paper develops a latent hazard framework that makes this relationship explicit: event times are modeled as arising from unobserved, individual-specific hazard mechanisms, and population-level survival quantities are viewed as aggregates over heterogeneous mechanisms. Within this framework, we show that individual hazard trajectories are not identifiable from survival data under partial information. More generally, the conditional distribution of latent hazard mechanisms given covariates is structurally non-identifiable, even when population-level survival functions are fully known. This non-identifiability arises from the aggregation inherent in survival data and persists regardless of model flexibility or estimation strategy. Finally, we show that classical survival models can be systematically reinterpreted according to how they handle this unresolved conditional mechanism distribution. Together, these results provide a unified framework for understanding heterogeneity, identifiability, and interpretation in survival analysis. They clarify how population-level survival models should be interpreted when individual risk mechanisms are only partially observed, and thereby establish explicit information constraints for principled modeling and inference.
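
The aggregation argument can be illustrated with a standard gamma-mixture calculation. This example is not taken from the paper; the parameter values and function names are assumptions chosen for illustration. In one population every individual has a constant hazard z, with z drawn from a Gamma(k, theta) distribution (shape k, rate theta). In the other, every individual shares the same decreasing hazard h(t) = k / (theta + t). Both give the population survival function S(t) = (theta / (theta + t))^k, so the survival curve alone cannot tell the two mechanism distributions apart.

```python
import math

# Hypothetical parameters, chosen only for illustration (not from the paper).
SHAPE = 2.0  # gamma shape k
RATE = 3.0   # gamma rate theta


def survival_heterogeneous(t, shape=SHAPE, rate=RATE, n_grid=20000, z_max=40.0):
    """Population survival when each individual has a constant hazard z,
    with z ~ Gamma(shape, rate).

    Computes E[exp(-z * t)] by midpoint-rule integration of exp(-z * t)
    against the gamma density on [0, z_max].
    """
    dz = z_max / n_grid
    norm = rate ** shape / math.gamma(shape)
    total = 0.0
    for i in range(n_grid):
        z = (i + 0.5) * dz
        density = norm * z ** (shape - 1.0) * math.exp(-rate * z)
        total += math.exp(-z * t) * density * dz
    return total


def survival_homogeneous(t, shape=SHAPE, rate=RATE):
    """Population survival when every individual shares the same hazard
    h(t) = shape / (rate + t), so S(t) = exp(-integral of h from 0 to t)."""
    return (rate / (rate + t)) ** shape


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 2.0, 5.0):
        s_het = survival_heterogeneous(t)
        s_hom = survival_homogeneous(t)
        print(f"t={t:4.1f}  heterogeneous={s_het:.6f}  homogeneous={s_hom:.6f}")
```

The two columns printed by the script should agree up to numerical integration error, even though individual risk is constant over time in the first population and declining in the second. This is a sketch of the general phenomenon under simple parametric assumptions, not a reproduction of the paper's framework or proofs.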
