We study the problem of selecting a subset of patients who are unlikely to experience an adverse event within a fixed time horizon by calibrating a screening rule based on a black-box survival model. We consider two complementary, distribution-free frameworks for this task. The first extends classical calibration ideas (estimating the event rate among selected patients using a hold-out dataset) by integrating them with the Learn-Then-Test (LTT) framework, yielding high-probability guarantees for data-adaptively tuned screening rules. The second reformulates screening as a hypothesis testing problem on future patient outcomes, enabling false discovery rate (FDR) control via the Benjamini-Hochberg procedure applied to selective conformal p-values, with guarantees that hold in expectation. We clarify the theoretical relationship between these approaches, explain how both can be adapted to right-censored time-to-event data via inverse probability of censoring weighting, and compare them empirically using simulations and oncology data from the Flatiron Health Research Database. Our results reveal a trade-off between efficiency and strength of guarantees: FDR-based screening is typically more powerful, while LTT-based calibration is more conservative but offers stronger, high-probability guarantees. We also provide practical guidance on implementation and tuning.
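To make the second framework concrete, the following is a minimal sketch of Benjamini-Hochberg selection applied to conformal p-values. It is an illustration under simplifying assumptions, not the paper's implementation: outcomes are assumed fully observed (no censoring, so no inverse probability of censoring weighting), the model is assumed to output a risk score where higher means an event is more likely, and the function names `conformal_pvalues` and `benjamini_hochberg` are hypothetical.

```python
import numpy as np


def conformal_pvalues(cal_risk, cal_event, test_risk):
    """Conformal p-values for the null 'this test patient has the event'.

    Assumption: outcomes are fully observed (no censoring).
    cal_risk  : risk scores on the hold-out calibration set
    cal_event : True where the calibration patient had the event by the horizon
    test_risk : risk scores for new patients
    """
    cal_risk = np.asarray(cal_risk, dtype=float)
    cal_event = np.asarray(cal_event, dtype=bool)
    test_risk = np.asarray(test_risk, dtype=float)

    # Only calibration patients who had the event count against a test patient.
    null_risk = np.sort(cal_risk[cal_event])
    # Number of event patients whose risk score is <= the test patient's score.
    counts = np.searchsorted(null_risk, test_risk, side="right")
    return (1 + counts) / (len(cal_risk) + 1)


def benjamini_hochberg(pvals, alpha):
    """Boolean mask of hypotheses rejected by BH at nominal level alpha."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m

    selected = np.zeros(m, dtype=bool)
    if below.any():
        # Largest rank k with p_(k) <= alpha * k / m; reject the k smallest.
        k = np.max(np.nonzero(below)[0]) + 1
        selected[order[:k]] = True
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: events are more likely at higher risk scores.
    cal_risk = rng.uniform(size=500)
    cal_event = rng.uniform(size=500) < cal_risk
    test_risk = rng.uniform(size=200)

    p = conformal_pvalues(cal_risk, cal_event, test_risk)
    low_risk = benjamini_hochberg(p, alpha=0.1)
    print(f"selected {low_risk.sum()} of {len(test_risk)} patients as low risk")
```

A rejected null corresponds to a patient screened as low risk: a small p-value means few calibration patients with an event had a risk score as low as this patient's. Handling right-censored data would require reweighting the calibration counts, which this sketch deliberately omits.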