Although data-driven confounder selection requires careful consideration, it is frequently employed in observational studies. Widely recognized criteria for confounder selection include the minimal-set approach, which selects variables associated with both the treatment and the outcome, and the union-set approach, which selects variables associated with either the treatment or the outcome. These approaches are often implemented with heuristics and off-the-shelf statistical methods for which the degree of uncertainty may be unclear. In this paper, we use the false discovery rate (FDR) to quantify uncertainty in confounder selection. We define an FDR specific to confounder selection and propose methods based on the mirror statistic, a recently developed approach to FDR control that does not rely on p-values. The proposed methods are therefore p-value-free and require only a symmetry assumption on the distribution of the mirror statistic. As a result, they can be combined with sparse estimation and other methods for which p-values are difficult to derive. We investigate the properties of the proposed methods through extensive numerical experiments. In high-dimensional settings in particular, the proposed methods effectively control the FDR and outperform p-value-based methods.
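The abstract does not spell out how a mirror statistic becomes a selection rule. As a rough illustration only, the Python sketch below implements the generic thresholding rule commonly used with mirror statistics in the data-splitting FDR literature. It is not the authors' confounder-specific procedure, and it does not show how that procedure handles the minimal or union sets. It assumes (our illustrative choice, not the paper's) that each variable has two coefficient estimates from independent halves of the data, combined as M_j = sign(b1_j b2_j)(|b1_j| + |b2_j|). It also assumes that null M_j are symmetric about zero, so the number of large negative statistics estimates the number of false selections among the large positive ones. The function names `mirror_statistic`, `mirror_threshold`, and `mirror_select` are hypothetical.

```python
from typing import List, Sequence


def mirror_statistic(b1: Sequence[float], b2: Sequence[float]) -> List[float]:
    """Combine two independent estimates per variable into mirror statistics.

    Assumption (illustrative choice): M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|).
    A large positive M_j means both halves agree on a sizable effect.
    """
    stats = []
    for x, y in zip(b1, b2):
        prod = x * y
        sign = 1.0 if prod > 0 else (-1.0 if prod < 0 else 0.0)
        stats.append(sign * (abs(x) + abs(y)))
    return stats


def mirror_threshold(mirror_stats: Sequence[float], q: float) -> float:
    """Smallest cutoff t (searched over the nonzero |M_j|) with estimated FDP <= q.

    Estimated FDP(t) = #{j : M_j < -t} / max(#{j : M_j > t}, 1), which is
    justified only if null mirror statistics are symmetric about zero.
    At the largest |M_j| both counts are 0, so a cutoff always exists when
    some M_j is nonzero; float("inf") is returned only if all are zero.
    """
    candidates = sorted({abs(m) for m in mirror_stats if m != 0})
    for t in candidates:
        n_neg = sum(1 for m in mirror_stats if m < -t)  # estimated false selections
        n_pos = sum(1 for m in mirror_stats if m > t)   # selections at cutoff t
        if n_neg / max(n_pos, 1) <= q:
            return t
    return float("inf")


def mirror_select(mirror_stats: Sequence[float], q: float) -> List[int]:
    """Indices j whose mirror statistic lies strictly above the data-driven cutoff."""
    t = mirror_threshold(mirror_stats, q)
    return [j for j, m in enumerate(mirror_stats) if m > t]
```

For example, `mirror_select([5.0, 4.0, 3.0, 2.5, -0.5, 0.4, -0.3, 2.0], 0.2)` picks the cutoff 0.3 and returns `[0, 1, 2, 3, 5, 7]`. No p-value is computed anywhere. Only the sign symmetry of null statistics is used, which is why this style of rule pairs naturally with sparse estimators whose sampling distributions are hard to characterize.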