Many datasets include a small set of variables, such as biomarkers or clinical outcomes, whose relationships to the broader system are of primary scientific interest. Estimating the full network of inter-variable relationships in such settings often obscures local structures around these targets, limiting interpretability. To address this fundamental problem, we introduce local graph estimation, a statistical framework for inferring substructures around target variables. We show that traditional graph estimation methods often fail to recover local structure, and present pathwise feature selection (PFS) as an effective alternative. PFS estimates local subgraphs by iteratively applying feature selection and propagating uncertainty along network paths, providing rigorous finite-sample false discovery control even in settings with mixed variable types and nonlinear dependencies. In four distinct applications spanning environmental and public health, multiomics, brain connectomics, and single-nucleus RNA sequencing, PFS recovers interpretable networks consistent with domain knowledge, highlighting its ability to uncover established mechanisms and generate novel hypotheses.