We study the optimal sample complexity of neighbourhood selection in linear structural equation models, and compare this to best subset selection (BSS) for linear models under general design. We show by example that, even when the structure is \emph{unknown}, the existence of underlying structure can reduce the sample complexity of neighbourhood selection. This result is complicated by the possibility of path cancellation, which we study in detail; we show that improvements are still possible in its presence. Finally, we support these theoretical observations with experiments. The proof introduces a modified BSS estimator, called klBSS, and compares its performance to BSS. The analysis of klBSS may also be of independent interest since it applies to arbitrary structured models, not necessarily those induced by a structural equation model. Our results have implications for structure learning in graphical models, which often relies on neighbourhood selection as a subroutine.