Ever since the seminal work of R. A. Fisher and F. Yates, factorial designs have been an important experimental tool to simultaneously estimate the effects of multiple treatment factors. In factorial designs, the number of treatment combinations grows exponentially with the number of treatment factors, which motivates the forward selection strategy based on the sparsity, hierarchy, and heredity principles for factorial effects. Although this strategy is intuitive and has been widely used in practice, its rigorous statistical theory has not been formally established. To fill this gap, we establish design-based theory for forward factor selection in factorial designs based on the potential outcome framework. We not only prove a consistency property for the factor selection procedure but also discuss statistical inference after factor selection. In particular, with selection consistency, we quantify the advantages of forward selection based on asymptotic efficiency gain in estimating factorial effects. With inconsistent selection in higher-order interactions, we propose two strategies and investigate their impact on subsequent inference. Our formulation differs from the existing literature on variable selection and post-selection inference because our theory is based solely on the physical randomization of the factorial design and does not rely on a correctly specified outcome model.