Ever since the seminal work of R. A. Fisher and F. Yates, factorial designs have been an important experimental tool for simultaneously estimating the treatment effects of multiple factors. In factorial designs, the number of treatment combinations may grow exponentially with the number of factors, which motivates the forward screening strategy based on the sparsity, hierarchy, and heredity principles for factorial effects. Although this strategy is intuitive and widely used in practice, its rigorous statistical theory has not been formally established. To fill this gap, we establish design-based theory for forward factor screening in factorial designs under the potential outcome framework. We not only prove its consistency but also discuss statistical inference after factor screening. In particular, with perfect screening, we quantify the advantages of forward screening in terms of the asymptotic efficiency gain in estimating factorial effects. With imperfect screening in higher-order interactions, we propose two novel strategies and investigate their impact on subsequent inference. Our formulation differs from the existing literature on variable selection and post-selection inference because our theory is based solely on the physical randomization of the factorial design and does not rely on a correctly specified outcome model.