In the literature on false discovery rate (FDR) control, mirror statistics offer a flexible alternative to $p$-value-based procedures. When prior information is available, however, it is unclear how to incorporate it into mirror statistics in a principled way, and the standard equal split used by data-splitting methods can be inefficient. In this paper, we characterize a broader class of mirror statistics for any fixed splitting scheme and establish asymptotic FDR control under mild weak-dependence conditions using a two-stage procedure inspired by \cite{li2021whiteout}. Within this class, we derive a Bayes-optimal mirror statistic and demonstrate its power advantage theoretically through an analysis under the Rare/Weak signal model. Building upon this Bayes-optimal mirror statistic, we propose \textsc{PRADAS} (PRior-Assisted DAta Splitting), which treats the split ratio as a stopping time and recasts data splitting as an optional stopping problem over a natural filtration; the optimal stopping rule is characterized by the Snell envelope and computed efficiently via a Longstaff--Schwartz regression approximation. Both simulations and real data examples demonstrate the effectiveness of the proposed framework.
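
For orientation, the two standard constructions that the abstract builds on can be written as follows. This is only a sketch of the textbook forms, not the broader class, the Bayes-optimal statistic, or the \textsc{PRADAS} rule themselves. The symbols $\hat\beta_j^{(1)}$, $\hat\beta_j^{(2)}$, $f$, $q$, $Z_t$, and $\mathcal{F}_t$ are notation assumed here for illustration. In the usual data-splitting setup, two independent estimates of the $j$-th effect are combined into a mirror statistic, and the rejection threshold is chosen from the empirical symmetry of the null statistics:
\[
M_j = \operatorname{sign}\bigl(\hat\beta_j^{(1)}\hat\beta_j^{(2)}\bigr)\, f\bigl(|\hat\beta_j^{(1)}|, |\hat\beta_j^{(2)}|\bigr),
\qquad
\tau_q = \min\Bigl\{ t > 0 : \frac{\#\{j : M_j < -t\}}{\max\bigl(\#\{j : M_j > t\},\, 1\bigr)} \le q \Bigr\},
\]
with hypothesis $j$ rejected when $M_j > \tau_q$. For a generic finite-horizon optimal stopping problem with reward process $(Z_t)_{t=0}^{T}$ adapted to a filtration $(\mathcal{F}_t)$, the Snell envelope is defined by backward induction:
\[
U_T = Z_T,
\qquad
U_t = \max\bigl\{ Z_t,\ \mathbb{E}[\,U_{t+1} \mid \mathcal{F}_t\,] \bigr\} \quad (t < T),
\qquad
\tau^{*} = \min\{ t : U_t = Z_t \}.
\]
A Longstaff--Schwartz scheme replaces the conditional expectation $\mathbb{E}[\,U_{t+1} \mid \mathcal{F}_t\,]$ with a least-squares regression on simulated paths.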