Driven by technological advances in sensing modalities, randomized trials whose primary outcomes are high-dimensional vectors have become increasingly prevalent. Examples of such outcomes include week-long time series from wearable devices and high-dimensional neuroimaging data, such as functional magnetic resonance imaging. This paper focuses on randomized treatment studies with high-dimensional outcomes and sparse treatment effects, where an intervention may influence only a small number of dimensions, e.g., short temporal windows or specific brain regions. Conventional practice, such as analyzing fixed, low-dimensional summaries of the outcomes, results in substantially reduced power for detecting treatment effects. To address this limitation, we propose a two-step procedure: subset selection followed by inference. Specifically, given a potentially large set of outcome summaries, we first identify the subset that captures treatment effects, which requires only a single call to the Lasso, and then conduct inference on the selected subset. Through theoretical analysis and simulations, we demonstrate that our method asymptotically selects the correct subset and increases statistical power.
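As a rough illustration of the select-then-infer idea, the following Python sketch simulates a randomized trial with a sparse treatment effect, selects outcome summaries with a single Lasso fit, and then tests for a treatment effect on the selected subset. It is not the paper's procedure. Everything specific here is an assumption made for illustration: regressing the centered treatment indicator on the standardized outcomes, splitting the sample so that selection and inference use different subjects, the permutation test, the penalty `lam=0.1`, the simulated data, and all function names.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent: min (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # residual y - X @ beta, with beta = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # correlation of column j with the residual that excludes column j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def select_subset(Y, a, lam):
    """Assumed selection rule: one Lasso fit of treatment on the outcomes."""
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    t = a - a.mean()
    return np.flatnonzero(lasso_cd(Z, t, lam))

def mean_diff_stat(Y, a):
    """Sum of squared treated-minus-control mean differences."""
    d = Y[a == 1].mean(axis=0) - Y[a == 0].mean(axis=0)
    return float(np.sum(d ** 2))

def permutation_pvalue(Y, a, n_perm=500, seed=None):
    """Permutation test of no treatment effect on the columns of Y."""
    rng = np.random.default_rng(seed)
    obs = mean_diff_stat(Y, a)
    null = [mean_diff_stat(Y, rng.permutation(a)) for _ in range(n_perm)]
    return (1 + sum(s >= obs for s in null)) / (1 + n_perm)

def simulate(n=600, p=100, active=(3, 17, 42), effect=1.5, seed=0):
    """Hypothetical trial: treatment shifts only the `active` outcome dimensions."""
    rng = np.random.default_rng(seed)
    a = rng.permutation(np.repeat([0, 1], n // 2))
    Y = rng.standard_normal((n, p))
    Y[:, list(active)] += effect * a[:, None]
    return Y, a

Y, a = simulate()
half = len(a) // 2
# Step 1: select on the first half of the subjects (a single Lasso call).
selected = select_subset(Y[:half], a[:half], lam=0.1)
# Step 2: test on the held-out half, restricted to the selected summaries.
p_value = (permutation_pvalue(Y[half:][:, selected], a[half:], seed=1)
           if selected.size else 1.0)
print("selected summaries:", selected.tolist())
print("permutation p-value:", p_value)
```

Sample splitting is used in this sketch only because it keeps the second-stage test valid in a simple way. The abstract does not say how the paper handles inference after selection, so the split should be read as a stand-in and not as a description of the method.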