Selective Inference for Sparse Multitask Regression with Applications in Neuroimaging

Multi-task learning is frequently used to model a set of related response variables from the same set of features, improving predictive performance and modeling accuracy relative to methods that handle each response variable separately. Despite the potential of multi-task learning to yield more powerful inference than single-task alternatives, prior work in this area has largely omitted uncertainty quantification. Our focus in this paper is a common multi-task problem in neuroimaging, where the goal is to understand the relationship between multiple cognitive task scores (or other subject-level assessments) and brain connectome data collected from imaging. We propose a framework for selective inference to address this problem, with the flexibility to: (i) jointly identify the relevant covariates for each task through a sparsity-inducing penalty, and (ii) conduct valid inference in a model based on the estimated sparsity structure. Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood. This gives an approximate system of estimating equations for maximum likelihood inference, solvable via a single convex optimization problem, and enables us to efficiently form confidence intervals with approximately the correct coverage. Applied to both simulated data and data from the Adolescent Brain Cognitive Development (ABCD) study, our selective inference methods yield tighter confidence intervals than commonly used alternatives, such as data splitting. We also demonstrate through simulations that multi-task learning with selective inference can more accurately recover true signals than single-task methods.
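For orientation, the data-splitting baseline mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the conditional procedure proposed in the paper: it runs a separate lasso per task (rather than a joint multi-task penalty) on one half of the samples, then fits ordinary least squares on the other half and reports normal-approximation intervals. The function names `lasso_ista` and `split_and_infer`, the penalty level `lam`, and the 50/50 split are all hypothetical choices made for the example.

```python
import numpy as np


def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via proximal gradient: min (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - step * grad
        # Soft-thresholding produces exact zeros, i.e. a sparse support.
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return b


def split_and_infer(X, Y, lam, z=1.96, seed=0):
    """Select on one half of the samples, then form OLS intervals on the other.

    Returns {task index: (selected feature indices, array of [lower, upper])}.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    sel, inf = perm[: n // 2], perm[n // 2:]
    results = {}
    for k in range(Y.shape[1]):
        # Assumption: per-task lasso stands in for the selection step.
        support = np.flatnonzero(lasso_ista(X[sel], Y[sel, k], lam))
        if support.size == 0:
            results[k] = (support, np.empty((0, 2)))
            continue
        Xs, yk = X[inf][:, support], Y[inf, k]
        beta, *_ = np.linalg.lstsq(Xs, yk, rcond=None)
        resid = yk - Xs @ beta
        # Assumes more held-out samples than selected features.
        sigma2 = resid @ resid / (len(inf) - support.size)
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
        results[k] = (support, np.column_stack([beta - z * se, beta + z * se]))
    return results


# Synthetic usage: two tasks, each driven by a single feature.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
B = np.zeros((10, 2))
B[0, 0], B[1, 1] = 2.0, -2.0
Y = X @ B + 0.5 * rng.standard_normal((200, 2))
intervals = split_and_infer(X, Y, lam=0.5)
```

Because the held-out half plays no part in selection, these intervals are valid without any adjustment, but each stage sees only half of the data. That loss of sample size is the cost the abstract refers to when it reports tighter intervals from the selective approach.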