We consider the problem of identifying the signal shared between two one-dimensional target variables in the presence of additional multivariate observations. Methods based on Canonical Correlation Analysis (CCA) have traditionally been used to identify shared variables; however, they were designed for multivariate targets and offer only trivial solutions in the univariate case. In the context of Multi-Task Learning (MTL), various models have been proposed to learn features that are sparse and shared across multiple tasks, but these methods have typically been evaluated by their predictive performance. To the best of our knowledge, no prior study has systematically evaluated models in terms of correctly recovering the shared signal. Here, we formalize the setting of univariate shared information retrieval and propose ICM, an evaluation metric that can be used when ground-truth labels are available and that quantifies three aspects of the learned shared features. We further propose Deep Canonical Information Decomposition (DCID), a simple yet effective approach for learning the shared variables. We benchmark the models on a range of scenarios on synthetic data with known ground truth and observe that DCID outperforms the baselines in a wide range of settings. Finally, we demonstrate a real-life application of DCID on brain Magnetic Resonance Imaging (MRI) data, where we are able to extract more accurate predictors of changes in brain regions and obesity. The code for our experiments and the supplementary materials are available at https://github.com/alexrakowski/dcid