Multi-modal Differentiable Unsupervised Feature Selection

Multi-modal high-throughput biological data present both a great scientific opportunity and a significant computational challenge. In multi-modal measurements, every sample is observed simultaneously by two or more sets of sensors. In such settings, many of the observed variables in both modalities are nuisance variables that carry no information about the phenomenon of interest. Here, we propose a multi-modal unsupervised feature selection framework that identifies informative variables from coupled high-dimensional measurements. Our method is designed to identify features associated with two types of latent low-dimensional structures: (i) shared structures that govern the observations in both modalities and (ii) differential structures that appear in only one modality. To that end, we propose two Laplacian-based scoring operators. We combine these scores with differentiable gates that mask nuisance features and improve the accuracy of the structure captured by the graph Laplacian. The performance of the new scheme is illustrated on synthetic and real datasets, including an extended biological application to single-cell multi-omics.
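To make the idea of Laplacian-based scoring concrete, here is a minimal single-modality sketch of the classical Laplacian score, in which a feature scores low when it varies smoothly over a sample-similarity graph. This is not the paper's shared or differential operators, and it omits the differentiable gates; the function name `laplacian_feature_scores`, the Gaussian kernel, and the median bandwidth heuristic are all assumptions chosen for illustration.

```python
import numpy as np

def laplacian_feature_scores(X, sigma=None):
    """Classical Laplacian score per feature (illustrative, not the paper's operator).

    X is an (n_samples, n_features) array. A lower score means the feature
    changes slowly across strongly connected samples in the graph.
    """
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip small negatives caused by round-off

    if sigma is None:
        # Assumption: median pairwise distance as the kernel bandwidth.
        sigma = np.sqrt(np.median(d2[d2 > 0]))

    # Gaussian affinity matrix with no self-loops.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    deg = W.sum(axis=1)
    L = np.diag(deg) - W  # unnormalized graph Laplacian

    # Remove the degree-weighted mean of every feature.
    Xc = X - (deg @ X) / deg.sum()

    # Score_j = (f_j^T L f_j) / (f_j^T D f_j) for each centered feature f_j.
    numerator = np.sum(Xc * (L @ Xc), axis=0)
    denominator = np.sum(Xc * (deg[:, None] * Xc), axis=0)
    return numerator / denominator
```

In this sketch the graph is built from all features at once, so many nuisance features can blur the graph itself. That weakness is what motivates masking features with learned gates before computing the Laplacian, as the abstract describes.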

Feature Selection

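Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from an existing set of M features so that a specified criterion of the system is optimized. It picks the most effective of the original features in order to reduce the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.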
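As a small illustration of choosing N of M features under a criterion, the sketch below keeps the columns with the best scores. The helper name `select_top_features` and the use of per-feature variance as the criterion in the usage note are assumptions for illustration only; any per-feature score could be substituted.

```python
import numpy as np

def select_top_features(X, scores, n_keep, larger_is_better=True):
    """Keep the n_keep columns of X with the best scores (hypothetical helper).

    Returns the reduced matrix and the indices of the kept columns,
    sorted so the original column order is preserved.
    """
    order = np.argsort(scores)  # ascending order of score
    if larger_is_better:
        order = order[::-1]
    keep = np.sort(order[:n_keep])
    return X[:, keep], keep
```

For example, calling `select_top_features(X, X.var(axis=0), n_keep=2)` keeps the two highest-variance columns, while passing `larger_is_better=False` suits criteria where a smaller score marks a more informative feature.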
