MAFS: Multi-head Attention Feature Selection for High-Dimensional Data via Deep Fusion of Filter Methods

Feature selection is essential for high-dimensional biomedical data, enabling stronger predictive performance, reduced computational cost, and improved interpretability in precision medicine applications. Existing approaches face notable challenges. Filter methods are highly scalable but cannot capture complex relationships or eliminate redundancy. Deep learning-based approaches can model nonlinear patterns but often lack stability, interpretability, and efficiency at scale. Single-head attention improves interpretability but is limited in capturing multi-level dependencies and remains sensitive to initialization, which reduces reproducibility. Existing methods rarely combine statistical interpretability with the representational power of deep learning, particularly in ultra-high-dimensional settings. Here, we introduce MAFS (Multi-head Attention-based Feature Selection), a hybrid framework that integrates statistical priors with deep learning capabilities. MAFS begins with filter-based priors, which provide a stable initialization and guide learning. It then uses multi-head attention to examine features from multiple perspectives in parallel, capturing complex nonlinear relationships and interactions. Finally, a reordering module consolidates outputs across attention heads, resolving conflicts and minimizing information loss to generate robust and consistent feature rankings. This design combines statistical guidance with deep modeling capacity, yielding interpretable importance scores while maximizing retention of informative signals. Across simulated and real-world datasets, including cancer gene expression and Alzheimer's disease data, MAFS consistently achieves superior coverage and stability compared with existing filter-based and deep learning-based alternatives, offering a scalable, interpretable, and robust solution for feature selection in high-dimensional biomedical data.
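
The three stages described above (filter-based priors, parallel heads, consolidation) can be illustrated with a minimal NumPy sketch. It is not the MAFS implementation, and every name and design choice in it is an assumption made for illustration. `filter_prior` averages two simple filter scores. `train_head` stands in for an attention head with a softmax gate over features, initialized from the prior and trained with a logistic loss. `consolidate` uses mean-rank aggregation as a placeholder for the reordering module, whose actual design the abstract does not specify. The data are synthetic, with features 0 to 2 carrying the signal.

```python
import numpy as np


def filter_prior(X, y):
    """Average two simple filter scores into one prior that sums to 1 (assumed choice)."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    # Absolute Pearson correlation between each feature and the binary label.
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    # Absolute class-mean gap, scaled by the feature's standard deviation.
    gap = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)) / (X.std(axis=0) + 1e-12)
    return 0.5 * (corr / corr.sum() + gap / gap.sum())


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def train_head(X, y, prior, seed, steps=300, lr=0.05):
    """One hypothetical head: a softmax gate over features feeding a logistic classifier."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    logits = np.log(prior + 1e-12)       # prior-based initialization of the gate
    w = 0.01 * rng.standard_normal(p)    # heads differ only through this seed
    b = 0.0
    for _ in range(steps):
        a = softmax(logits)
        g = p * a                        # gates rescaled to have mean 1
        z = np.clip((X * g) @ w + b, -30, 30)
        dz = (1.0 / (1.0 + np.exp(-z)) - y) / n   # gradient of mean cross-entropy w.r.t. z
        xg = X.T @ dz
        grad_w = g * xg
        grad_a = p * w * xg
        grad_logits = a * (grad_a - a @ grad_a)   # backpropagation through the softmax
        w -= lr * grad_w
        b -= lr * dz.sum()
        logits -= lr * grad_logits
    return np.abs(p * softmax(logits) * w)        # effective weight per feature


def consolidate(head_scores):
    """Mean-rank aggregation across heads; returns feature indices, best first."""
    ranks = np.argsort(np.argsort(-head_scores, axis=1), axis=1)  # rank 0 = best in a head
    return np.argsort(ranks.mean(axis=0), kind="stable")


# Synthetic example: 400 samples, 50 features, only features 0, 1, 2 are informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 50))
y = (X[:, 0] + X[:, 1] - X[:, 2] + 0.1 * rng.standard_normal(400) > 0).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)

prior = filter_prior(X, y)
head_scores = np.stack([train_head(X, y, prior, seed=h) for h in range(4)])
ranking = consolidate(head_scores)
print(ranking[:5])
```

Because all heads in this sketch start from the same prior and differ only in their random weights, they tend to agree. A fuller model would presumably give each head a distinct view of the features, which is where a conflict-resolving consolidation step would matter.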
