We present BALDUR, a novel Bayesian algorithm designed to handle multi-modal datasets with small sample sizes in high-dimensional settings while providing explainable solutions. To this end, the proposed model combines the different data views within a common latent space, extracting the information relevant to the classification task and pruning out irrelevant or redundant features and data views. Furthermore, to provide generalizable solutions in small sample size scenarios, BALDUR efficiently integrates dual kernels for those views with a small sample-to-feature ratio. Finally, its linear nature ensures the explainability of the model outcomes, allowing its use for biomarker identification. The model was tested on two different neurodegeneration datasets, where it outperformed state-of-the-art models and detected features aligned with markers already described in the scientific literature.
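The benefit of a dual kernel for a view with far more features than samples can be illustrated with a minimal sketch. This is not BALDUR's Bayesian model. As an assumption for illustration, it swaps in plain kernel ridge regression with a linear kernel on a single view. The sketch shows two things. First, the dual problem is solved over the N x N Gram matrix K = X X^T, so its size depends on the number of samples N rather than the number of features D. Second, with a linear kernel the primal weights can be recovered as w = X^T a, one weight per original feature, which is what keeps a linear model interpretable at the feature level. All function names (`linear_kernel`, `fit_dual_ridge`, `primal_weights`, `predict_dual`) and the toy data are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def linear_kernel(X: Matrix) -> Matrix:
    """Gram matrix K = X X^T: N x N, independent of the feature count D."""
    return [[sum(p * q for p, q in zip(xi, xj)) for xj in X] for xi in X]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_dual_ridge(X: Matrix, y: List[float], lam: float) -> List[float]:
    """Dual coefficients a = (K + lam I)^-1 y: N unknowns instead of D."""
    K = linear_kernel(X)
    n = len(K)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, y)


def primal_weights(X: Matrix, a: List[float]) -> List[float]:
    """Linear kernel only: w = X^T a gives one weight per original feature."""
    return [sum(a[i] * X[i][k] for i in range(len(X))) for k in range(len(X[0]))]


def predict_dual(X_train: Matrix, a: List[float], x: List[float]) -> float:
    """Prediction via kernel evaluations: f(x) = sum_i a_i <x_i, x>."""
    return sum(ai * sum(p * q for p, q in zip(xi, x)) for ai, xi in zip(a, X_train))


# Toy view: N = 3 samples, D = 6 features (small sample-to-feature ratio).
X = [[1, 0, 2, 0, 1, 0],
     [0, 1, 0, 1, 0, 2],
     [1, 1, 0, 0, 1, 1]]
y = [1.0, -1.0, 0.5]
lam = 0.1

a = fit_dual_ridge(X, y, lam)   # 3 dual coefficients
w = primal_weights(X, a)        # 6 feature weights, inspectable per feature
x_new = [1, 0, 1, 0, 0, 1]
pred_dual = predict_dual(X, a, x_new)
pred_primal = sum(wk * xk for wk, xk in zip(w, x_new))
print(len(a), len(w), round(pred_dual, 6), round(pred_primal, 6))
```

Both prediction routes agree because, for a linear kernel, the dual and primal ridge solutions are the same model. In a multi-view setting, one would presumably build one such kernel per high-dimensional view; how BALDUR combines them is defined by the model itself and is not reproduced here.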