The dependence on expert annotation has long been the primary rate-limiting step in applying artificial intelligence to biomedicine. While supervised learning drove the initial wave of clinical algorithms, a shift towards unsupervised and self-supervised learning (SSL) is now unlocking the latent potential of biobank-scale datasets. By learning directly from the intrinsic structure of data, whether pixels in a magnetic resonance imaging (MRI) slice, voxels in a volumetric scan, or tokens in a genomic sequence, these methods facilitate the discovery of novel phenotypes, the linkage of morphology to genetics, and the detection of anomalies without the biases that human-defined labels can introduce. This article synthesises seminal and recent advances in "learning without labels," highlighting how unsupervised frameworks can derive heritable cardiac traits, predict spatial gene expression in histology, and detect pathologies with performance that rivals or exceeds supervised counterparts.