Inferring biological relationships from cellular phenotypes in high-content microscopy screens presents both significant opportunities and challenges for biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how weakly supervised and self-supervised deep learning approaches scale when larger models are trained on larger datasets. Our results show that both CNN- and ViT-based masked autoencoders significantly outperform weakly supervised models. At the high end of our scale, a ViT-L/8 trained on over 3.5 billion unique crops sampled from 95 million microscopy images achieves relative improvements as high as 28% over our best weakly supervised models at inferring known biological relationships curated from public databases.