Cryogenic electron tomography (CryoET) combined with sub-volume averaging (SVA) is the only imaging modality capable of resolving protein structures inside cells at molecular resolution. Particle picking, the task of localizing and classifying target proteins in 3D CryoET volumes, remains the main bottleneck. Because particle picking relies on time-consuming manual labels, the vast reserve of unlabeled tomograms remains underutilized. In this work, we present a fast, label-efficient semi-supervised framework that exploits this untapped data. Our framework consists of two components: (i) an end-to-end heatmap-supervised detection model inspired by keypoint detection, and (ii) a teacher-student co-training mechanism that improves performance under sparse labeling conditions. We also introduce multi-view pseudo-labeling and a CryoET-specific DropBlock augmentation strategy to boost performance further. Extensive evaluations on the large-scale CZII dataset show that our approach improves F1 by 10% over supervised baselines, underscoring the promise of semi-supervised learning for leveraging unlabeled CryoET data.
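To make the idea of heatmap supervision concrete, the following is a minimal sketch of how particle coordinates could be turned into a 3D Gaussian heatmap target and how picks could be read back as local maxima. It is an illustration of generic keypoint-style detection, not the authors' implementation: the function names `gaussian_heatmap` and `pick_peaks`, the parameters `sigma` and `threshold`, and the use of a per-voxel maximum over Gaussians are all assumptions made here for the example.

```python
import math


def gaussian_heatmap(shape, centers, sigma=2.0):
    """Render a 3D heatmap with one Gaussian peak per particle center.

    shape   -- (depth, height, width) of the volume in voxels (hypothetical layout)
    centers -- iterable of (z, y, x) particle coordinates
    sigma   -- Gaussian width in voxels (assumed hyperparameter)
    """
    depth, height, width = shape
    heatmap = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
    for cz, cy, cx in centers:
        for z in range(depth):
            for y in range(height):
                for x in range(width):
                    d2 = (z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2
                    value = math.exp(-d2 / (2.0 * sigma ** 2))
                    # Overlapping particles keep the larger response.
                    if value > heatmap[z][y][x]:
                        heatmap[z][y][x] = value
    return heatmap


def pick_peaks(heatmap, threshold=0.5):
    """Return (z, y, x) voxels that exceed `threshold` and are strictly
    larger than every in-bounds voxel of their 26-neighborhood."""
    depth, height, width = len(heatmap), len(heatmap[0]), len(heatmap[0][0])
    offsets = [
        (dz, dy, dx)
        for dz in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dz, dy, dx) != (0, 0, 0)
    ]
    peaks = []
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                value = heatmap[z][y][x]
                if value < threshold:
                    continue
                is_peak = True
                for dz, dy, dx in offsets:
                    nz, ny, nx = z + dz, y + dy, x + dx
                    if 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width:
                        if heatmap[nz][ny][nx] >= value:
                            is_peak = False
                            break
                if is_peak:
                    peaks.append((z, y, x))
    return peaks


if __name__ == "__main__":
    target = gaussian_heatmap((8, 8, 8), [(2, 2, 2), (5, 6, 4)], sigma=1.0)
    print(pick_peaks(target))
```

In a training setup along these lines, a network would regress such a heatmap from the tomogram, and the same peak-picking step could turn a teacher model's predicted heatmaps into pseudo-labels for unlabeled tomograms. The pure-Python loops are for clarity only; a practical version would use an array library.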