Differentiating signals from the background in micrographs is a critical initial step for cryogenic electron microscopy (cryo-EM), yet it remains laborious due to low signal-to-noise ratio (SNR), the presence of contaminants and densely packed particles of varying sizes. Although image segmentation has recently been introduced to distinguish particles at the pixel level, the low SNR complicates the automated generation of accurate annotations for training supervised models. Moreover, platforms for systematically comparing different design choices in pipeline construction are lacking. Thus, a modular framework is essential to understand the advantages and limitations of this approach and drive further development. To address these challenges, we present a pipeline that automatically generates high-quality segmentation maps from cryo-EM data to serve as ground truth labels. Our modular framework enables the selection of various segmentation models and loss functions. We also integrate Conditional Random Fields (CRFs) with different solvers and feature sets to refine coarse predictions, thereby producing fine-grained segmentation. This flexibility facilitates optimal configurations tailored to cryo-EM datasets. When trained on a limited set of micrographs, our approach achieves over 90% accuracy, recall, precision, Intersection over Union (IoU), and F1-score on synthetic data. Furthermore, to demonstrate our framework's efficacy in downstream analyses, we show that the particles extracted by our pipeline produce 3D density maps with higher resolution than those generated by existing particle pickers on real experimental datasets, while achieving performance comparable to that of manually curated datasets from experts.
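For reference, the pixel-level metrics named above (accuracy, recall, precision, IoU, and F1-score) can be computed from a predicted binary mask and a ground-truth mask. The following is a minimal sketch and not the evaluation code of the pipeline itself: the function name `segmentation_metrics`, the nested-list mask representation, and the toy masks are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def segmentation_metrics(
    pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]
) -> Dict[str, float]:
    """Pixel-wise metrics for two binary masks of identical shape.

    Hypothetical helper for illustration; 1 marks a particle pixel,
    0 marks background.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # particle pixel correctly predicted
            elif p and not t:
                fp += 1  # background predicted as particle
            elif not p and t:
                fn += 1  # particle pixel missed
            else:
                tn += 1  # background correctly predicted

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy 3x4 masks (made-up values, not data from the study).
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
]

metrics = segmentation_metrics(pred, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

IoU divides the overlap by the union of predicted and true particle pixels, so for a single mask pair it never exceeds the F1-score; reporting both above 90% therefore implies that the predicted and true particle regions overlap closely.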