CRISP: A Framework for Cryo-EM Image Segmentation and Processing with Conditional Random Field

Differentiating signals from the background in micrographs is a critical initial step for cryogenic electron microscopy (cryo-EM), yet it remains laborious due to low signal-to-noise ratio (SNR), the presence of contaminants and densely packed particles of varying sizes. Although image segmentation has recently been introduced to distinguish particles at the pixel level, the low SNR complicates the automated generation of accurate annotations for training supervised models. Moreover, platforms for systematically comparing different design choices in pipeline construction are lacking. Thus, a modular framework is essential to understand the advantages and limitations of this approach and drive further development. To address these challenges, we present a pipeline that automatically generates high-quality segmentation maps from cryo-EM data to serve as ground truth labels. Our modular framework enables the selection of various segmentation models and loss functions. We also integrate Conditional Random Fields (CRFs) with different solvers and feature sets to refine coarse predictions, thereby producing fine-grained segmentation. This flexibility facilitates optimal configurations tailored to cryo-EM datasets. When trained on a limited set of micrographs, our approach achieves over 90% accuracy, recall, precision, Intersection over Union (IoU), and F1-score on synthetic data. Furthermore, to demonstrate our framework's efficacy in downstream analyses, we show that the particles extracted by our pipeline produce 3D density maps with higher resolution than those generated by existing particle pickers on real experimental datasets, while achieving performance comparable to that of manually curated datasets from experts.
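The five pixel-level metrics named in the abstract can all be derived from the same confusion counts. The following is a minimal sketch in plain Python, assuming binary masks where 1 marks a particle pixel and 0 marks background. The function name `segmentation_metrics` and the toy masks are illustrative assumptions, not part of the CRISP code base.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask: 1 = particle pixel, 0 = background


def segmentation_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    """Compute pixel-level metrics for a predicted binary mask (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # particle pixel correctly predicted
            elif p:
                fp += 1      # background predicted as particle
            elif t:
                fn += 1      # particle pixel missed
            else:
                tn += 1      # background correctly predicted

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero for empty masks.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "iou": ratio(tp, tp + fp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy 3x3 example (made-up data): 2 true positives, 1 false positive,
# 1 false negative, 5 true negatives.
pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
truth = [[1, 1, 0],
         [0, 0, 0],
         [0, 0, 1]]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

IoU ignores true negatives, so it is less inflated than accuracy when background pixels dominate, which is typically the case for micrographs with sparse particles.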

Related content

Conditional random fields

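Conditional random fields (CRFs) are a class of discriminative probabilistic models and a type of random field, commonly used to label or analyze sequential data such as natural-language text or biological sequences. Like a Markov random field, a CRF is an undirected graphical model: vertices represent random variables, and edges represent dependencies between them. In a CRF, the distribution of the random variables Y is conditional on the observed variables X. In principle the graph structure can be arbitrary, but the most common layout is a linear chain, for which efficient algorithms exist for training, inference, and decoding.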
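For the linear-chain layout, decoding (finding the most probable label sequence Y given X) is usually done with the Viterbi dynamic program. The sketch below is a minimal plain-Python illustration under stated assumptions: the per-position scores `emissions` and the label-to-label scores `transitions` are taken as already computed from X, and all names and numbers are made up for the example. It covers only the chain case described above, not the CRF refinement of image segmentations.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][y]   : score of label y at position t (assumed precomputed from X)
    transitions[a][b] : score of label a being followed by label b
    """
    n_labels = len(transitions)
    score = list(emissions[0])   # best score of any path ending in each label
    backpointers = []            # best previous label per position and label

    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for b in range(n_labels):
            # Best previous label a for ending in label b at position t.
            best_a = max(range(n_labels),
                         key=lambda a: score[a] + transitions[a][b])
            pointers.append(best_a)
            new_score.append(score[best_a] + transitions[best_a][b]
                             + emissions[t][b])
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final label to recover the full path.
    label = max(range(n_labels), key=lambda y: score[y])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


# Toy example with 2 labels and 3 positions (made-up scores).
emissions = [[2.0, 0.0], [0.0, 1.5], [0.0, 3.0]]
transitions = [[0.5, -0.5], [-0.5, 0.5]]
print(viterbi_decode(emissions, transitions))  # prints [0, 1, 1]
```

The run time is O(T·K²) for T positions and K labels, compared with K^T for exhaustive search, which is the efficiency advantage of the chain structure.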
