Cryo-forum: A framework for orientation recovery with uncertainty measure with the application in cryo-EM image analysis

In single-particle cryo-electron microscopy (cryo-EM), efficiently determining the orientation parameters of 2D projection images is a significant challenge, yet it is crucial for reconstructing 3D structures. The task is complicated by the high noise levels in cryo-EM datasets, which often include outliers and therefore require several time-consuming 2D clean-up steps. Recently, deep-learning-based solutions have emerged that streamline the traditionally laborious task of orientation estimation. These solutions often employ amortized inference, eliminating the need to estimate parameters individually for each image. However, they frequently overlook the presence of outliers and may pay insufficient attention to the components used within the network. This paper introduces a novel approach that represents the orientation with a 10-dimensional feature vector and applies a Quadratically-Constrained Quadratic Program (QCQP) to derive the predicted orientation as a unit quaternion, supplemented by an uncertainty metric. Furthermore, we propose a loss function that takes into account the pairwise distances between orientations, thereby improving the accuracy of our method. We also comprehensively evaluate the design choices involved in constructing the encoder network, a topic that has not received sufficient attention in the literature. Our numerical analysis demonstrates that our methodology effectively recovers orientations from 2D cryo-EM images in an end-to-end manner. Importantly, the inclusion of uncertainty quantification allows the dataset to be cleaned up directly at the 3D level. Finally, we package the proposed methods into a user-friendly software suite named cryo-forum, designed to be easily accessible to developers.
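
To make the step from a 10-dimensional feature vector to a unit quaternion concrete, the following is a minimal sketch and not the cryo-forum implementation. It assumes the 10 values fill the upper triangle of a symmetric 4x4 matrix A, and that the Quadratically-Constrained Quadratic Program is: minimize q^T A q subject to q^T q = 1. Under that assumption the minimizer is the eigenvector of A with the smallest eigenvalue. The function names `vec10_to_symmetric` and `solve_qcqp` are hypothetical, and using the gap between the two smallest eigenvalues as a confidence proxy is likewise an assumption, since the abstract does not define its uncertainty metric.

```python
import numpy as np


def vec10_to_symmetric(theta):
    """Fill a symmetric 4x4 matrix from 10 numbers (upper triangle, row by row).

    Assumed layout for illustration; the abstract does not specify one.
    """
    theta = np.asarray(theta, dtype=float)
    if theta.shape != (10,):
        raise ValueError("expected a 10-dimensional vector")
    A = np.zeros((4, 4))
    rows, cols = np.triu_indices(4)  # 10 index pairs of the upper triangle
    A[rows, cols] = theta
    A[cols, rows] = theta  # mirror into the lower triangle
    return A


def solve_qcqp(theta):
    """Solve min q^T A q subject to ||q|| = 1 via an eigendecomposition.

    Returns the unit quaternion q and the gap between the two smallest
    eigenvalues (hypothetical confidence proxy: a larger gap means the
    minimizer is better separated from the alternatives).
    """
    A = vec10_to_symmetric(theta)
    eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    q = eigvecs[:, 0]  # eigenvector of the smallest eigenvalue, unit norm
    if q[0] < 0:
        q = -q  # q and -q encode the same rotation; pick one sign
    gap = eigvals[1] - eigvals[0]
    return q, gap


if __name__ == "__main__":
    # Build a matrix whose minimizer is known: A = I - q0 q0^T.
    q0 = np.array([0.5, 0.5, 0.5, 0.5])
    A0 = np.eye(4) - np.outer(q0, q0)
    theta0 = A0[np.triu_indices(4)]
    q, gap = solve_qcqp(theta0)
    print("quaternion:", q)
    print("eigenvalue gap:", gap)
```

In a network, the 10 values would come from the encoder output, and images with a small gap could be flagged as candidates for removal. The threshold for such a filter is not given in the abstract and would have to be chosen empirically.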
