Multi-reference alignment (MRA) is the problem of recovering a signal from multiple noisy copies, each acted upon by a random group element. MRA is mainly motivated by single-particle cryo-electron microscopy (cryo-EM), which has recently joined X-ray crystallography as one of the two leading technologies for reconstructing biological molecular structures. Previous papers have shown that, in the high-noise regime, the sample complexity of MRA and cryo-EM is $n=\omega(\sigma^{2d})$, where $n$ is the number of observations, $\sigma^2$ is the variance of the noise, and $d$ is the lowest-order moment of the observations that uniquely determines the signal. In particular, it was shown that in many cases $d=3$ for generic signals, and thus the sample complexity is $n=\omega(\sigma^6)$. In this paper, we analyze the second moment of the MRA and cryo-EM models. First, we show that in both models the second moment determines the signal up to a set of unitary matrices, whose dimension is governed by the decomposition of the space of signals into irreducible representations of the group. Second, we derive sparsity conditions under which a signal can be recovered from the second moment, implying a sample complexity of $n=\omega(\sigma^4)$. Notably, we show that the sample complexity of cryo-EM is $n=\omega(\sigma^4)$ if at most one third of the coefficients representing the molecular structure are non-zero; this bound is near-optimal. The analysis is based on tools from representation theory and algebraic geometry. We also derive bounds on recovering a sparse signal from its power spectrum, which is the main computational problem of X-ray crystallography.
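To make the role of the second moment concrete, the following is a minimal sketch for the simplest special case: a real 1-D signal acted on by uniformly random cyclic shifts, with i.i.d. Gaussian noise of variance $\sigma^2$. This setting, the toy signal, and the function names (`second_moment`, `power_spectrum_from_moment`) are assumptions made for illustration only; the abstract treats general groups and the cryo-EM model. In this case the population second moment is a circulant matrix whose Fourier eigenvalues are the power spectrum (scaled by the signal length) plus $\sigma^2$, so the second moment fixes the magnitudes of the Fourier coefficients but not their phases.

```python
import cmath

def second_moment(x, sigma):
    """Population second moment E[y y^T] for y = R_s x + noise, where s is a
    uniformly random cyclic shift and the noise is i.i.d. N(0, sigma^2)."""
    L = len(x)
    M = [[0.0] * L for _ in range(L)]
    for s in range(L):
        shifted = [x[(i - s) % L] for i in range(L)]
        for i in range(L):
            for j in range(L):
                M[i][j] += shifted[i] * shifted[j] / L
    for i in range(L):
        M[i][i] += sigma ** 2  # the noise only contributes to the diagonal
    return M

def dft(v):
    """Plain O(L^2) discrete Fourier transform."""
    L = len(v)
    return [sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / L) for t in range(L))
            for k in range(L)]

def power_spectrum_from_moment(M, sigma):
    """Read |x_hat_k|^2 off the circulant second moment.
    The DFT of its first column equals |x_hat_k|^2 / L + sigma^2."""
    L = len(M)
    first_col = [M[t][0] for t in range(L)]
    return [L * (c.real - sigma ** 2) for c in dft(first_col)]

# Hypothetical toy signal and noise level.
x = [1.0, 2.0, 0.0, -1.0, 3.0]
sigma = 2.0
L = len(x)

M = second_moment(x, sigma)
recovered = power_spectrum_from_moment(M, sigma)
direct = [abs(c) ** 2 for c in dft(x)]

# The reflected signal is not a cyclic shift of x, yet it has the same
# power spectrum and therefore the same second moment.
reflected = [x[(-i) % L] for i in range(L)]
is_shift = any([x[(i - s) % L] for i in range(L)] == reflected for s in range(L))
M_reflected = second_moment(reflected, sigma)
max_moment_gap = max(abs(M[i][j] - M_reflected[i][j])
                     for i in range(L) for j in range(L))
```

Here `recovered` agrees with `direct` up to floating-point error, while `max_moment_gap` is numerically zero even though `is_shift` is `False`. For cyclic shifts every irreducible representation is one-dimensional, so the unitary ambiguity reduces to one unknown phase per Fourier coefficient, and recovering the signal from the second moment alone becomes recovery from the power spectrum, which needs extra structure such as the sparsity conditions described above.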