The Minimum Volume Covering Ellipsoid (MVCE) problem for $n$ observations in $d$ dimensions can be computationally very expensive in the big data regime, where $n \gg d$. We apply methods from randomised numerical linear algebra to develop a data-driven leverage score sampling algorithm for solving MVCE, and establish theoretical error bounds and a convergence guarantee. Assuming the leverage scores follow a power law decay, we show that the computational complexity of approximating the MVCE solution is reduced from $\mathcal{O}(nd^2)$ to $\mathcal{O}(nd + \text{poly}(d))$, a significant improvement when $n \gg d$. Numerical experiments demonstrate the efficacy of the new algorithm, showing that it substantially reduces computation time while yielding near-optimal solutions.
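To make the idea of leverage score sampling for MVCE concrete, the following is a minimal Python sketch, not the algorithm analysed in this work. It rests on several assumptions that are not stated in the abstract: leverage scores are computed exactly from a thin QR factorisation of the raw data matrix (which itself costs $\mathcal{O}(nd^2)$, so the sketch does not attain the $\mathcal{O}(nd + \text{poly}(d))$ bound), rows are sampled without replacement with probability proportional to their scores, and the ellipsoid on the subsample is fitted with Khachiyan's first-order method, swapped in here as a generic MVCE solver. The names `leverage_scores`, `leverage_sample` and `covering_ellipsoid` are hypothetical.

```python
import numpy as np


def leverage_scores(X):
    """Exact row leverage scores of X (n x d) from a thin QR factorisation."""
    Q, _ = np.linalg.qr(X, mode="reduced")
    # The i-th leverage score is the squared norm of the i-th row of Q.
    return np.sum(Q**2, axis=1)


def leverage_sample(X, m, rng):
    """Draw m distinct row indices with probability proportional to leverage.

    Assumption: proportional sampling without replacement; the source does
    not specify the sampling scheme.
    """
    scores = leverage_scores(X)
    return rng.choice(X.shape[0], size=m, replace=False, p=scores / scores.sum())


def covering_ellipsoid(P, eps=1e-2, max_iter=10000):
    """Approximate MVCE of the rows of P (m x d) via Khachiyan's method.

    Returns (A, c) describing {x : (x - c)^T A (x - c) <= 1}. The shape
    matrix is rescaled at the end so that every row of P is covered.
    """
    m, d = P.shape
    Q = np.hstack([P, np.ones((m, 1))]).T  # lifted points, shape (d+1, m)
    u = np.full(m, 1.0 / m)                # weights on the points
    for _ in range(max_iter):
        V = (Q * u) @ Q.T
        # M[j] = q_j^T V^{-1} q_j for every lifted point q_j.
        M = np.einsum("ij,ij->j", Q, np.linalg.solve(V, Q))
        j = np.argmax(M)
        if M[j] <= (1.0 + eps) * (d + 1):
            break
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        u *= 1.0 - step
        u[j] += step
    c = P.T @ u                             # ellipsoid centre
    S = (P.T * u) @ P - np.outer(c, c)      # weighted covariance
    S_inv = np.linalg.inv(S)
    diff = P - c
    r2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    return S_inv / r2.max(), c              # rescale to cover all rows of P


# Usage on synthetic data (illustrative sizes only).
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))
idx = leverage_sample(X, 200, rng)
A, c = covering_ellipsoid(X[idx])
diff = X - c
covered = np.einsum("ij,jk,ik->i", diff, A, diff) <= 1.0
print("fraction of all points covered:", covered.mean())
```

The ellipsoid is guaranteed to cover only the sampled rows; how well it covers the full data set depends on the sample size and the decay of the leverage scores, which is what the error bounds mentioned above address.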