The Wasserstein barycenter problem asks for the average of $m$ given probability measures and has been widely studied in many areas; however, real-world data sets are often noisy and large, which impedes its application in practice. In this paper, we therefore focus on improving the computational efficiency of two variants of the robust Wasserstein barycenter problem (RWB): fixed-support RWB (fixed-RWB) and free-support RWB (free-RWB), where the former serves as a subroutine of the latter. First, we improve efficiency through model reduction: we reduce RWB to an augmented Wasserstein barycenter problem, and this reduction works for both fixed-RWB and free-RWB. In particular, fixed-RWB can be solved in $\widetilde{O}(\frac{mn^2}{\epsilon_+})$ time with an off-the-shelf solver, where $\epsilon_+$ is the pre-specified additive error and $n$ is the number of locations of the input measures. Second, for free-RWB, we use coresets, a data compression technique with a quality guarantee, to accelerate computation by reducing the data set size $m$; we show that running algorithms on the coreset instead of the original data set suffices. Third, combining the model reduction and coreset techniques, we propose an algorithm for free-RWB that alternately updates the weights and the locations. Finally, our experiments demonstrate the efficiency of our techniques.
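To make the fixed-support setting concrete, the following is a minimal sketch of a plain (non-robust) fixed-support Wasserstein barycenter computed with entropic regularization and iterative Bregman projections. It is not the augmented reduction or the off-the-shelf solver referred to above, and it does not carry the $\widetilde{O}(\frac{mn^2}{\epsilon_+})$ guarantee; the function name `fixed_support_barycenter`, the regularization strength `reg`, and the toy data are assumptions chosen for illustration only.

```python
import numpy as np


def fixed_support_barycenter(A, C, weights=None, reg=0.01, n_iter=200):
    """Entropic fixed-support barycenter (illustrative sketch, not the paper's method).

    A       : (m, n) array, each row a histogram on the same n support points.
    C       : (n, n) ground-cost matrix between support points.
    weights : (m,) barycentric weights summing to 1 (uniform if None).
    reg     : entropic regularization strength (assumed value).
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if weights is None:
        weights = np.full(m, 1.0 / m)
    K = np.exp(-C / reg)              # Gibbs kernel
    V = np.ones((m, n))               # row k holds the scaling vector v_k
    for _ in range(n_iter):
        U = A / (V @ K.T)             # u_k = a_k / (K v_k): match the input marginals
        KtU = U @ K                   # row k is K^T u_k
        p = np.exp(weights @ np.log(KtU))  # weighted geometric mean over the m measures
        V = p / KtU                   # v_k = p / (K^T u_k): match the common marginal p
    return p / p.sum()


# Toy data: two Gaussian-shaped histograms on a shared grid in [0, 1].
x = np.linspace(0.0, 1.0, 50)
a1 = np.exp(-((x - 0.25) ** 2) / (2 * 0.05 ** 2))
a2 = np.exp(-((x - 0.75) ** 2) / (2 * 0.05 ** 2))
A = np.vstack([a1 / a1.sum(), a2 / a2.sum()])
C = (x[:, None] - x[None, :]) ** 2    # squared Euclidean ground cost

p = fixed_support_barycenter(A, C)
print("barycenter mean:", float((x * p).sum()))
```

In this sketch the locations `x` stay fixed and only the barycenter weights `p` are computed, which is the role fixed-RWB plays as a subroutine; a free-support method would additionally update the locations between such weight updates.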