Sliced optimal transport (SOT), or the sliced Wasserstein (SW) distance, is widely recognized for its statistical and computational scalability. In this work, we further improve computational scalability by proposing the first method for estimating SW from sample streams, called streaming sliced Wasserstein (Stream-SW). To define Stream-SW, we first introduce a streaming estimator of the one-dimensional Wasserstein distance (1DW). Since the 1DW has a closed-form expression, given by the integral of the absolute difference between the quantile functions of the compared distributions, we leverage quantile approximation techniques for sample streams to define a streaming 1DW estimator. Applying the streaming 1DW estimator to all projections yields Stream-SW. The key advantage of Stream-SW is its low memory complexity, which comes with theoretical guarantees on the approximation error. When comparing Gaussian distributions and mixtures of Gaussians from streaming samples, we demonstrate that Stream-SW approximates SW more accurately than random subsampling while using less memory. Additionally, we conduct experiments on point cloud classification, point cloud gradient flows, and streaming change point detection to further highlight the favorable performance of the proposed Stream-SW.
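The construction described above can be illustrated with a minimal sketch. It rests on the closed form $W_p^p(\mu,\nu)=\int_0^1 |F_\mu^{-1}(q)-F_\nu^{-1}(q)|^p\,dq$ for one-dimensional distributions: each stream keeps a small quantile summary per projection direction, and the integral is approximated on a grid of quantile levels. Everything named here (`QuantileSketch`, `stream_w1d`, `StreamSW`, the `capacity` and `grid` parameters) is hypothetical, and the summary is a simplified KLL-style compactor chosen for brevity. It is an assumption of this example, not necessarily the quantile sketch used in the paper.

```python
import numpy as np


class QuantileSketch:
    """Simplified KLL-style compactor (illustrative; not the paper's sketch).

    Items stored at level h each stand for 2**h stream samples.
    """

    def __init__(self, capacity=64, seed=0):
        if capacity < 2 or capacity % 2:
            raise ValueError("capacity must be an even integer >= 2")
        self.capacity = capacity
        self.levels = [[]]
        self.rng = np.random.default_rng(seed)

    def update(self, x):
        """Insert one sample; compact any level that reaches capacity."""
        self.levels[0].append(float(x))
        h = 0
        while len(self.levels[h]) >= self.capacity:
            if h + 1 == len(self.levels):
                self.levels.append([])
            buf = sorted(self.levels[h])
            offset = int(self.rng.integers(2))  # keep even or odd ranks at random
            self.levels[h + 1].extend(buf[offset::2])  # survivors double in weight
            self.levels[h] = []
            h += 1

    def total_weight(self):
        """Number of stream samples represented by the stored items."""
        return sum(len(lvl) * 2**h for h, lvl in enumerate(self.levels))

    def size(self):
        """Number of items actually held in memory."""
        return sum(len(lvl) for lvl in self.levels)

    def quantiles(self, qs):
        """Approximate quantiles at levels qs in (0, 1); needs >= 1 sample."""
        values = np.array([v for lvl in self.levels for v in lvl])
        weights = np.array(
            [2.0**h for h, lvl in enumerate(self.levels) for _ in lvl]
        )
        order = np.argsort(values)
        values, weights = values[order], weights[order]
        cdf = np.cumsum(weights) / weights.sum()
        idx = np.searchsorted(cdf, qs, side="left")
        return values[np.minimum(idx, len(values) - 1)]


def stream_w1d(sk_x, sk_y, p=1, grid=1000):
    """Estimate the 1D Wasserstein-p distance from two quantile sketches."""
    qs = (np.arange(grid) + 0.5) / grid  # midpoint rule on (0, 1)
    diff = np.abs(sk_x.quantiles(qs) - sk_y.quantiles(qs))
    return float(np.mean(diff**p) ** (1.0 / p))


class StreamSW:
    """Sliced Wasserstein estimate from two sample streams (assumed design)."""

    def __init__(self, dim, n_proj=20, capacity=64, p=2, seed=0):
        rng = np.random.default_rng(seed)
        theta = rng.normal(size=(n_proj, dim))
        # Fixed random unit directions shared by both streams.
        self.theta = theta / np.linalg.norm(theta, axis=1, keepdims=True)
        self.p = p
        self.sk_x = [QuantileSketch(capacity, seed + 1 + i) for i in range(n_proj)]
        self.sk_y = [
            QuantileSketch(capacity, seed + 1 + n_proj + i) for i in range(n_proj)
        ]

    def update_x(self, x):
        for sk, t in zip(self.sk_x, self.theta @ np.asarray(x, dtype=float)):
            sk.update(t)

    def update_y(self, y):
        for sk, t in zip(self.sk_y, self.theta @ np.asarray(y, dtype=float)):
            sk.update(t)

    def estimate(self):
        """Average the p-th powers of the 1D estimates over projections."""
        vals = [
            stream_w1d(a, b, self.p) ** self.p
            for a, b in zip(self.sk_x, self.sk_y)
        ]
        return float(np.mean(vals) ** (1.0 / self.p))
```

In use, one would call `update_x` and `update_y` as samples arrive and `estimate()` whenever a distance is needed. Each sketch stores fewer than `capacity` items per level, and the number of levels grows only logarithmically with the stream length, which is the kind of memory saving the abstract refers to. The error guarantees stated in the paper apply to its own estimator and are not claimed for this simplified version.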