We investigate sample-based learning of conditional distributions on multi-dimensional unit boxes, allowing the feature and target spaces to have different dimensions. Our approach clusters data near varying query points in the feature space to form empirical measures in the target space. We employ two distinct clustering schemes: one based on a fixed-radius ball and the other on nearest neighbors. We establish upper bounds on the convergence rates of both methods and, from these bounds, deduce optimal configurations for the radius and the number of neighbors. We propose incorporating the nearest-neighbors method into neural network training, as our empirical analysis indicates it performs better in practice. For efficiency, our training process uses approximate nearest-neighbors search with random binary space partitioning. Additionally, we employ the Sinkhorn algorithm and a sparsity-enforced transport plan. Our empirical findings demonstrate that, with a suitably designed structure, the neural network can locally adapt to a suitable level of Lipschitz continuity. For reproducibility, our code is available at \url{https://github.com/zcheng-a/LCD_kNN}.
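The two clustering schemes described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: given samples $(X_i, Y_i)$, a query point in the feature space selects nearby samples (by a fixed-radius ball or by $k$ nearest neighbors), and the selected targets form the atoms of a uniform empirical measure approximating the conditional distribution. Function names and the brute-force distance computation are illustrative choices.

```python
import numpy as np

def ball_conditional_measure(X, Y, x_query, radius):
    """Atoms of the empirical measure from samples whose features
    lie within a fixed-radius ball around the query point."""
    dists = np.linalg.norm(X - x_query, axis=1)
    return Y[dists <= radius]

def knn_conditional_measure(X, Y, x_query, k):
    """Atoms of the empirical measure from the k samples whose
    features are nearest to the query point."""
    dists = np.linalg.norm(X - x_query, axis=1)
    idx = np.argpartition(dists, k - 1)[:k]  # indices of k smallest distances
    return Y[idx]

# Example: data on unit boxes with feature dim 2 and target dim 1.
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
Y = rng.uniform(size=(500, 1))
atoms = knn_conditional_measure(X, Y, np.array([0.5, 0.5]), k=20)
```

Each atom carries weight $1/k$ in the nearest-neighbors scheme, whereas the ball scheme yields a random number of atoms that depends on the local sample density, one practical reason the abstract's bounds treat the radius and the number of neighbors as separate tuning parameters.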