Low-discrepancy points (also called quasi-Monte Carlo points) are deterministic, carefully constructed point sets in the unit cube that approximate the uniform distribution. We explore two methods based on such low-discrepancy points for reducing large data sets used to train neural networks. The first is the method of Dick and Feischl [4], which relies on digital nets and an averaging procedure. Motivated by our experimental findings, we construct a second method, which again uses digital nets but employs Voronoi clustering instead of averaging. Both methods are compared to the supercompress approach of [14], a variant of the K-means clustering algorithm. The comparison is carried out in terms of the compression error for different objective functions and the accuracy of the resulting neural network training.
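To make the idea concrete, the following is a minimal sketch (not the exact algorithm of [4] or of the paper's second method) of one way such a reduction can be organized: each data point is assigned to the Voronoi cell of its nearest point of a digital net (here a Sobol point set from SciPy), and the responses within each cell are averaged, yielding a small weighted data set. The function name `compress_voronoi` and all parameter choices are illustrative assumptions.

```python
# Illustrative sketch of Voronoi-cell data reduction onto a digital net.
# Not the authors' exact method; names and parameters are assumptions.
import numpy as np
from scipy.stats import qmc
from scipy.spatial import cKDTree

def compress_voronoi(X, y, m=6, seed=0):
    """Compress (X, y) with X in [0,1]^d to at most 2^m weighted points."""
    d = X.shape[1]
    # 2^m points of a (scrambled) Sobol sequence, a standard digital net
    net = qmc.Sobol(d=d, scramble=True, seed=seed).random_base2(m=m)
    # nearest net point determines the Voronoi cell of each data point
    idx = cKDTree(net).query(X)[1]
    counts = np.bincount(idx, minlength=len(net))
    keep = counts > 0  # drop empty cells
    # average the responses within each nonempty cell
    y_bar = np.bincount(idx, weights=y, minlength=len(net))[keep] / counts[keep]
    return net[keep], y_bar, counts[keep]  # points, averaged labels, weights

# Usage: train a network on the much smaller compressed set,
# weighting each compressed point by its cell count.
rng = np.random.default_rng(1)
X = rng.random((100_000, 2))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2
Xc, yc, w = compress_voronoi(X, y, m=6)
print(Xc.shape, yc.shape)  # e.g. (64, 2) (64,)
```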