The recent statistical theory of neural networks focuses on nonparametric denoising problems that treat randomness as additive noise. The variability in image classification datasets, however, does not originate from additive noise but from variation in the shape and other characteristics of the same object across different images. To address this, we introduce a tractable model for supervised image classification. From a function estimation point of view, every pixel in an image is a variable, and large images therefore lead to high-dimensional function recovery tasks that suffer from the curse of dimensionality. In the proposed image deformation model, by contrast, increasing the number of pixels enhances the image resolution and makes the object classification problem easier. We introduce and theoretically analyze three approaches. Two methods combine image alignment with a one-nearest-neighbor classifier; under a minimal separation condition, we show that perfect classification is possible. The third method fits a convolutional neural network (CNN) to the data. We derive a rate for the misclassification error that depends on the sample size and the complexity of the deformation class. A small empirical study on images generated from the MNIST handwritten digit database corroborates the theoretical findings.