Learning classification tasks of (2^n × 2^n) inputs typically involves ≤ n (2 × 2) max-pooling (MP) operators distributed along the entire feedforward deep architecture. Here we show, using the CIFAR-10 database, that pooling decisions adjacent to the last convolutional layer significantly enhance accuracies. In particular, the average accuracies of the advanced-VGG architectures with m layers (A-VGGm) are 0.936, 0.940, 0.954, 0.955, and 0.955 for m = 6, 8, 14, 13, and 16, respectively. The results indicate that A-VGG8's accuracy is superior to VGG16's, and that the accuracies of A-VGG13 and A-VGG16 are equal and comparable to that of Wide-ResNet16. In addition, replacing the three fully connected (FC) layers with a single FC layer (A-VGG6 and A-VGG14), or with several linear-activation FC layers, yields similar accuracies. These significantly enhanced accuracies stem from training the most influential input-output routes, in contrast to the inferior routes selected by multiple MP decisions along the deep architecture. Accuracies are also sensitive to the order of the non-commutative MP and average pooling operators adjacent to the output layer, since this order changes the number and location of the trained routes. The results call for a reexamination of previously proposed deep architectures and their accuracies using the proposed pooling strategy adjacent to the output layer.
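The non-commutativity of max pooling and average pooling mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper `pool2x2`, the helper `mean`, and the 4 × 4 `feature_map` values are hypothetical and chosen only for illustration, assuming non-overlapping (2 × 2) windows with stride 2.

```python
def pool2x2(grid, reduce_fn):
    """Apply a non-overlapping 2x2 pooling window (stride 2) to a square grid."""
    n = len(grid)
    return [
        [
            reduce_fn([grid[r][c], grid[r][c + 1],
                       grid[r + 1][c], grid[r + 1][c + 1]])
            for c in range(0, n, 2)
        ]
        for r in range(0, n, 2)
    ]


def mean(values):
    """Arithmetic mean of the values in one pooling window."""
    return sum(values) / len(values)


# Hypothetical 4x4 feature map (illustrative values, not taken from the paper).
feature_map = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 4, 4],
    [0, 0, 4, 4],
]

# Max pooling followed by average pooling.
max_then_avg = pool2x2(pool2x2(feature_map, max), mean)

# Average pooling followed by max pooling.
avg_then_max = pool2x2(pool2x2(feature_map, mean), max)

print(max_then_avg)  # [[1.25]]
print(avg_then_max)  # [[4.0]]
```

The two orders give different outputs for the same input, and they also select differently: max-then-average keeps one entry from every (2 × 2) block, whereas average-then-max passes on only the block with the largest mean. This is one simple way to picture how the operator order could change which input-output routes are trained.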