Image classification is a fundamental task in computer vision, and improving DNN accuracy without inflating model size or latency remains a pressing concern. We make two advances in this regard, leading to a novel EncodeNet design and training framework. The first advance involves Converting Autoencoders, a novel approach that transforms an image into an easy-to-classify image of its class. Our prior work, which applied a Converting Autoencoder and a simple classifier in tandem, achieved moderate accuracy on simple datasets such as MNIST and FMNIST. However, on more complex datasets like CIFAR-10, the Converting Autoencoder has a large reconstruction loss, making it unsuitable for enhancing DNN accuracy. To address these limitations, we generalize the design of Converting Autoencoders to a larger class of DNNs: those whose architectures comprise feature extraction layers followed by classification layers. We combine this generalized algorithmic design of the Converting Autoencoder with intraclass clustering to identify representative images, leading to optimized image feature learning. Second, we demonstrate the effectiveness of our EncodeNet design and training framework, which improves the accuracy of well-trained baseline DNNs while maintaining the overall model size. EncodeNet's building blocks are the trained encoder from our generalized Converting Autoencoder, which transfers knowledge to a lightweight classifier network that is also extracted from the baseline DNN. Our experimental results demonstrate that EncodeNet improves the accuracy of VGG16 from 92.64% to 94.05% on CIFAR-10 and of ResNet20 from 74.56% to 76.04% on CIFAR-100. It outperforms state-of-the-art techniques that rely on knowledge distillation and attention mechanisms, delivering higher accuracy for models of comparable size.
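As a rough illustration of the idea of mapping each image to a representative image of its class, the sketch below picks one representative per class and pairs every sample with it as a conversion target. The abstract does not specify the clustering procedure, so everything here is an assumption: the representative is taken to be the sample nearest to its class mean in feature space (a single-cluster simplification of intraclass clustering), and the names `class_representatives` and `conversion_targets` and the toy feature values are hypothetical.

```python
from collections import defaultdict
from math import dist


def class_representatives(features, labels):
    """Return {label: index of the sample closest to its class mean}.

    Assumption: one cluster per class; the actual EncodeNet procedure
    may use a different clustering method and feature space.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    reps = {}
    for label, indices in by_class.items():
        dim = len(features[indices[0]])
        # Mean feature vector of the class.
        centroid = [
            sum(features[i][d] for i in indices) / len(indices)
            for d in range(dim)
        ]
        # Representative = sample with the smallest Euclidean distance to the mean.
        reps[label] = min(indices, key=lambda i: dist(features[i], centroid))
    return reps


def conversion_targets(labels, reps):
    """Map every sample index to the index of its class representative."""
    return [reps[label] for label in labels]


# Toy 2-D "features" standing in for learned image features (made-up values).
features = [
    (0.0, 0.0), (1.0, 0.0), (4.0, 0.0),        # class 0
    (10.0, 10.0), (11.0, 10.0), (12.0, 10.0),  # class 1
]
labels = [0, 0, 0, 1, 1, 1]

reps = class_representatives(features, labels)
targets = conversion_targets(labels, reps)
```

In a training loop, each pair (sample `i`, sample `targets[i]`) would serve as the input and reconstruction target of an autoencoder, so that the encoder learns to pull images of a class toward a shared, easy-to-classify target.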