Convolutional Neural Networks (CNNs) have long been the standard for image classification, but attention-based mechanisms have recently gained traction. This project compares traditional CNNs with attention-augmented CNNs on an image classification task. By evaluating their performance, accuracy, and computational efficiency, we highlight the benefits and trade-offs of the localized feature extraction of traditional CNNs versus the global context capture of attention-augmented CNNs. These comparisons reveal the respective strengths and weaknesses of the two approaches, can guide model selection based on specific application needs, and ultimately deepen understanding of these architectures in the deep learning community. This was our final project for the CS7643 Deep Learning course at Georgia Tech.
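To make the local-vs-global distinction concrete, here is a minimal, dependency-free sketch in plain Python (not the project's actual model, which would use a deep-learning framework): a 1D convolution, where each output depends only on a small neighborhood, next to a simplified self-attention step, where each output is a softmax-weighted average over the entire input.

```python
import math

def conv1d(x, kernel):
    # Local feature extraction: each output mixes only a k-wide window
    # (zero-padded at the edges so the output length matches the input).
    k = len(kernel)
    pad = k // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * xp[i + j] for j in range(k)) for i in range(len(x))]

def self_attention(x):
    # Global context: each output is a weighted average over ALL positions,
    # with weights from a softmax over pairwise similarities (here x_i * x_j).
    out = []
    for i in range(len(x)):
        scores = [x[i] * x[j] for j in range(len(x))]
        m = max(scores)                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum(e / z * xj for e, xj in zip(exps, x)))
    return out

x = [0.0, 1.0, 0.0, 0.0, 2.0]
print(conv1d(x, [1.0, 1.0, 1.0]))  # each entry mixes a 3-wide neighborhood
print(self_attention(x))           # each entry mixes the whole sequence
```

An attention-augmented CNN interleaves or combines these two operations, so features carry both fine-grained local detail and long-range context; the trade-off is the quadratic cost of attending over all positions.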