We propose WaveMix, a novel neural architecture for computer vision that is resource-efficient yet generalizable and scalable. While using fewer trainable parameters, less GPU RAM, and fewer computations, WaveMix networks achieve accuracy comparable to or better than state-of-the-art convolutional neural networks, vision transformers, and token mixers on several tasks. This efficiency can translate into savings in time, cost, and energy. To achieve these gains, we use a multi-level two-dimensional discrete wavelet transform (2D-DWT) in WaveMix blocks, which has the following advantages: (1) it reorganizes spatial information based on three strong image priors (scale-invariance, shift-invariance, and sparseness of edges); (2) it does so losslessly and without adding parameters; (3) it reduces the spatial size of feature maps, which reduces the memory and time required for forward and backward passes; and (4) it expands the receptive field faster than convolutions do. The whole architecture is a stack of self-similar, resolution-preserving WaveMix blocks, which allows architectural flexibility across tasks and levels of resource availability. WaveMix establishes new benchmarks for segmentation on Cityscapes and for classification on Galaxy 10 DECals, Places-365, five EMNIST datasets, and iNAT-mini, and performs competitively on other benchmarks. Our code and trained models are publicly available.
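The lossless, parameter-free, and size-reducing properties claimed for the 2D-DWT can be illustrated with a minimal NumPy sketch. It assumes the Haar wavelet, which the abstract does not specify, and it is not the authors' released implementation; the function names `haar_dwt2`, `haar_idwt2`, and `multilevel_approx` are hypothetical and introduced only for illustration.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D Haar DWT (assumed wavelet) of an even-sized 2D array.

    Returns four subbands, each with half the height and half the width.
    The transform has fixed coefficients, so it adds no trainable parameters.
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 patch
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0    # local average (low-pass in both directions)
    detail_1 = (a + b - c - d) / 2.0  # difference between the two rows
    detail_2 = (a - b + c - d) / 2.0  # difference between the two columns
    detail_3 = (a - b - c + d) / 2.0  # diagonal difference
    return approx, detail_1, detail_2, detail_3


def haar_idwt2(approx, detail_1, detail_2, detail_3):
    """Exact inverse of haar_dwt2."""
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w), dtype=approx.dtype)
    x[0::2, 0::2] = (approx + detail_1 + detail_2 + detail_3) / 2.0
    x[0::2, 1::2] = (approx + detail_1 - detail_2 - detail_3) / 2.0
    x[1::2, 0::2] = (approx - detail_1 + detail_2 - detail_3) / 2.0
    x[1::2, 1::2] = (approx - detail_1 - detail_2 + detail_3) / 2.0
    return x


def multilevel_approx(x, levels):
    """Approximation subband after `levels` successive decompositions."""
    for _level in range(levels):
        x = haar_dwt2(x)[0]
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((64, 64))

    subbands = haar_dwt2(image)
    # Each subband is half the size of the input along both axes.
    assert all(s.shape == (32, 32) for s in subbands)
    # The input is recovered from the four subbands, so nothing is lost.
    assert np.allclose(haar_idwt2(*subbands), image)
    # After k levels, one approximation coefficient summarizes a 2^k x 2^k patch.
    assert multilevel_approx(image, 3).shape == (8, 8)
```

Under this assumption, each level halves the height and width of the feature map while the four subbands together hold exactly as many values as the input. That is why the transform can be inverted exactly, and why any layer applied to the subbands covers a larger input region per output value than the same layer applied at full resolution.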