Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of an image with the same large-scale model, which incurs significant computational cost. To overcome this issue, we propose a novel network architecture, Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models of different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher-resolution images are processed by smaller networks. We further propose a feature interaction mechanism that allows features of different resolutions to complement each other and effectively integrates information from different spatial scales. Extensive experiments demonstrate that PIIP achieves superior performance on tasks such as object detection, segmentation, and image classification compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applied to the large-scale vision foundation model InternViT-6B, our method improves its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks. Our code and models are available at https://github.com/OpenGVLab/PIIP.
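The parameter-inverted idea above can be illustrated with a minimal toy sketch: several branches whose model size (here, just channel width) shrinks as input resolution grows, plus a simple cross-scale fusion step. All branch sizes, resolutions, and the fusion scheme below are illustrative assumptions for exposition, not the actual PIIP architecture or its interaction mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical branch configs: higher resolution -> smaller network (fewer
# channels). These numbers are illustrative, not the paper's settings.
branches = [
    {"resolution": 448, "dim": 64},   # high-res image, small branch
    {"resolution": 224, "dim": 128},  # mid-res image, medium branch
    {"resolution": 112, "dim": 256},  # low-res image, large branch
]

def run_branch(image, dim):
    """Stand-in for a vision backbone: pool 16x16 patches, then linearly
    project the patch features to `dim` channels."""
    h, w, c = image.shape
    patches = image.reshape(h // 16, 16, w // 16, 16, c).mean(axis=(1, 3))
    proj = rng.standard_normal((c, dim)) * 0.02
    return patches @ proj  # feature map of shape (h/16, w/16, dim)

def interact(feats, dims):
    """Toy cross-resolution interaction: resize every feature map to the
    mid-branch grid, project to a shared width, and average so the scales
    complement each other."""
    target = feats[1].shape[:2]
    shared = min(dims)
    fused = np.zeros((*target, shared))
    for f, d in zip(feats, dims):
        # nearest-neighbour resize to the target grid
        ri = np.arange(target[0]) * f.shape[0] // target[0]
        ci = np.arange(target[1]) * f.shape[1] // target[1]
        resized = f[ri][:, ci]
        proj = rng.standard_normal((d, shared)) * 0.02
        fused += resized @ proj
    return fused / len(feats)

images = [rng.random((b["resolution"], b["resolution"], 3)) for b in branches]
feats = [run_branch(img, b["dim"]) for img, b in zip(images, branches)]
fused = interact(feats, [b["dim"] for b in branches])
print(fused.shape)  # (14, 14, 64)
```

The key property the sketch captures is that the expensive high-resolution input only passes through the cheapest branch, while the fusion step lets the small branch's fine spatial detail and the large branch's richer semantics inform one another.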