Referring Image Segmentation (RIS) aims to segment the object referred to by a given sentence in an image by understanding both visual and linguistic information. However, existing RIS methods tend to pursue top-performing models while disregarding practical deployment on resource-limited edge devices. This oversight poses a significant challenge for on-device RIS inference. To this end, we propose an effective and efficient post-training quantization (PTQ) framework termed PTQ4RIS. Specifically, we first conduct an in-depth analysis of the root causes of performance degradation in RIS model quantization, and then propose dual-region quantization (DRQ) and reorder-based outlier-retained quantization (RORQ) to address the quantization difficulties in the visual and text encoders. Extensive experiments on three benchmarks with different bit-width settings (from 8 to 4 bits) demonstrate its superior performance. Importantly, PTQ4RIS is the first PTQ method specifically designed for the RIS task, highlighting the feasibility of PTQ in RIS applications. Code and video are available at {https://github.com/gugu511yy/PTQ4RIS}.
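To make the bit-width trade-off concrete, the sketch below shows generic uniform affine post-training quantization (quantize then dequantize a tensor at a chosen bit width). This is a standard PTQ building block for illustration only, not the paper's DRQ or RORQ schemes; the function name and the error comparison are our own assumptions.

```python
import numpy as np

def quantize_dequantize(x, num_bits=8):
    """Uniform affine (asymmetric) quantization followed by dequantization,
    the basic operation underlying post-training quantization (PTQ)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    scale = max(scale, 1e-8)                      # guard against constant tensors
    zero_point = round(qmin - x.min() / scale)    # shift so x.min() maps near qmin
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale               # float approximation of x

# Lower bit widths coarsen the grid, so reconstruction error grows from 8 to 4 bits.
x = np.linspace(-1.0, 1.0, 16).astype(np.float32)
err8 = np.abs(quantize_dequantize(x, 8) - x).max()
err4 = np.abs(quantize_dequantize(x, 4) - x).max()
```

The widening gap between `err8` and `err4` is why naive uniform quantization degrades sharply at 4 bits, motivating encoder-specific schemes such as those the paper proposes.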