Referring Image Segmentation (RIS) aims to segment the object referred to by a given sentence in an image by understanding both visual and linguistic information. However, existing RIS methods tend to pursue top-performing models while disregarding practical deployment on resource-limited edge devices. This oversight poses a significant challenge for on-device RIS inference. To this end, we propose an effective and efficient post-training quantization framework termed PTQ4RIS. Specifically, we first conduct an in-depth analysis of the root causes of performance degradation in RIS model quantization, and then propose dual-region quantization (DRQ) and reorder-based outlier-retained quantization (RORQ) to address the quantization difficulties in the visual and text encoders. Extensive experiments on three benchmarks with different bit settings (from 8 to 4 bits) demonstrate its superior performance. Importantly, PTQ4RIS is the first PTQ method specifically designed for the RIS task, highlighting the feasibility of PTQ in RIS applications. Code will be available at {https://github.com/gugu511yy/PTQ4RIS}.
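As background for the framework above, the sketch below illustrates generic uniform affine post-training quantization, the standard baseline that PTQ methods build on. This is NOT the paper's DRQ or RORQ technique (those refine per-region scale selection and outlier handling); it is only a minimal, assumed illustration of how float activations are mapped to low-bit integers via a scale and zero-point, which is where outliers cause the degradation the paper analyzes.

```python
# Minimal sketch of uniform affine (asymmetric) post-training quantization.
# Generic baseline for illustration only -- not the paper's DRQ/RORQ methods.

def quantize(values, bits=8):
    """Map floats to unsigned b-bit integers using min/max calibration."""
    qmax = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0   # step size between levels
    zero_point = round(-lo / scale)                # integer that represents 0.0
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [(qi - zero_point) * scale for qi in q]

# Toy "activations": one large outlier (7.9) stretches the quantization range,
# coarsening the resolution for all the small values -- the core difficulty
# that outlier-aware PTQ schemes target.
acts = [-1.2, 0.0, 0.5, 2.3, 7.9]
q, s, z = quantize(acts, bits=4)            # 4-bit: only 16 levels
recon = dequantize(q, s, z)
err = max(abs(a - r) for a, r in zip(acts, recon))
```

With round-to-nearest, the worst-case reconstruction error is bounded by half the scale; shrinking the clipping range (or handling outliers separately, as RORQ's name suggests) shrinks the scale and hence the error on inlier values.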