Visually impaired users face significant challenges in daily information access and real-time environmental perception, and there is an urgent need for intelligent assistive systems with accurate recognition capabilities. Although large-scale models provide effective solutions for perception and reasoning, their practical deployment on assistive devices is severely constrained by excessive memory consumption and high inference costs. Moreover, existing quantization strategies often ignore inter-block error accumulation, which degrades model stability. To address these challenges, this study proposes a novel quantization framework, Residual-Projected Multi-Collaboration Closed-Loop and Single Instance Quantization (RPIQ), whose quantization process adopts a multi-collaborative closed-loop compensation scheme based on single-instance calibration and Gauss-Seidel iterative quantization. Experiments on various large-scale models, including language models such as OPT, Qwen, and LLaMA, as well as vision-language models such as CogVLM2, demonstrate that RPIQ compresses models to a 4-bit representation while significantly reducing peak memory consumption (by approximately 60%-75% relative to the original full-precision models). The method maintains performance close to that of the full-precision models across multiple language and vision tasks, and exhibits strong recognition and reasoning capabilities in key applications such as text understanding and visual question answering in complex scenes. Beyond verifying the effectiveness of RPIQ for deployment in real assistive systems, this study also improves the computational efficiency and reliability of large models, enabling them to deliver the information visually impaired users need accurately and rapidly.
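The abstract only names the ingredients of RPIQ (single-instance calibration, Gauss-Seidel iterative quantization, closed-loop residual compensation) without giving the algorithm itself. As a rough illustration of the general flavor of such a scheme, the following is a minimal sketch of column-by-column 4-bit weight quantization in which the error introduced by each quantized column is projected onto the not-yet-quantized columns using a single calibration batch. All function names, the symmetric 4-bit quantizer, and the least-squares compensation rule are illustrative assumptions made here; this is not the published RPIQ procedure.

import numpy as np

def quantize_uniform_4bit(w, scale):
    # Round a weight vector onto a symmetric 4-bit grid (integer levels -8..7)
    # and map it back to the floating-point domain.
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale

def gauss_seidel_quantize(W, X):
    # W: weight matrix (out_features x in_features); X: calibration activations
    # (n_samples x in_features) from a single calibration instance.
    # Columns are quantized sequentially (Gauss-Seidel style); after each column
    # is fixed, its quantization error on the calibration outputs is compensated
    # by adjusting the remaining full-precision columns.
    W_q = W.astype(np.float64).copy()
    n_cols = W_q.shape[1]
    for j in range(n_cols):
        col = W_q[:, j]
        scale = np.abs(col).max() / 7.0 + 1e-12          # per-column symmetric scale
        q_col = quantize_uniform_4bit(col, scale)
        err = col - q_col                                # error introduced by column j
        W_q[:, j] = q_col
        if j + 1 < n_cols:
            # Least-squares projection: find coefficients so that the remaining
            # activation columns approximate X[:, j]; spreading the error along
            # these coefficients keeps the layer output close to full precision.
            X_rest = X[:, j + 1:]
            coeffs, *_ = np.linalg.lstsq(X_rest, X[:, j], rcond=None)
            W_q[:, j + 1:] += np.outer(err, coeffs)
    return W_q

# Toy usage: the compensated 4-bit weights should reproduce X @ W.T more
# closely than naive per-column rounding would.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))     # hypothetical layer weights
X = rng.normal(size=(32, 16))    # single calibration batch
W_q = gauss_seidel_quantize(W, X)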