Unleashing MLLMs on the Edge: A Unified Framework for Cross-Modal ReID via Adaptive SVD Distillation

Practical cloud-edge deployment of Cross-Modal Re-identification (CM-ReID) is hampered by the need to maintain a fragmented ecosystem of specialized cloud models for diverse modalities. While Multi-Modal Large Language Models (MLLMs) offer strong unification potential, existing approaches fail to adapt them into a single end-to-end backbone and lack effective knowledge distillation strategies for edge deployment. To address these limitations, we propose MLLMEmbed-ReID, a unified framework built on a cloud-edge architecture. First, we adapt a foundational MLLM into a state-of-the-art cloud model. Instruction-based prompting guides the MLLM to generate a unified embedding space across RGB, infrared, sketch, and text modalities. The model is trained efficiently with a hierarchical Low-Rank Adaptation fine-tuning (LoRA-SFT) strategy, optimized under a holistic cross-modal alignment objective. Second, to transfer its knowledge to an edge-native student, we introduce a novel distillation strategy motivated by the low-rank property of the teacher's feature space. A Principal Component Mapping loss prioritizes the essential information, while a Feature Relation loss preserves relational structures. Our lightweight edge model achieves state-of-the-art performance on multiple visual CM-ReID benchmarks, while its cloud counterpart excels across all CM-ReID benchmarks. The MLLMEmbed-ReID framework thus presents a complete and effective solution for deploying unified MLLM-level intelligence on resource-constrained devices. The code and models will be open-sourced soon.
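The abstract does not define the two distillation losses, so the following is only a minimal NumPy sketch of how such losses could look, not the authors' formulation. It assumes that the student features have already been projected to the teacher's dimension, that the principal directions come from an SVD of the centered teacher batch, and that relations are measured by pairwise cosine similarity. The function names, the singular-value weighting, and the parameter `k` are all hypothetical.

```python
import numpy as np


def principal_component_mapping_loss(teacher, student, k):
    """Squared error between teacher and student features, measured only
    along the top-k principal directions of the teacher batch.

    Assumption: teacher and student are (batch, dim) arrays of equal dim.
    """
    # Principal directions of the centered teacher features (thin SVD).
    centered = teacher - teacher.mean(axis=0, keepdims=True)
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(k, vt.shape[0])
    basis = vt[:k].T                                   # (dim, k)
    # Assumption: weight each direction by its share of the singular values,
    # so dominant components contribute more to the loss.
    weights = sing[:k] / (sing[:k].sum() + 1e-12)      # (k,)
    diff = (teacher - student) @ basis                 # (batch, k)
    return float(np.mean((diff ** 2) * weights))


def feature_relation_loss(teacher, student):
    """Mean squared difference between the pairwise cosine-similarity
    matrices of the teacher batch and the student batch."""
    def cosine_gram(x):
        x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
        return x @ x.T                                 # (batch, batch)
    return float(np.mean((cosine_gram(teacher) - cosine_gram(student)) ** 2))
```

In this sketch the relation term is invariant to rescaling of the student features, while the principal-component term is not, which is one reason the two terms could complement each other. How the actual method weights or combines them is not stated in the abstract.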
