Quantized-TinyLLaVA: A New Multimodal Foundation Model Enables Efficient Split Learning

Multimodal foundation models are increasingly trained on sensitive data in domains such as finance and biomedicine, as well as on data containing personal identifiers. When that data is held by separate partitions, training raises serious privacy concerns because it would require sharing data across partitions. Split learning addresses these concerns by enabling collaborative model training without exchanging raw data between partitions, but it introduces a significant challenge: transmitting high-dimensional intermediate feature representations between partitions incurs substantial communication costs. To address this challenge, we propose Quantized-TinyLLaVA, a multimodal foundation model with an integrated communication-efficient split learning framework. Our approach adopts a compression module that quantizes intermediate features into discrete representations before transmission, substantially reducing communication overhead. In addition, we derive a principled quantization strategy grounded in entropy coding theory to determine the optimal number of discrete representation levels. We deploy our framework in a two-partition setting, with one partition acting as the client and the other as the server, to realistically simulate distributed training. Under this setup, Quantized-TinyLLaVA achieves an approximately 87.5% reduction in communication overhead with 2-bit quantization, while maintaining the performance of the original 16-bit model across five benchmark datasets. Furthermore, our compressed representations exhibit enhanced resilience against feature inversion attacks, supporting the privacy of the transmitted features. The code is available at https://github.com/anonymous-1742/Quantized-TinyLLaVA.
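The 87.5% figure follows from the bit widths alone: sending 2 bits per feature value instead of 16 saves 1 - 2/16 = 0.875 of the payload. The sketch below illustrates that arithmetic together with a generic min-max uniform quantizer. It is an assumption made for illustration, not the compression module or the entropy-coding-based level selection described in the abstract, and the function names are hypothetical. It also ignores the small side information (offset and step size) a receiver would need for reconstruction.

```python
def communication_reduction(bits_quantized: int, bits_original: int = 16) -> float:
    """Fraction of payload saved when each value shrinks from
    bits_original bits to bits_quantized bits (side information ignored)."""
    return 1.0 - bits_quantized / bits_original


def quantize_uniform(features, bits):
    """Map each float to an integer code in [0, 2**bits - 1].

    Assumed scheme: min-max uniform scalar quantization. Returns the codes
    plus the offset and step size needed to reconstruct approximate values.
    """
    levels = 2 ** bits
    lo, hi = min(features), max(features)
    # Fall back to a unit step when all values are identical.
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codes = [round((x - lo) / step) for x in features]
    return codes, lo, step


def dequantize_uniform(codes, lo, step):
    """Reconstruct approximate feature values from integer codes."""
    return [lo + c * step for c in codes]


if __name__ == "__main__":
    feats = [0.0, 0.1, 0.5, 1.0]  # toy stand-in for intermediate features
    codes, lo, step = quantize_uniform(feats, bits=2)
    recon = dequantize_uniform(codes, lo, step)
    print("codes:", codes)
    print("reconstruction:", recon)
    print("reduction vs 16-bit:", communication_reduction(2, 16))  # 0.875
```

With 2 bits there are only four levels, so each reconstructed value can differ from the original by up to half a step. Whether that loss is tolerable for downstream accuracy is an empirical question that the abstract answers only for the authors' own module.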
