Compressed deep learning (DL) models are essential for deployment in resource-constrained environments, but their performance often lags behind that of their large-scale counterparts. To bridge this gap, we propose the Alignment Adapter (AlAd): a lightweight, sliding-window-based adapter that aligns the token-level embeddings of a compressed model with those of the original large model. AlAd preserves local contextual semantics, enables flexible alignment across differing dimensionalities or architectures, and is entirely agnostic to the underlying compression method. AlAd can be deployed in two ways: as a plug-and-play module over a frozen compressed model, or jointly fine-tuned with the compressed model for further performance gains. Through experiments on BERT-family models across three token-level NLP tasks, we demonstrate that AlAd significantly boosts the performance of compressed models with only marginal overhead in size and latency.
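The sliding-window alignment idea can be sketched as follows: each compressed-model token embedding is concatenated with its neighbors in a small window and linearly projected into the teacher's embedding space, where a token-level alignment loss is computed. This is a minimal NumPy illustration under stated assumptions — the window size, zero padding, and single linear projection are hypothetical choices, not the paper's exact adapter design.

```python
import numpy as np

rng = np.random.default_rng(0)

def window_features(emb, window=3):
    """Concatenate each token's embedding with its neighbors (zero-padded)."""
    T, d = emb.shape
    half = window // 2
    padded = np.vstack([np.zeros((half, d)), emb, np.zeros((half, d))])
    # Shift the padded sequence to gather the window around every token.
    return np.hstack([padded[i:i + T] for i in range(window)])  # (T, window*d)

def align(compressed, W):
    """Project windowed compressed embeddings into the teacher's space."""
    window = W.shape[0] // compressed.shape[1]
    return window_features(compressed, window) @ W  # (T, d_teacher)

# Toy shapes (illustrative): sequence length 5, compressed dim 4,
# teacher dim 8, window size 3.
T, d_s, d_t, win = 5, 4, 8, 3
comp = rng.normal(size=(T, d_s))       # compressed-model token embeddings
teacher = rng.normal(size=(T, d_t))    # original large-model embeddings
W = rng.normal(size=(win * d_s, d_t)) * 0.1  # adapter projection weights

aligned = align(comp, W)
# Token-level alignment objective: mean squared distance to teacher embeddings.
loss = np.mean((aligned - teacher) ** 2)
```

Because the projection maps from a window of compressed embeddings to the teacher dimensionality, the same scheme works when the two models have different hidden sizes, which is one way such an adapter can stay agnostic to the compression method.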