Modular Representation Compression: Adapting LLMs for Efficient and Effective Recommendations

Large language models (LLMs) have advanced recommendation systems (RSs), and recent work has begun to explore how to integrate LLMs into industrial RSs. Most approaches deploy LLMs offline to generate and pre-cache augmented representations for RSs, but the high-dimensional representations that LLMs produce introduce substantial storage and computational costs. It is therefore crucial to compress LLM representations effectively. During representation compression, however, we identify a counterintuitive phenomenon, the Mid-layer Representation Advantage (MRA): representations from the middle layers of LLMs outperform those from the final layers in recommendation tasks. Because the final layer is degraded, existing compression methods, which typically compress final-layer representations, are suboptimal. Drawing on modularity theory, we attribute MRA to LLMs spontaneously developing internal functional modularity, which forces the final layer to specialize in the proxy training task. We therefore propose Modular Representation Compression (MARC), which explicitly controls the modularity of LLMs. First, Modular Adjustment explicitly introduces compression and task adaptation modules, enabling the LLM to operate strictly as a representation-learning module. Next, to ground each module in its specific task, Modular Task Decoupling uses information constraints and different network structures to decouple the tasks. Extensive experiments validate that MARC addresses MRA and produces efficient representations. Notably, MARC achieves a 2.82% eCPM lift in an online A/B test in a large-scale commercial search advertising scenario.

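
To make the setting in the abstract concrete, the following is a minimal sketch of the baseline pipeline it discusses: take the hidden states of one LLM layer (middle or final), pool them into a single vector, and project that vector to a lower dimension before caching. It is not the MARC method. The names `pool_layer` and `compress`, the layer count, the dimensions, and the random projection (a stand-in for a learned compression module) are all assumptions made for illustration, and random arrays stand in for real LLM hidden states.

```python
import numpy as np


def pool_layer(hidden_states, layer, mask):
    """Mean-pool the token vectors of one layer, ignoring padded positions.

    hidden_states: array of shape (num_layers, seq_len, hidden_dim)
    layer:         index of the layer to read (hypothetical choice)
    mask:          array of shape (seq_len,), 1 for real tokens, 0 for padding
    """
    tokens = hidden_states[layer]                    # (seq_len, hidden_dim)
    weights = mask.astype(tokens.dtype)[:, None]     # (seq_len, 1)
    return (tokens * weights).sum(axis=0) / weights.sum()


def compress(vec, projection):
    """Project a pooled vector to a lower dimension (stand-in for a learned module)."""
    return vec @ projection


rng = np.random.default_rng(0)

# Toy sizes, assumed for illustration only.
num_layers, seq_len, hidden_dim, compressed_dim = 12, 8, 256, 16

# Random data standing in for per-layer LLM hidden states of one item text.
hidden_states = rng.standard_normal(
    (num_layers, seq_len, hidden_dim)
).astype(np.float32)
mask = np.array([1, 1, 1, 1, 1, 0, 0, 0])            # last three tokens are padding

# Random projection standing in for a trained compression layer.
projection = (
    rng.standard_normal((hidden_dim, compressed_dim)) / np.sqrt(hidden_dim)
).astype(np.float32)

# Candidate representations: one from a middle layer, one from the final layer.
mid_vec = pool_layer(hidden_states, num_layers // 2, mask)
final_vec = pool_layer(hidden_states, num_layers - 1, mask)

# Compressed vector that would be pre-cached for the recommender.
mid_small = compress(mid_vec, projection)

# Storage per cached item before and after compression, in bytes.
bytes_before = mid_vec.nbytes
bytes_after = mid_small.nbytes
```

With these toy sizes, each cached vector shrinks from 256 to 16 float32 values. Which layer yields the better downstream vector is an empirical question that this sketch does not answer; it only shows where the layer choice and the compression step sit in such a pipeline.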