The standard practice in Large Language Models (LLMs) is to base predictions on final-layer representations. However, intermediate layers encode complementary task-relevant signals, and the optimal layer is task-dependent, making single-layer usage inherently suboptimal. In this work, we introduce Inter-Layer Structural Encoders (ILSE), a powerful and parameter-efficient post-training framework that learns to aggregate representations from all layers of a frozen LLM through structured inter-layer interactions. Central to ILSE is the Cayley-Encoder, a mathematically grounded module based on expander Cayley graphs that enables efficient and effective inter-layer information propagation. We evaluate ILSE on 13 classification and semantic similarity tasks across 9 pre-trained LLMs ranging from 14M to 8B parameters. ILSE consistently outperforms strong baselines, achieving up to 44% improvements in accuracy and 25% in similarity, while introducing at most 0.1% additional parameters relative to the base LLM size. Furthermore, ILSE is highly data-efficient in few-shot regimes and enables small LLMs to match or exceed the performance of substantially larger models. Notably, it also outperforms LoRA-based fine-tuning despite operating on frozen representations.
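The abstract describes aggregating the representations of all layers of a frozen LLM by propagating information over an expander Cayley graph, but gives no architectural details. The following is a minimal sketch of that general idea, not the authors' Cayley-Encoder. Everything specific in it is an assumption made for illustration: the choice of the group SL(2, Z_n) with two elementary generators and their inverses (one family often used to construct expander graphs), the zero-initialised extra nodes, the residual mean aggregation with a `tanh` nonlinearity, the mean readout, and the hypothetical function names `cayley_graph_sl2` and `aggregate_layers`.

```python
from collections import deque

import numpy as np


def cayley_graph_sl2(n):
    """Adjacency matrix of the Cayley graph of SL(2, Z_n).

    Assumed generator set: [[1,1],[0,1]], [[1,0],[1,1]] and their inverses.
    A 2x2 matrix [[a, b], [c, d]] is stored as the tuple (a, b, c, d).
    """
    gens = [(1, 1, 0, 1), (1, n - 1, 0, 1), (1, 0, 1, 1), (1, 0, n - 1, 1)]

    def mul(x, y):
        # 2x2 matrix product with entries reduced modulo n.
        return (
            (x[0] * y[0] + x[1] * y[2]) % n,
            (x[0] * y[1] + x[1] * y[3]) % n,
            (x[2] * y[0] + x[3] * y[2]) % n,
            (x[2] * y[1] + x[3] * y[3]) % n,
        )

    identity = (1, 0, 0, 1)
    index = {identity: 0}
    queue = deque([identity])
    edges = []
    # Breadth-first enumeration of the group generated by `gens`.
    while queue:
        g = queue.popleft()
        for s in gens:
            h = mul(g, s)
            if h not in index:
                index[h] = len(index)
                queue.append(h)
            edges.append((index[g], index[h]))

    adj = np.zeros((len(index), len(index)))
    for i, j in edges:
        adj[i, j] = 1.0
    return adj


def aggregate_layers(layer_reps, adj, weight, steps=2):
    """Mix per-layer representations over the graph and pool them.

    layer_reps: (num_layers, dim) array, one vector per LLM layer.
    adj:        (num_nodes, num_nodes) adjacency, num_nodes >= num_layers.
    weight:     (dim, dim) matrix standing in for a learned projection.
    """
    num_layers, dim = layer_reps.shape
    num_nodes = adj.shape[0]
    if num_layers > num_nodes:
        raise ValueError("graph has fewer nodes than there are layers")

    # Layers occupy the first nodes; remaining nodes start at zero (assumption).
    x = np.zeros((num_nodes, dim))
    x[:num_layers] = layer_reps

    # Row-normalised adjacency gives mean aggregation over neighbours.
    norm_adj = adj / adj.sum(axis=1, keepdims=True)
    for _ in range(steps):
        x = np.tanh((x + norm_adj @ x) @ weight)

    # Read out by averaging the nodes that correspond to real layers.
    return x[:num_layers].mean(axis=0)
```

With `n = 3` the graph has 24 nodes, each with 4 neighbours, which is enough to host, for example, 13 layer representations (12 transformer blocks plus the embedding layer). In a trained setting the `weight` matrix would be learned on the downstream task while the LLM stays frozen; here it is only a placeholder.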