While linear attention architectures offer efficient inference, compressing an unbounded history into a fixed-size memory inherently limits expressivity and causes information loss. To address this limitation, we introduce the Random Access Memory Network (RAM-Net), a novel architecture designed to bridge the gap between the representational capacity of full attention and the memory efficiency of linear models. At its core, RAM-Net maps inputs to high-dimensional sparse vectors that serve as explicit addresses, allowing the model to selectively access a massive memory state. This design enables exponential scaling of the state size without additional parameters, which significantly mitigates signal interference and enhances retrieval fidelity. Moreover, the inherent sparsity ensures exceptional computational efficiency, as each state update touches only a minimal number of entries. Extensive experiments demonstrate that RAM-Net consistently surpasses state-of-the-art baselines on fine-grained long-range retrieval tasks and achieves competitive performance on standard language modeling and zero-shot commonsense reasoning benchmarks, validating its ability to capture complex dependencies with significantly reduced computational overhead.
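The sparse-addressing mechanism described above can be illustrated with a minimal sketch. This is not the paper's implementation; all sizes (`d_model`, `d_addr`, the sparsity level `k`), the top-k address selection, and the outer-product write rule are assumptions chosen to show the idea: a learned projection yields a k-sparse address over a large memory, so writes and reads touch only k slots, and a single stored item is retrieved with essentially no interference.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_addr, k = 64, 4096, 8   # hypothetical sizes; addresses are k-sparse

# Hypothetical projection producing high-dimensional address logits.
W_addr = rng.standard_normal((d_model, d_addr)) / np.sqrt(d_model)

# Large memory state: one d_model-sized slot per address dimension.
memory = np.zeros((d_addr, d_model))

def sparse_address(x):
    """Map an input vector to a k-sparse address (top-k of the projection)."""
    logits = x @ W_addr
    idx = np.argpartition(logits, -k)[-k:]   # indices of the k largest logits
    addr = np.zeros(d_addr)
    addr[idx] = logits[idx]                  # keep only k nonzero entries
    return addr, idx

def write(x, value):
    addr, idx = sparse_address(x)
    # The state update is confined to the k addressed slots.
    memory[idx] += np.outer(addr[idx], value)

def read(x):
    addr, idx = sparse_address(x)
    # Retrieval likewise gathers only the k addressed slots.
    return addr[idx] @ memory[idx]

key = rng.standard_normal(d_model)
value = rng.standard_normal(d_model)
write(key, value)
out = read(key)

# With a single stored item, the readout is proportional to the stored
# value, so the cosine similarity between them is (numerically) 1.
cos = out @ value / (np.linalg.norm(out) * np.linalg.norm(value))
```

Because only k of the d_addr slots are ever touched per token, the per-step cost stays near that of a small dense state while the addressable state grows with d_addr, which is the efficiency/capacity trade the abstract describes.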