A feasibility and dynamics study of the Reservoir Attention Network (RAN), an architecture that injects a fixed, randomly initialized reservoir into the mid-layer attention of a pretrained transformer to carry state across forward passes. Experiments range from GPT-2 (124M, 355M) to Qwen2.5 (0.5B, 1.5B), all run on a single consumer GPU. The tasks are minimal probes chosen to isolate individual mechanisms; the broader always-alive agent vision is treated throughout as compute-limited future work, not a claim of this paper. The reservoir is left untrained (fixed random) by design: this isolates whether untrained recurrent dynamics alone suffice to carry usable cross-pass state, leaving trained recurrence as a complementary, more expensive direction.
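To make the idea of a fixed, untrained reservoir carrying state across forward passes concrete, the following is a minimal sketch and not the RAN implementation. It assumes a standard leaky echo-state-style update, which the text above does not specify; the class name `FixedReservoir`, the parameters `spectral_radius` and `leak`, and the use of a per-pass summary vector as input are all hypothetical. The injection into mid-layer attention is not shown.

```python
import numpy as np


class FixedReservoir:
    """Illustrative fixed random reservoir (assumed echo-state-style update)."""

    def __init__(self, input_dim, reservoir_dim, spectral_radius=0.9, leak=0.3, seed=0):
        rng = np.random.default_rng(seed)
        # Input and recurrent weights are drawn once and never trained.
        self.W_in = rng.normal(0.0, 1.0 / np.sqrt(input_dim), (reservoir_dim, input_dim))
        W = rng.normal(0.0, 1.0, (reservoir_dim, reservoir_dim))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius
        # (a common echo-state heuristic; an assumption here, not the paper's setting).
        radius = np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W * (spectral_radius / radius)
        self.leak = leak
        self.state = np.zeros(reservoir_dim)

    def step(self, summary):
        """Update the persistent state from one forward pass's summary vector."""
        pre = self.W_in @ summary + self.W @ self.state
        # Leaky integration: blend the previous state with the new activation.
        self.state = (1.0 - self.leak) * self.state + self.leak * np.tanh(pre)
        return self.state


if __name__ == "__main__":
    reservoir = FixedReservoir(input_dim=8, reservoir_dim=32)
    first = reservoir.step(np.ones(8))          # stand-in for pass 1
    second = reservoir.step(np.full(8, 0.5))    # pass 2 sees state left by pass 1
    print(first.shape, second.shape)
```

In this sketch the only thing that persists between calls to `step` is `self.state`, so any cross-pass information must be carried by the untrained dynamics alone, which is the question the study sets out to isolate.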