Large language models (LLMs) have dominated the scientific and media landscape due to their impressive performance in processing large amounts of data and producing human-like text. Nevertheless, their huge energy demand and slow processing remain a bottleneck to further increasing quality while also making the models accessible to everyone. To address this bottleneck, we investigate how reservoir computing performs on natural text processing, which could enable fast and energy-efficient hardware implementations. Studies investigating the use of reservoir computing as a language model remain sparse. In this paper, we compare three distinct approaches for character-level language modeling: two different \emph{reservoir computing} approaches, in which only an output layer is trainable, and the well-known \emph{transformer}-based architecture, which fully learns an attention-based sequence representation. We explore the performance, computational cost, and prediction accuracy of both paradigms by varying the number of trainable parameters equally across all models. Using a consistent pipeline for all three approaches, we demonstrate that transformers excel in prediction quality, whereas reservoir computers remain highly efficient, reducing training and inference times. Furthermore, we investigate two types of reservoir computing: a \emph{traditional reservoir} with a static linear readout, and an \emph{attention-enhanced reservoir} that dynamically adapts its output weights via an attention mechanism. Our findings underline how these paradigms scale and offer guidelines for balancing resource constraints with performance.