Kathleen: Oscillator-Based Byte-Level Text Classification Without Tokenization or Attention

From arXiv; 15 pages, 10 tables. v2: added the V9 architecture with Positional Decay Modulation; pretraining eliminated; SST-2 improved from 83.3% to 85.8%.

We present Kathleen, a text classification architecture that operates directly on raw UTF-8 bytes using frequency-domain processing, with no tokenizer, no attention mechanism, and fewer than 470K parameters. Kathleen introduces several novel components: (1) RecurrentOscillatorBanks, damped sinusoid convolutions with temporal memory for O(L) sequence processing; (2) an FFT-Rotate Wavetable Encoder that maps all 256 byte values using a single learnable vector (256 floats); (3) PhaseHarmonics, a sinusoidal non-linearity with just 6 learnable phase parameters (+2.6% accuracy, <0.001% of model parameters); (4) Content-Dependent Reverb with Positional Decay Modulation, a temporal memory mechanism whose decay rate is jointly conditioned on input content and a learned position-indexed bias vector; and (5) a Token-Level Module Sequencer with consonance and dissonance interference channels. Through iterative architecture evolution from an initial 733K-parameter baseline (Kathleen-Clean) to the current Kathleen-V9 (469K parameters), we demonstrate that pretraining can be entirely eliminated while improving accuracy. Kathleen-V9 achieves 88.5% +/- 0.2% on IMDB, 92.4% +/- 0.2% on AG News, and 85.8% +/- 0.5% on SST-2 (averaged over 3 seeds), matching or exceeding the pretrained baseline on all benchmarks with 36% fewer parameters. On SST-2, the improvement is +2.5% absolute over the pretrained predecessor. Kathleen processes sequences in O(L) time and memory.
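To make two of these ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that a damped sinusoid convolution can be computed with a complex one-pole recurrence (which is what gives O(L) cost), and that the content- and position-dependent decay is a sigmoid of a linear projection plus a per-position bias. The function names, shapes, and the sigmoid gating are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

def damped_oscillator_bank(x, freqs, decays):
    """Filter a 1-D signal with K damped cosines using a complex one-pole recurrence.

    Equivalent to a causal convolution with k[n] = exp(-decays * n) * cos(freqs * n),
    but each oscillator costs O(L) instead of O(L * kernel_length).
    (Illustrative assumption; the paper's exact formulation may differ.)
    """
    poles = np.exp(-decays + 1j * freqs)      # one complex pole per oscillator, shape (K,)
    state = np.zeros_like(poles)              # complex running state per oscillator
    out = np.empty((len(x), len(poles)))
    for t, x_t in enumerate(x):
        state = poles * state + x_t           # rotate, damp, and inject the new sample
        out[t] = state.real
    return out

def content_dependent_reverb(h, w, pos_bias):
    """Leaky memory whose per-step decay depends on content and on position.

    h: (L, D) features; w: (D,) hypothetical content projection;
    pos_bias: (L,) hypothetical learned position-indexed bias.
    """
    logits = h @ w + pos_bias                 # content term plus positional term, shape (L,)
    decay = 1.0 / (1.0 + np.exp(-logits))     # sigmoid keeps each decay rate in (0, 1)
    memory = np.zeros(h.shape[1])
    out = np.empty_like(h)
    for t in range(h.shape[0]):
        memory = decay[t] * memory + (1.0 - decay[t]) * h[t]
        out[t] = memory
    return out, decay

# Raw UTF-8 bytes as input: no tokenizer involved.
text = "no tokenizer needed"
x = np.frombuffer(text.encode("utf-8"), dtype=np.uint8) / 255.0

freqs = np.array([0.3, 1.1, 2.0])             # arbitrary example frequencies (radians/step)
decays = np.array([0.05, 0.2, 0.5])           # arbitrary example damping rates
feats = damped_oscillator_bank(x, freqs, decays)

rng = np.random.default_rng(0)                # random stand-ins for learned parameters
out, decay = content_dependent_reverb(
    feats, rng.normal(size=feats.shape[1]), rng.normal(size=len(x))
)
```

Both loops visit each byte position once and keep a fixed-size state, which is the sense in which such a recurrence runs in O(L) time; a trained model would learn `freqs`, `decays`, `w`, and `pos_bias` rather than sampling them.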
