Temporal Coding as a Substrate for Sensorimotor Object Inference: A Spiking Reinterpretation of the Thousand Brains Architecture

The Thousand Brains Theory (TBT) and its open-source Monty framework model object recognition as sensorimotor inference: an object is identified by actively moving a sensor across its surface and accumulating evidence contact by contact. The current implementation encodes each contact as a dense floating-point vector. Although Monty tracks inter-step displacement and accumulates evidence across contacts, it treats the feature activation pattern at each contact as an unordered set, so the directional sequence in which features are encountered carries no representational weight. In TBT, however, the sequence of contacts carries spatial meaning: knowing that feature A was felt before feature B during a left-to-right sweep constrains where A and B sit on the object. Dense vectors discard this ordering. We propose replacing dense vectors with rank-order spike packets: each contact produces a brief burst of neural events in which the most strongly activated neuron fires first. The time gap between successive bursts implicitly encodes sensor displacement, without explicit coordinate calculations. A biologically motivated learning rule, spike-timing-dependent plasticity (STDP), encodes traversal direction into synaptic weights. A learnable parameter lambda sets how strongly inference relies on earlier versus more recent contacts, adapting to each object's geometry. We derive three testable predictions and specify a four-component implementation in approximately 450 lines of NumPy. Three synthetic experiments confirm the core claims: (1) temporal coding achieves perfect discrimination accuracy on objects with identical features in different spatial arrangements, where dense accumulation performs at chance; (2) temporal coding maintains a 30-50 percentage point advantage across all tested noise levels; and (3) the adaptive lambda converges to distinct values across objects, reflecting their geometric complexity. End-to-end evaluation on Monty's YCB benchmark is left for future work.
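
To make the two central mechanisms concrete, the following is a minimal NumPy sketch of rank-order packet encoding and a direction-sensitive weight update. It is an illustration under stated assumptions, not the implementation described above: the function names (rank_order_packet, stdp_update), the parameters (n_spikes, dt, a_plus, a_minus), and the toy activation vectors are all hypothetical, and the STDP rule is simplified to a packet-level pair rule (earlier packet potentiates connections onto the later packet, the reverse direction is depressed) rather than an exponential timing window.

```python
import numpy as np

def rank_order_packet(activations, n_spikes=2, dt=1.0):
    """Encode one contact as a rank-order spike packet (illustrative).

    Returns (neuron_ids, spike_times): the most strongly activated
    neuron fires first, the next strongest fires dt later, and so on.
    """
    activations = np.asarray(activations, dtype=float)
    # Sort by descending activation; a stable sort keeps ties deterministic.
    neuron_ids = np.argsort(-activations, kind="stable")[:n_spikes]
    spike_times = np.arange(neuron_ids.size) * dt
    return neuron_ids, spike_times

def stdp_update(weights, earlier_ids, later_ids, a_plus=0.05, a_minus=0.025):
    """Simplified packet-level STDP (an assumption, not the source's rule).

    weights[post, pre] is potentiated when pre fired in the earlier packet
    and post in the later one; the reverse direction is depressed.
    """
    w = weights.copy()
    w[np.ix_(later_ids, earlier_ids)] += a_plus    # pre before post
    w[np.ix_(earlier_ids, later_ids)] -= a_minus   # post before pre
    return np.clip(w, 0.0, 1.0)

# Two toy contacts over six feature neurons (made-up activations).
contact_a = [0.10, 0.90, 0.30, 0.00, 0.20, 0.05]
contact_b = [0.00, 0.10, 0.05, 0.80, 0.20, 0.60]

ids_a, times_a = rank_order_packet(contact_a)
ids_b, times_b = rank_order_packet(contact_b)

w0 = np.full((6, 6), 0.5)
w_ab = stdp_update(w0, ids_a, ids_b)  # sweep visits A, then B
w_ba = stdp_update(w0, ids_b, ids_a)  # sweep visits B, then A

# The same two contacts leave different weights depending on traversal order.
print("A then B:", w_ab[ids_b[0], ids_a[0]], w_ab[ids_a[0], ids_b[0]])
print("B then A:", w_ba[ids_b[0], ids_a[0]], w_ba[ids_a[0], ids_b[0]])
```

In this sketch the two traversal orders produce different weight matrices even though the set of contacts is identical, which is the property that summing dense vectors cannot capture.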
