CliffordNet: All You Need is Geometric Algebra

Modern computer vision architectures, from CNNs to Transformers, predominantly rely on stacking heuristic modules: spatial mixers (Attention/Conv) followed by channel mixers (FFNs). In this work, we challenge this paradigm by returning to mathematical first principles. We propose the Clifford Algebra Network (CAN), also referred to as CliffordNet, a vision backbone grounded purely in Geometric Algebra. Instead of engineering separate modules for mixing and memory, we derive a unified interaction mechanism based on the Clifford Geometric Product ($uv = u \cdot v + u \wedge v$). This operation is algebraically complete with respect to the Geometric Product: it simultaneously captures feature coherence (via the generalized inner product) and structural variation (via the exterior wedge product). Implemented via an efficient sparse rolling mechanism with strict linear complexity $O(N)$, our model reveals a surprising emergent property: the geometric interaction is so representationally dense that standard Feed-Forward Networks (FFNs) become redundant. Empirically, CliffordNet establishes a new Pareto frontier: our Nano variant achieves 77.82\% accuracy on CIFAR-100 with only 1.4M parameters, effectively matching the heavyweight ResNet-18 (11.2M) with $8\times$ fewer parameters, while our Lite variant (2.6M) sets a new state of the art for tiny models at 79.05\%. Our results suggest that global understanding can emerge solely from rigorous, algebraically complete local interactions, potentially signaling a shift where geometry is all you need. Code is available at https://github.com/ParaMind2025/CAN.
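
To make the decomposition $uv = u \cdot v + u \wedge v$ concrete, the following is a minimal NumPy sketch for two plain feature vectors. It is an illustration under stated assumptions, not the released CliffordNet code: the function names `geometric_product` and `rolled_wedge` are hypothetical, the wedge part is stored as an antisymmetric matrix of bivector coefficients, and the rolled variant only shows how a cyclic channel shift can pick out one off-diagonal of that matrix without forming all pairs. How the actual sparse rolling mechanism selects and combines shifts is not specified in the abstract.

```python
import numpy as np


def geometric_product(u, v):
    """Split the geometric product of two vectors into its two parts.

    Returns (inner, wedge), where inner is the scalar u . v and wedge is
    the antisymmetric matrix W[i, j] = u[i] * v[j] - u[j] * v[i]. The
    entries with i < j are the bivector coefficients of u ^ v.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    inner = float(u @ v)
    outer = np.outer(u, v)
    wedge = outer - outer.T
    return inner, wedge


def rolled_wedge(u, v, shift):
    """Hypothetical sparse variant (an assumption, not the paper's method).

    For each channel c, compute only the wedge coefficient between
    channel c and channel (c + shift) mod C. This costs O(C) per shift
    instead of O(C^2) for the full wedge matrix.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # np.roll(x, -shift)[c] == x[(c + shift) % C]
    u_shifted = np.roll(u, -shift)
    v_shifted = np.roll(v, -shift)
    return u * v_shifted - u_shifted * v


if __name__ == "__main__":
    u = np.array([1.0, 2.0, 3.0])
    v = np.array([4.0, 5.0, 6.0])
    inner, wedge = geometric_product(u, v)

    # Lagrange identity: (u . v)^2 + |u ^ v|^2 == |u|^2 |v|^2.
    # The factor 0.5 counts each bivector coefficient once (i < j).
    lhs = inner**2 + 0.5 * np.sum(wedge**2)
    rhs = (u @ u) * (v @ v)
    print(np.isclose(lhs, rhs))

    # The rolled variant reproduces one cyclic off-diagonal of wedge.
    print(rolled_wedge(u, v, shift=1))
```

The identity checked in the usage section shows why the two parts are complementary: the inner product measures how aligned $u$ and $v$ are, the wedge product measures how far they are from parallel, and together they account for the full magnitude $|u|^2 |v|^2$.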
