Modern computer vision architectures, from CNNs to Transformers, predominantly rely on stacking heuristic modules: spatial mixers (attention/convolution) followed by channel mixers (FFNs). In this work, we challenge this paradigm by returning to mathematical first principles. We propose the \textbf{Clifford Algebra Network (CAN)}, also referred to as CliffordNet, a vision backbone grounded purely in Geometric Algebra. Instead of engineering separate modules for mixing and memory, we derive a unified interaction mechanism from the \textbf{Clifford geometric product} ($uv = u \cdot v + u \wedge v$). This operation is algebraically complete, simultaneously capturing feature coherence (via the generalized inner product) and structural variation (via the exterior wedge product). Implemented through an efficient sparse rolling mechanism with \textbf{strict linear complexity $\mathcal{O}(N)$}, our model reveals a surprising emergent property: the geometric interaction is so representationally dense that standard feed-forward networks (FFNs) become redundant. Empirically, CliffordNet establishes a new Pareto frontier: our \textbf{Nano} variant achieves \textbf{76.41\%} accuracy on CIFAR-100 with only \textbf{1.4M} parameters, effectively matching the heavyweight ResNet-18 (11.2M) with \textbf{$8\times$ fewer parameters}, while our \textbf{Base} variant sets a new state of the art for tiny models at \textbf{78.05\%}. Our results suggest that global understanding can emerge solely from rigorous, algebraically complete local interactions, potentially signaling a shift where \textit{geometry is all you need}. Code is available at https://github.com/ParaMind2025/CAN.
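To make the central formula concrete, the following is a minimal sketch of the decomposition $uv = u \cdot v + u \wedge v$ for two grade-1 feature vectors: the symmetric part is the scalar inner product and the antisymmetric part collects the bivector (wedge) coefficients. It only illustrates the algebra; the paper's sparse rolling mechanism, feature layout, and the actual CliffordNet layers are not specified in the abstract, and the function and variable names here are hypothetical.

```python
import torch

def geometric_product_parts(u: torch.Tensor, v: torch.Tensor):
    """Illustrative split of the geometric product uv = u . v + u ^ v
    for grade-1 elements (vectors).

    u, v: tensors of shape (..., d) holding feature vectors.
    Returns:
        inner: (..., 1)    scalar part, u . v (symmetric)
        wedge: (..., d, d) bivector coefficients (u ^ v)_{ij} = u_i v_j - u_j v_i
    """
    inner = (u * v).sum(dim=-1, keepdim=True)      # symmetric inner product
    outer = u.unsqueeze(-1) * v.unsqueeze(-2)      # outer-product terms u_i v_j
    wedge = outer - outer.transpose(-1, -2)        # antisymmetrize -> wedge part
    return inner, wedge

# Toy usage on a batch of 8-dimensional feature vectors.
u = torch.randn(4, 8)
v = torch.randn(4, 8)
inner, wedge = geometric_product_parts(u, v)
print(inner.shape, wedge.shape)  # torch.Size([4, 1]) torch.Size([4, 8, 8])
```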