Neuro-symbolic AI is gaining traction in domains such as large language models, scientific discovery, and autonomous systems because it combines perception with structured reasoning. Its deployment, however, is often constrained by high memory demands, diverse computation patterns, and complex hardware requirements. Existing hardware platforms suffer from large on-chip memory overheads, frequent pipeline stalls, limited I/O bandwidth, and inefficient handling of nonlinear operations. To address these bottlenecks, we propose Overmind, a unified neuro-symbolic architecture with cross-layer optimizations: Padé approximations for universal nonlinear functions, preemptive memory bypass that eliminates costly on-chip caches, and a complete software stack that optimizes model deployment. By reconfiguring the Padé orders used to approximate nonlinear functions, we also demonstrate adaptive accuracy-performance scaling. Overmind achieves an energy efficiency of 8.1 TOPS/W and a throughput of 410 GOPS on mixed neuro-symbolic workloads with minimal loss in model accuracy. Compared to existing solutions, Overmind improves performance and efficiency while using significantly fewer hardware resources.
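The idea of trading accuracy against cost by changing the Padé order can be illustrated in software. The following is a minimal sketch, not Overmind's hardware implementation: the abstract does not state which nonlinear functions or orders are used, so exp(x) is chosen here purely as an example, and the names `pade_exp_coeffs`, `horner`, `pade_exp`, and `max_abs_error` are hypothetical. It uses the standard closed-form coefficients of the [m/n] Padé approximant of exp(x).

```python
import math

def pade_exp_coeffs(m, n):
    """Coefficients (lowest degree first) of the [m/n] Pade approximant of exp(x)."""
    f = math.factorial
    num = [f(m + n - j) * f(m) / (f(m + n) * f(j) * f(m - j))
           for j in range(m + 1)]
    den = [(-1) ** j * f(m + n - j) * f(n) / (f(m + n) * f(j) * f(n - j))
           for j in range(n + 1)]
    return num, den

def horner(coeffs, x):
    """Evaluate a polynomial with Horner's rule (one multiply-add per coefficient)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def pade_exp(x, order):
    """Approximate exp(x) with the diagonal [order/order] Pade approximant."""
    num, den = pade_exp_coeffs(order, order)
    return horner(num, x) / horner(den, x)

def max_abs_error(order, lo=-1.0, hi=1.0, samples=201):
    """Largest absolute error against math.exp on an evenly spaced grid."""
    step = (hi - lo) / (samples - 1)
    return max(abs(pade_exp(lo + i * step, order) - math.exp(lo + i * step))
               for i in range(samples))

if __name__ == "__main__":
    # Higher order: more multiply-adds per evaluation, smaller error.
    for order in (1, 2, 3):
        print(order, max_abs_error(order))
```

In this sketch, each increase in order adds one multiply-add to both the numerator and the denominator, while the error on [-1, 1] shrinks. This is the kind of accuracy-performance knob the abstract describes, here under the stated assumptions only.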