In a modern processor, computing is the cheap part. Most of its area and energy go to \emph{addressing} -- moving operands to and from a register file and cache, and running the tags, ports, miss queues, and bypass networks that find a value where it was left. MADAR deletes that machinery by abolishing the address. All state circulates in rings of slots that advance one position per clock; instructions and data ride in the same slots; a value is named by its place in an orbit -- a \rp{} coordinate -- not by an address; a fixed station computes when a circulating instruction sweeps past its operands, on a schedule set at compile time; and a hierarchy of rings of increasing period replaces the cache hierarchy, movement between them scheduled rather than triggered by a miss. No prior circulating-store, dataflow, or statically scheduled machine combines all four of these. We define the execution model, validate it in a cycle-accurate register-transfer-level implementation, show it \emph{compilable} -- a constructive scheduler emits programs cross-checked against the implementation -- and price it with a first-order energy model. The payoff is clearest for AI acceleration: the multiply-accumulate at the heart of every matmul and convolution compiles to a streaming form whose energy per operation stays flat as the reduction grows, and the operand reuse that makes matrix multiplication efficient is carried by the ring-period hierarchy -- the memory hierarchy doing by rotation what a cache does by tags. MADAR is a new design point for any computation whose data movement is known before the program runs.