Distributed AI systems face critical memory-management challenges across the computation, communication, and deployment layers. RRAM-based in-memory computing suffers from scalability limitations due to device non-idealities and fixed array sizes. Decentralized AI frameworks struggle with memory efficiency across NAT-constrained networks because static routing ignores computational load. Multi-agent deployment systems tightly couple application logic with execution environments, preventing adaptive memory optimization. These challenges stem from a fundamental lack of coordinated memory management across architectural layers. We introduce the Self-Evolving Distributed Memory Architecture for Scalable AI Systems, a three-layer framework that unifies memory management across computation, communication, and deployment. Our approach features (1) memory-guided matrix processing with dynamic partitioning based on device characteristics, (2) memory-aware peer selection that considers both network topology and computational capacity, and (3) runtime-adaptive deployment optimization through continuous reconfiguration. The framework maintains a dual memory system that tracks both long-term performance patterns and short-term workload statistics. Experiments on COCO 2017, ImageNet, and SQuAD show that our method achieves 87.3% memory-utilization efficiency and 142.5 operations per second, versus 72.1% and 98.7 operations per second for Ray Distributed, while reducing communication latency by 30.2% to 171.2 ms and improving resource utilization to 82.7%. Our contributions include coordinated memory management across three architectural layers, workload-adaptive resource allocation, and a dual-memory architecture enabling dynamic system optimization.
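To make the memory-aware peer selection and dual memory system described above concrete, here is a minimal sketch of how the two could interact. All names here (`Peer`, `DualMemory`, `score_peer`, and the specific scoring weights) are hypothetical illustrations, not the paper's actual implementation: the idea is only that a peer's score combines current memory headroom, network locality, compute capacity, and a long-term performance average, while a short-term sliding window of workload sizes estimates near-future memory demand.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Peer:
    """Candidate peer with static capacity and live telemetry (hypothetical model)."""
    name: str
    free_memory_mb: float   # current memory headroom on this peer
    latency_ms: float       # measured network latency to this peer
    compute_score: float    # normalized computational capacity in [0, 1]

@dataclass
class DualMemory:
    """Long-term EMA of observed throughput plus a short-term window of workload sizes."""
    ema: float = 0.5                                                  # long-term pattern
    window: deque = field(default_factory=lambda: deque(maxlen=8))    # short-term stats

    def record(self, throughput: float, workload_mb: float) -> None:
        # Long-term memory decays slowly; short-term memory is a bounded window.
        self.ema = 0.9 * self.ema + 0.1 * throughput
        self.window.append(workload_mb)

    def recent_demand(self) -> float:
        return sum(self.window) / len(self.window) if self.window else 0.0

def score_peer(peer: Peer, history: DualMemory) -> float:
    """Blend memory headroom, locality, capacity, and learned history into one score."""
    demand = history.recent_demand()
    # Headroom after subtracting the short-term demand estimate, normalized to (0, 1).
    headroom = max(peer.free_memory_mb - demand, 0.0) / (peer.free_memory_mb + 1.0)
    locality = 1.0 / (1.0 + peer.latency_ms / 100.0)  # penalize distant peers
    # Illustrative weights; a real system would tune or learn these.
    return 0.4 * headroom + 0.3 * locality + 0.2 * peer.compute_score + 0.1 * history.ema

def select_peer(peers: list[Peer], histories: dict[str, DualMemory]) -> Peer:
    return max(peers, key=lambda p: score_peer(p, histories[p.name]))
```

Under this scoring, a nearby peer with adequate headroom can outrank a larger but more distant one, which matches the abstract's point that routing should weigh computational load and topology together rather than relying on static routes.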