Developing and evaluating distributed inference algorithms remains difficult because there are no standardized tools for modeling heterogeneous devices and networks. Existing studies often rely on ad-hoc testbeds or proprietary infrastructure, which makes results hard to reproduce and limits exploration of hypothetical hardware or network configurations. We present UNIFERENCE, a discrete-event simulation (DES) framework for developing, benchmarking, and deploying distributed AI models within a unified environment. UNIFERENCE models device and network behavior through lightweight logical processes that synchronize only on communication primitives, eliminating rollbacks while preserving causal order. It integrates with PyTorch Distributed, so the same codebase can move from simulation to real deployment. Our evaluation shows that UNIFERENCE profiles runtime with up to 98.6% accuracy relative to physical deployments across diverse backends and hardware setups. By bridging simulation and deployment, UNIFERENCE provides an accessible, reproducible platform for studying distributed inference algorithms and exploring future system designs, from high-performance clusters to edge-scale devices. The framework is open-sourced at https://github.com/Dogacel/Uniference.
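To make the synchronization idea concrete, the following is a minimal sketch of logical processes that advance private clocks during local work and synchronize only when a message is received. It is an illustration under stated assumptions, not UNIFERENCE's actual API: the names `Link`, `LogicalProcess`, `compute`, `send`, and `recv` are hypothetical, sends are assumed non-blocking, and the link cost is assumed to be latency plus size divided by bandwidth.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical network link model: fixed latency plus serialization time."""
    latency_s: float
    bandwidth_bps: float

    def transfer_time(self, nbytes: int) -> float:
        return self.latency_s + (nbytes * 8) / self.bandwidth_bps


class LogicalProcess:
    """Hypothetical simulated device with its own virtual clock."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clock = 0.0      # virtual time in seconds
        self.inbox = deque()  # arrival timestamps of pending messages

    def compute(self, seconds: float) -> None:
        # Local work advances only this process's clock; no global sync needed.
        self.clock += seconds

    def send(self, dst: "LogicalProcess", nbytes: int, link: Link) -> None:
        # Assumption: non-blocking send. The message is stamped with the
        # virtual time at which it will arrive at the destination.
        dst.inbox.append(self.clock + link.transfer_time(nbytes))

    def recv(self) -> None:
        # Synchronization point: the receiver cannot proceed before the
        # message arrives, so its clock jumps forward if it was ahead of
        # schedule and never moves backward. No rollback is required.
        arrival = self.inbox.popleft()
        self.clock = max(self.clock, arrival)


# Two-stage pipeline: device A feeds device B over a 1 Gbit/s link.
link = Link(latency_s=0.001, bandwidth_bps=1e9)
a, b = LogicalProcess("A"), LogicalProcess("B")

a.compute(0.010)            # A runs its stage for 10 ms
a.send(b, 1_000_000, link)  # A ships a 1 MB activation to B
b.compute(0.005)            # B does 5 ms of unrelated local setup
b.recv()                    # B waits until the activation arrives
b.compute(0.020)            # B runs its stage for 20 ms

print(f"A finished at {a.clock * 1e3:.1f} ms, B finished at {b.clock * 1e3:.1f} ms")
```

Because each clock only moves forward and a receive takes the maximum of the local time and the message arrival time, causal order holds without any speculative execution to undo. A full simulator would also need to schedule processes so that a receive is never evaluated before its matching send, which this sequential sketch sidesteps by issuing the calls in program order.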