Offline batch inference is a common task in industry for deep learning applications, but ensuring stability and performance is challenging when dealing with large amounts of data and complicated inference pipelines. This paper presents AntBatchInfer, an elastic batch inference framework specially optimized for non-dedicated clusters. AntBatchInfer addresses these challenges by providing multi-level fault-tolerant capabilities, enabling the stable execution of versatile and long-running inference tasks. It also improves inference efficiency through pipelining, intra-node scaling, and inter-node scaling, and it further optimizes performance in complicated multiple-model batch inference scenarios. Through extensive experiments and real-world statistics, we demonstrate the superiority of our framework in terms of stability and efficiency. In our experiments, it outperforms the baseline by at least $2\times$ in single-model batch inference and $6\times$ in multiple-model batch inference. It is also widely used at Ant Group, with thousands of daily jobs from various scenarios, including DLRM, CV, and NLP, which demonstrates its practicality in industry.