Deep Learning (DL) inference pipelines that integrate multiple Deep Neural Networks (DNNs), such as those used in Multi-Task Learning (MTL) or Ensemble Learning (EL), are highly accurate but challenging to deploy at the edge. The models in such a pipeline differ in their tolerance to quantization and in their resource demands, so balancing accuracy against latency requires meticulous per-model tuning. This paper introduces an automated heterogeneous quantization approach for DL inference pipelines with multiple DNNs.
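To make the tuning problem concrete, the following is a minimal sketch of choosing one bit width per model under a latency budget. It is not the method proposed in the paper: the model names, the accuracy and latency figures, the exhaustive search, the use of mean accuracy as the pipeline score, and the assumption that models run sequentially (so latencies add) are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical per-model profiles: bit width -> (accuracy, latency in ms).
# The values are illustrative placeholders, not measurements from the paper.
PROFILES = {
    "detector":   {8: (0.90, 12.0), 16: (0.93, 20.0), 32: (0.94, 35.0)},
    "classifier": {8: (0.80, 5.0),  16: (0.88, 9.0),  32: (0.89, 16.0)},
}


def search_bit_widths(profiles, latency_budget_ms):
    """Exhaustively pick one bit width per model.

    Returns (assignment, mean_accuracy, total_latency_ms) for the feasible
    assignment with the highest mean accuracy, or None if no assignment
    fits within the latency budget.
    """
    names = list(profiles)
    best = None
    # Enumerate every combination of per-model bit widths.
    for bits in product(*(profiles[name] for name in names)):
        # Assumption: pipeline accuracy is the mean of per-model accuracies.
        accuracy = sum(profiles[n][b][0] for n, b in zip(names, bits)) / len(names)
        # Assumption: models run one after another, so latencies add up.
        latency = sum(profiles[n][b][1] for n, b in zip(names, bits))
        if latency <= latency_budget_ms and (best is None or accuracy > best[1]):
            best = (dict(zip(names, bits)), accuracy, latency)
    return best


if __name__ == "__main__":
    print(search_bit_widths(PROFILES, latency_budget_ms=25.0))
```

With these placeholder numbers and a 25 ms budget, the search assigns 8 bits to the detector and 16 bits to the classifier, because the hypothetical classifier loses far more accuracy at 8 bits than the detector does. Exhaustive enumeration grows exponentially with the number of models, which is one reason an automated search strategy is needed for larger pipelines.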