This study evaluates the inference performance of deep learning models in an embedded system environment. In previous work, Multiply-Accumulate (MAC) operations are typically used to measure the computational load of a deep model. This study shows, however, that this metric is of limited use for estimating inference time on embedded devices, and asks what is overlooked when computational cost is expressed solely in MAC operations. In experiments, an image classification task is performed on an embedded device using the CIFAR-100 dataset, and the measured inference times of ten deep models are compared against their theoretically calculated MAC counts. The results highlight the importance of accounting for additional tensor-level computations when optimizing deep learning models for real-time performance on embedded systems.
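For reference, the theoretical MAC count the abstract refers to is conventionally derived from layer shapes alone. A minimal sketch for a standard 2D convolution follows; the layer dimensions are illustrative and not taken from the paper:

```python
def conv2d_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Theoretical MACs of a standard 2D convolution: one multiply-accumulate
    per kernel weight, per input channel, per output position and channel."""
    return h_out * w_out * c_out * (k * k * c_in)

# Illustrative example: a 3x3 convolution mapping 64 -> 128 channels
# on a 32x32 output feature map.
print(conv2d_macs(32, 32, 64, 128, 3))  # 75497472
```

Note that this count captures only the arithmetic inside convolution and fully connected layers; operations such as activation functions, normalization, pooling, and tensor reshaping contribute no MACs yet still consume time on an embedded device, which is the gap the study examines.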