When neural networks are deployed in real-world settings, model size and computational cost are often the limiting factors. This is especially true where large, expensive hardware is not affordable, as in embedded medical devices with tight budgets. State-of-the-art work has proposed several lightweight solutions for such use cases, mostly by changing the base model architecture without taking the input and output resolution into consideration. In this paper, we propose an architecture that takes advantage of the fact that, in hardware-limited environments, we often refrain from using the highest available input resolution in order to guarantee higher throughput. Although lower-resolution input significantly reduces compute and memory requirements, it may also reduce prediction quality. Our architecture addresses this problem by exploiting the fact that high-resolution ground truths can still be used in training. The proposed model takes lower-resolution images as input and is trained with high-resolution ground truths, which can improve the prediction quality by 5.5% while adding fewer than 200 parameters to the model. We conduct an extensive analysis to show that our architecture enhances existing state-of-the-art frameworks for lightweight semantic segmentation of cancer in MRI images. We also test the deployment speed of state-of-the-art lightweight networks and of our architecture on Nvidia's Jetson Nano to emulate deployment in resource-constrained embedded scenarios.
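To make the resolution trade-off concrete, the following back-of-the-envelope sketch counts multiply-accumulate operations for a single convolution at two input resolutions and the parameters of a small upsampling head. All numbers here (the 256x256 and 128x128 resolutions, the channel widths, and the transposed-convolution head mapping two class maps to two class maps) are illustrative assumptions and are not taken from the paper; the actual layers behind the sub-200-parameter figure may differ.

```python
def conv2d_macs(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates of one stride-1, same-padded 2D convolution."""
    return height * width * c_in * c_out * kernel * kernel


def conv2d_params(c_in, c_out, kernel=3, bias=True):
    """Learnable parameters of a 2D (or transposed 2D) convolution."""
    return c_in * c_out * kernel * kernel + (c_out if bias else 0)


# Hypothetical values, chosen only for illustration.
full_res = (256, 256)   # assumed "highest available" input resolution
low_res = (128, 128)    # assumed reduced input resolution
c_in, c_out = 16, 32    # assumed channel widths of one backbone layer

macs_full = conv2d_macs(*full_res, c_in, c_out)
macs_low = conv2d_macs(*low_res, c_in, c_out)

# Halving height and width quarters the cost of every convolution.
saving = macs_full / macs_low

# Hypothetical upsampling head: one transposed convolution with a 4x4 kernel
# mapping 2 class maps to 2 class maps, so that low-resolution predictions
# can be compared against high-resolution ground truths during training.
head_params = conv2d_params(c_in=2, c_out=2, kernel=4)

print(f"MACs at full resolution: {macs_full}")
print(f"MACs at low resolution:  {macs_low}")
print(f"cost ratio:              {saving}")
print(f"upsampling head params:  {head_params}")
```

Under these assumptions, the per-layer cost drops by a factor of four while the added head stays well under 200 parameters, which is the kind of budget the abstract describes.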