On-device Self-supervised Learning of Visual Perception Tasks aboard Hardware-limited Nano-quadrotors

\copyright{} 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Sub-\SI{50}{\gram} nano-drones are gaining momentum in both academia and industry. Their most compelling applications rely on onboard deep learning models for perception despite severe hardware constraints (\ie a sub-\SI{100}{\milli\watt} processor). When deployed in unknown environments not represented in the training data, these models often underperform due to domain shift. To cope with this fundamental problem, we propose, for the first time, on-device learning aboard nano-drones, where the first part of the in-field mission is dedicated to self-supervised fine-tuning of a pre-trained convolutional neural network (CNN). Leveraging a real-world vision-based regression task, we thoroughly explore the performance-cost trade-offs of the fine-tuning phase along three axes: \textit{i}) dataset size (more data improves regression performance but requires more memory and longer computation); \textit{ii}) methodology (\eg fine-tuning all model parameters vs. only a subset); and \textit{iii}) self-supervision strategy. Our approach reduces the mean absolute error by up to 30\% compared to the pre-trained baseline, requiring only \SI{22}{\second} of fine-tuning on an ultra-low-power GWT GAP9 System-on-Chip. Addressing the domain shift problem via on-device learning aboard nano-drones not only marks a novel result for hardware-limited robots but also lays the groundwork for broader advances across the entire robotics community.
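To make the "fine-tune only a subset of parameters" axis and the mean-absolute-error metric concrete, here is a minimal toy sketch, not the authors' method. It assumes a hypothetical frozen backbone (`features`) standing in for the pre-trained CNN, a hypothetical linear regression head fit on a source domain, and a shifted target domain whose labels stand in for self-supervised pseudo-labels. Only the head is updated, via full-batch subgradient descent on the L1 loss. The names `features`, `finetune_head`, `pretrained_head`, and `target_data`, as well as the learning rate and epoch count, are illustrative assumptions.

```python
# Toy sketch (hypothetical): head-only fine-tuning under domain shift,
# evaluated with mean absolute error (MAE).

# Hypothetical stand-in for a frozen, pre-trained CNN backbone: it maps an
# input to a fixed feature vector and is never updated during fine-tuning.
def features(x):
    return [x, x * x, 1.0]


def predict(head, x):
    """Linear regression head on top of the frozen features."""
    return sum(w * f for w, f in zip(head, features(x)))


def mae(head, data):
    """Mean absolute error over (input, label) pairs."""
    return sum(abs(predict(head, x) - y) for x, y in data) / len(data)


def finetune_head(head, data, lr=0.01, epochs=200):
    """Fine-tune only the head (a subset of parameters) with full-batch
    subgradient descent on the L1 loss; the backbone stays frozen."""
    head = list(head)  # copy, so the pre-trained weights are kept intact
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(head)
        for x, y in data:
            err = predict(head, x) - y
            sign = (err > 0) - (err < 0)  # subgradient of |err|
            for i, f in enumerate(features(x)):
                grad[i] += sign * f / n
        head = [w - lr * g for w, g in zip(head, grad)]
    return head


# Hypothetical "pre-trained" head, fit on a source domain where y = 2x + 0.5.
pretrained_head = [2.0, 0.0, 0.5]

# Hypothetical in-field target domain with a shifted offset (y = 2x + 1.0).
# These labels stand in for self-supervised pseudo-labels collected onboard.
target_data = [(i / 10, 2 * (i / 10) + 1.0) for i in range(-10, 11)]

mae_before = mae(pretrained_head, target_data)
finetuned_head = finetune_head(pretrained_head, target_data)
mae_after = mae(finetuned_head, target_data)
improvement = 100 * (mae_before - mae_after) / mae_before
print(f"MAE before: {mae_before:.3f}, after: {mae_after:.3f}, "
      f"improvement: {improvement:.1f}%")
```

Full fine-tuning would also update the backbone, at a higher memory and compute cost. Restricting updates to the head keeps the gradient computation small, which is the kind of trade-off a sub-\SI{100}{\milli\watt} processor forces. The relative improvement printed here is computed the same way as a percentage MAE reduction against a pre-trained baseline, but its value comes from this toy setup only.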
