Tiny machine learning (TinyML) is a rapidly growing field aiming to democratize machine learning (ML) for resource-constrained microcontrollers (MCUs). Given the pervasiveness of these tiny devices, it is natural to ask whether TinyML applications can benefit from aggregating their knowledge. Federated learning (FL) enables decentralized agents to jointly learn a global model without sharing sensitive local data. However, a common global model may not work for all devices due to the complexity of the actual deployment environment and the heterogeneity of the data available on each device. In addition, the deployment of TinyML hardware has significant computational and communication constraints, which traditional ML fails to address. Considering these challenges, we propose TinyReptile, a simple but efficient algorithm inspired by meta-learning and online learning, to collaboratively learn a solid initialization for a neural network (NN) across tiny devices that can be quickly adapted to a new device with respect to its data. We demonstrate TinyReptile on a Raspberry Pi 4 and a Cortex-M4 MCU with only 256 KB of RAM. The evaluations on various TinyML use cases confirm a reduction in resource usage and training time by at least a factor of two compared with baseline algorithms with comparable performance.
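The abstract does not state the update rule, so the following is only a rough illustration of how meta-learning and online learning might be combined to learn a shared initialization. It assumes a Reptile-style interpolation step on the server and one SGD step per streamed sample on each device, so that no device has to store a dataset. The toy linear-regression task, the function names (`online_sgd`, `sample_stream`, `meta_train`), and all hyperparameters are hypothetical and are not taken from the paper.

```python
import random


def online_sgd(params, stream, lr):
    """Adapt a copy of params with one SGD step per streamed (x, y) sample.

    Squared loss on a 1-D linear model y = w * x + b. Each sample is used
    once and then discarded, so memory use does not grow with the stream.
    """
    w, b = params
    for x, y in stream:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def sample_stream(slope, offset, n, rng):
    """Yield n noiseless samples from one device's (hypothetical) local task."""
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, slope * x + offset


def meta_train(rounds=2000, k=8, inner_lr=0.05, meta_lr=0.1, seed=0):
    """Learn an initialization phi by visiting one simulated device per round."""
    rng = random.Random(seed)
    phi = [0.0, 0.0]  # shared initialization [w, b]
    for _ in range(rounds):
        # Assumption: each device has its own slope and offset (heterogeneous data).
        slope = rng.uniform(1.5, 2.5)
        offset = rng.uniform(0.5, 1.5)
        # Device side: start from phi and adapt online on k streamed samples.
        theta = online_sgd(phi, sample_stream(slope, offset, k, rng), inner_lr)
        # Server side: Reptile-style step, moving phi part of the way toward theta.
        phi = [p + meta_lr * (t - p) for p, t in zip(phi, theta)]
    return phi


if __name__ == "__main__":
    phi = meta_train()
    print("learned initialization [w, b]:", phi)
    # A new device adapts from phi with only a few streamed samples.
    rng = random.Random(123)
    adapted = online_sgd(phi, sample_stream(2.3, 0.7, 8, rng), lr=0.05)
    print("adapted to new device [w, b]:", adapted)
```

In this sketch only one parameter vector travels between server and device per round, and the device touches each sample once. That is one plausible way to keep communication and memory costs low, but the actual TinyReptile procedure may differ.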