Unsupervised pre-training approaches have achieved great success in many fields such as Computer Vision (CV), Natural Language Processing (NLP) and so on. However, compared to typical deep learning models, pre-training or even fine-tuning the state-of-the-art self-attention models is extremely expensive, as they require much more computational and memory resources. It severely limits their applications and success in a variety of domains, especially for multi-task learning. To improve the efficiency, we propose Device Tuning for the efficient multi-task model, which is a massively multitask framework across the cloud and device and is designed to encourage learning of representations that generalize better to many different tasks. Specifically, we design Device Tuning architecture of a multi-task model that benefits both cloud modelling and device modelling, which reduces the communication between device and cloud by representation compression. Experimental results demonstrate the effectiveness of our proposed method.
翻译:无监督预训练方法在计算机视觉、自然语言处理等多个领域取得了巨大成功。然而,与典型深度学习模型相比,预训练甚至微调最先进的自注意力模型极其昂贵,因为它们需要更多的计算和内存资源。这严重限制了它们在诸多领域的应用与成效,尤其是在多任务学习场景中。为了提升效率,我们提出了面向高效多任务模型的设备微调方法,这是一种跨云-端的大规模多任务框架,旨在促进学习能够更好地泛化至多种不同任务的表征。具体而言,我们设计了一种多任务模型的设备微调架构,该架构同时有利于云端建模与设备端建模,并通过表征压缩减少了设备与云端之间的通信。实验结果证明了我们提出方法的有效性。