Multi-Task Learning (MTL) is a framework in which multiple related tasks are learned jointly and benefit from a shared representation space or parameter transfer. To provide sufficient learning support, modern MTL uses annotated data with full or sufficiently large overlap across tasks, i.e., each input sample is annotated for all or most of the tasks. However, collecting such annotations is prohibitive in many real applications, and this setup cannot benefit from datasets available for individual tasks. In this work, we challenge this setup and show that MTL can be successful with classification tasks that have little or no overlap in annotations, or when there is a large discrepancy in the size of labeled data per task. We explore task-relatedness for co-annotation and co-training, and propose a novel approach in which knowledge exchange between the tasks is enabled via distribution matching. To demonstrate the general applicability of our method, we conduct diverse case studies in the domains of affective computing, face recognition, species recognition, and shopping item classification using nine datasets. Our large-scale study of affective tasks for basic expression recognition and facial action unit detection illustrates that our approach is network agnostic and brings large performance improvements compared to the state of the art in both tasks and across all studied databases. In all case studies, we show that co-training via task-relatedness is advantageous and prevents negative transfer (which occurs when the multi-task model's performance is worse than that of at least one single-task model).
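To make the idea of knowledge exchange via distribution matching more concrete, the following is a minimal sketch under stated assumptions, not the formulation used in this work. It assumes a single-label task A (e.g., expression recognition) and a multi-label task B (e.g., action unit detection), plus a hypothetical relatedness matrix whose entry `relatedness[i][j]` approximates the probability that label `j` of task B is active given class `i` of task A. The function names, the matrix, and the choice of binary cross-entropy as the matching criterion are all illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def distribution_matching_loss(logits_a, logits_b, relatedness, eps=1e-7):
    """Hypothetical matching loss between two task heads for one sample.

    logits_a:    logits of the single-label task A, length K_a
    logits_b:    logits of the multi-label task B, length K_b
    relatedness: assumed K_a x K_b matrix with entries in [0, 1]
    """
    p_a = softmax(logits_a)
    # Clip task-B probabilities away from 0 and 1 so the logs stay finite.
    p_b = [min(max(sigmoid(z), eps), 1.0 - eps) for z in logits_b]
    # Soft task-B targets implied by the task-A prediction:
    # q_b[j] = sum_i p_a[i] * relatedness[i][j]
    q_b = [
        sum(p_a[i] * relatedness[i][j] for i in range(len(p_a)))
        for j in range(len(p_b))
    ]
    # Binary cross-entropy between implied targets and task-B predictions.
    bce = [
        -(q * math.log(p) + (1.0 - q) * math.log(1.0 - p))
        for q, p in zip(q_b, p_b)
    ]
    return sum(bce) / len(bce)


# Illustrative values only: two classes in task A, two labels in task B.
relatedness = [[1.0, 0.0], [0.0, 1.0]]
consistent = distribution_matching_loss([10.0, -10.0], [10.0, -10.0], relatedness)
inconsistent = distribution_matching_loss([10.0, -10.0], [-10.0, 10.0], relatedness)
print(consistent < inconsistent)
```

In this sketch the loss needs no ground-truth label for task B, which is why a term of this kind could in principle be applied to samples annotated for only one task. The loss is small when the two heads agree with the assumed relatedness and large when they contradict it.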