The cost of labeling data often limits the performance of machine learning systems. In multi-task learning, related tasks share information and improve overall performance, but the cost of a label can vary from task to task. How should the label budget (i.e., the amount of money spent on labeling) be allocated among tasks to achieve optimal multi-task performance? We are the first to propose and formally define the label budget allocation problem in multi-task learning, and we show empirically that the choice of budget allocation strategy substantially affects multi-task performance. We propose a Task-Adaptive Budget Allocation algorithm that robustly generates the optimal budget allocation for different multi-task learning settings. Specifically, we estimate and then maximize the amount of new information obtained from the allocated budget, which serves as a proxy for multi-task learning performance. Experiments on PASCAL VOC and Taskonomy demonstrate the efficacy of our approach over other widely used heuristic labeling strategies.
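To make the problem setting concrete, the following is a minimal sketch of budget allocation under per-task label costs. It is not the Task-Adaptive Budget Allocation algorithm itself: the abstract does not specify how new information is estimated, so the sketch assumes a stand-in proxy of the form `weights[task] * log(1 + n)` with diminishing returns in the number of labels `n`, and maximizes it greedily. The function name `allocate_budget`, the `weights` parameter, and the proxy are all hypothetical.

```python
import math


def allocate_budget(costs, weights, budget):
    """Greedily split a label budget across tasks.

    costs:   dict mapping task name -> cost of one label for that task
    weights: dict mapping task name -> assumed informativeness weight
             (hypothetical stand-in for an information estimate)
    budget:  total amount available for labeling

    Returns (counts, remaining): labels bought per task and unspent budget.
    """
    counts = {task: 0 for task in costs}
    remaining = budget
    while True:
        best_task, best_ratio = None, 0.0
        for task, cost in costs.items():
            if cost > remaining:
                continue  # cannot afford another label for this task
            n = counts[task]
            # Marginal gain of one more label under the assumed proxy
            # weights[task] * log(1 + n).
            gain = weights[task] * (math.log(n + 2) - math.log(n + 1))
            ratio = gain / cost
            if ratio > best_ratio:
                best_task, best_ratio = task, ratio
        if best_task is None:
            break  # no affordable label adds value
        counts[best_task] += 1
        remaining -= costs[best_task]
    return counts, remaining


if __name__ == "__main__":
    # Illustrative numbers only: one expensive task and one cheap task.
    costs = {"segmentation": 5.0, "depth": 1.0}
    weights = {"segmentation": 1.0, "depth": 1.0}
    print(allocate_budget(costs, weights, budget=20.0))
```

Buying labels one at a time by marginal gain per unit cost is a common heuristic for concave objectives under a budget constraint. It illustrates why allocations that ignore per-task label cost or diminishing returns can differ noticeably from one another, but it does not model the information shared between related tasks that the abstract describes.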