Automated Machine Learning for Multi-Label Classification

Automated machine learning (AutoML) aims to select and configure machine learning algorithms and combine them into machine learning pipelines tailored to the dataset at hand. For supervised learning tasks, most notably binary and multinomial classification, also known as single-label classification (SLC), such AutoML approaches have shown promising results. However, the task of multi-label classification (MLC), where data points are associated with a set of class labels instead of a single class label, has received much less attention so far. In the context of multi-label classification, the data-specific selection and configuration of multi-label classifiers is challenging even for experts in the field, as it is a high-dimensional optimization problem with multi-level hierarchical dependencies. While the space of machine learning pipelines is already huge for SLC, the MLC search space exceeds it by several orders of magnitude. In the first part of this thesis, we devise a novel AutoML approach for single-label classification tasks that optimizes machine learning pipelines consisting of at most two algorithms. This approach is then extended, first to optimize pipelines of unlimited length and eventually to configure the complex hierarchical structures of multi-label classification methods. Furthermore, we investigate how well AutoML approaches that form the state of the art for single-label classification tasks scale with the increased problem complexity of AutoML for multi-label classification. In the second part, we explore how methods for SLC and MLC could be configured more flexibly to achieve better generalization performance, and how to increase the efficiency of execution-based AutoML systems.

