We consider sequential maximization of performance metrics that are general functions of a classifier's confusion matrix (such as precision, F-measure, or G-mean). Such metrics are, in general, non-decomposable over individual instances, which makes their optimization very challenging. While they have been extensively studied under different frameworks in the batch setting, their analysis in the online learning regime is very limited, with only a few distinguished exceptions. In this paper, we introduce and analyze a general online algorithm that can be applied in a straightforward way to a variety of complex performance metrics in binary, multi-class, and multi-label classification problems. The algorithm's update and prediction rules are appealingly simple and computationally efficient, and they require no storage of past data. We show that the algorithm attains $\mathcal{O}(\frac{\ln n}{n})$ regret for concave and smooth metrics, and we verify its effectiveness in empirical studies.
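As an illustrative sketch (not part of the paper's algorithm), the metrics named above can be computed from the four entries of a binary confusion matrix; none of them decomposes into an average of per-instance losses, which is what makes their online optimization hard:

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision(tp, fp, fn, tn):
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fp, fn, tn):
    return tp / (tp + fn) if tp + fn else 0.0

def f_measure(tp, fp, fn, tn):
    # harmonic mean of precision and recall: a non-linear, hence
    # non-decomposable, function of the confusion matrix
    p, r = precision(tp, fp, fn, tn), recall(tp, fp, fn, tn)
    return 2 * p * r / (p + r) if p + r else 0.0

def g_mean(tp, fp, fn, tn):
    # geometric mean of sensitivity (recall) and specificity
    sens = recall(tp, fp, fn, tn)
    spec = tn / (tn + fp) if tn + fp else 0.0
    return (sens * spec) ** 0.5
```

For example, with `y_true = [1, 1, 0, 0]` and `y_pred = [1, 0, 0, 0]`, the confusion matrix is `(1, 0, 1, 2)`, giving an F-measure of 2/3 and a G-mean of about 0.707.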