In this paper, we address the problem of feature selection in multi-label learning with a new estimator based on implicit regularization and label embedding. Unlike sparse feature selection methods that rely on a penalized estimator with explicit regularization terms such as the $\ell_{2,1}$-norm, MCP, or SCAD, we propose a simple alternative via Hadamard product parameterization. To guide the feature selection process, a latent semantic representation of the multi-label information is adopted as the label embedding. Experimental results on several well-known benchmark datasets suggest that the proposed estimator suffers much less from extra bias and may lead to benign overfitting.
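The core idea behind the Hadamard product parameterization can be illustrated on a toy problem. The sketch below is an assumption-laden simplification, not the paper's multi-label estimator: it uses a single real-valued target, a common $w = u \odot u - v \odot v$ reparameterization, and plain gradient descent on the *unpenalized* squared loss. Sparsity emerges implicitly from the small initialization, with no $\ell_{2,1}$, MCP, or SCAD penalty term.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the paper): n samples, d features,
# only the first k features are truly relevant.
n, d, k = 50, 100, 5
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
# Keep true coefficients bounded away from zero for a clear support.
w_true[:k] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
y = X @ w_true

# Hadamard (elementwise) product parameterization: w = u*u - v*v.
# Gradient descent from a small initialization alpha acts as an
# implicit sparsity-inducing regularizer.
alpha, lr = 1e-3, 1e-2
u = np.full(d, alpha)
v = np.full(d, alpha)
for _ in range(3000):
    g = X.T @ (X @ (u * u - v * v) - y) / n  # gradient w.r.t. w
    # Chain rule through the parameterization: dL/du = 2*g*u, dL/dv = -2*g*v.
    u, v = u - 2 * lr * g * u, v + 2 * lr * g * v

w_hat = u * u - v * v
support = np.flatnonzero(np.abs(w_hat) > 1e-2)
print("estimated support:", support)
```

Because the multiplicative updates let coordinates that consistently correlate with the residual grow exponentially from `alpha**2` while the rest stay near zero, the recovered support concentrates on the truly relevant features, mimicking an explicit sparsity penalty without its shrinkage bias.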