This paper presents ParsNets, a novel, parsimonious yet efficient design for zero-shot learning (ZSL). We learn a composition of on-device-friendly linear networks, each with orthogonality and low-rankness properties, that achieves performance equivalent to or even better than existing deep models. Concretely, we first refactor the core module of ZSL, i.e., the visual-semantic mapping function, into several base linear networks that correspond to diverse components of the semantic space, so that the complex nonlinearity can be collapsed into simple local linearities. Then, to facilitate the generalization of these local linearities, we construct a maximal margin geometry on the learned features by enforcing low-rank constraints on intra-class samples and high-rank constraints on inter-class samples. This yields orthogonal subspaces for different classes, each lying on a compact manifold. To enhance the model's adaptability and counterbalance overfitting and underfitting in ZSL, a set of sample-wise indicators selects a sparse subset of these base linear networks to form a composite semantic predictor for each sample. Notably, the maximal margin geometry guarantees the diversity of features, while the local linearities guarantee efficiency. Thus, ParsNets generalize better to unseen classes and can be deployed flexibly on resource-constrained devices. Theoretical explanations and extensive experiments verify the effectiveness of the proposed method.
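As a rough illustration of the two ingredients described above, the following NumPy sketch shows (a) a composite predictor that keeps the top-k base linear maps according to sample-wise indicator scores and mixes them with softmax weights, and (b) a nuclear-norm surrogate for the intra-class low-rank / inter-class high-rank constraints. All names (`composite_predict`, `margin_rank_loss`, `base_weights`, `gate_logits`, `k`) are hypothetical. Top-k softmax gating and the nuclear-norm objective (in the spirit of orthogonal low-rank embedding) are assumptions chosen for illustration, not necessarily the exact formulation used in ParsNets.

```python
import numpy as np


def composite_predict(x, base_weights, gate_logits, k):
    """Map a visual feature x to the semantic space with a sparse mix of linear maps.

    x:            (d,) visual feature of one sample
    base_weights: (M, a, d) stack of M hypothetical base linear networks
    gate_logits:  (M,) sample-wise indicator scores (assumed precomputed)
    k:            number of base networks kept for this sample
    """
    # Keep the k base networks with the largest indicator scores.
    idx = np.argsort(gate_logits)[-k:]
    # Softmax over the selected scores only (shifted for numerical stability).
    scores = gate_logits[idx]
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    # The weighted sum of linear maps is itself a single (a, d) linear map.
    W = np.tensordot(w, base_weights[idx], axes=1)
    return W @ x, idx


def margin_rank_loss(Z, labels):
    """Nuclear-norm surrogate (an assumption): low per-class rank, high batch rank.

    Z:      (n, d) learned features, one row per sample
    labels: (n,) integer class labels
    """
    # Sum of per-class nuclear norms pushes each class toward a low-rank subspace.
    intra = sum(np.linalg.norm(Z[labels == c], ord="nuc") for c in np.unique(labels))
    # Nuclear norm of the whole batch is rewarded, pushing classes apart.
    inter = np.linalg.norm(Z, ord="nuc")
    return intra - inter


# Usage with toy data.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
base = rng.normal(size=(5, 3, 4))
y, chosen = composite_predict(x, base, rng.normal(size=5), k=2)
print(y.shape, sorted(chosen.tolist()))
```

Because a weighted sum of linear maps is itself one linear map, the selected base networks can be merged into a single matrix per sample at inference time, which is what keeps the predictor cheap. The nuclear norm is subadditive over row blocks, so this loss is non-negative and reaches zero when the class feature subspaces are mutually orthogonal.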