Because most attributes are scarce, realistic pedestrian attribute datasets exhibit heavily skewed data distributions, which give rise to two types of model failure: (1) label imbalance: model predictions lean heavily towards the majority labels; (2) semantics imbalance: the model easily overfits the under-represented attributes because of their insufficient semantic diversity. To achieve perfect label balancing, we propose a novel framework that decouples label-balanced data re-sampling from the curse of attribute co-occurrence, i.e., we equalize the sampling prior of an attribute without biasing that of the other, co-occurring attributes. To diversify attribute semantics and mitigate feature noise, we propose a Bayesian feature augmentation method that introduces true in-distribution novelty. By handling both imbalances jointly, our work achieves the best accuracy on various popular benchmarks and, importantly, does so with a minimal computational budget.
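To make the co-occurrence problem concrete, the following minimal sketch shows what a naive re-sampler does. It is not the proposed framework; it only illustrates the failure the framework is designed to avoid. The toy label counts, the two attributes A and B, and the helper names `prior` and `balance_on` are all hypothetical and chosen for illustration.

```python
from fractions import Fraction

# Hypothetical toy dataset: number of samples for each (A, B) label pair.
# A is a rare attribute that often co-occurs with B.
counts = {(1, 1): 80, (1, 0): 20, (0, 1): 180, (0, 0): 720}


def prior(weights, attr_index):
    """Probability that the attribute at attr_index is positive under the weights."""
    total = sum(weights.values())
    positive = sum(w for labels, w in weights.items() if labels[attr_index] == 1)
    return Fraction(positive) / Fraction(total)


def balance_on(counts, attr_index):
    """Naive re-sampling: give the positive and negative groups of ONE
    attribute the same total sampling mass (1/2 each)."""
    total = sum(counts.values())
    pos = sum(c for labels, c in counts.items() if labels[attr_index] == 1)
    neg = total - pos
    return {
        labels: Fraction(c, 2 * (pos if labels[attr_index] == 1 else neg))
        for labels, c in counts.items()
    }


balanced_on_a = balance_on(counts, 0)

print(prior(counts, 0), prior(counts, 1))                # 1/10 13/50
print(prior(balanced_on_a, 0), prior(balanced_on_a, 1))  # 1/2 1/2
```

In this toy example, equalizing the sampling prior of A from 1/10 to 1/2 also moves the prior of B from 13/50 to 1/2, even though B was never targeted. This side effect on co-occurring attributes is what the abstract refers to as the curse of attribute co-occurrence.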