Pedestrian Attribute Recognition (PAR) is critical for video surveillance, enabling forensic search and re-identification systems. Extreme class imbalance remains a fundamental obstacle: when PETA and PA-100K are merged into a 109,000-image composite corpus, minority attributes have positive sample fractions below 1%. Under this imbalance, standard binary cross-entropy (BCE) optimization suppresses rare traits, a phenomenon we term the "majority negative class cheating trap". We present a systematic ablation of the Multi-Label Focal Loss hyperparameters (alpha and gamma) on a ResNet-18 backbone. A calibrated configuration (alpha=0.50, gamma=2.0) achieves a Macro F1-score of 62.32%, matching the BCE baseline while retaining superior hard-example mining and convergence dynamics. Because the approach relies purely on loss-function engineering, it adds no computational overhead for edge deployment. We also identify the "Sparsity Wall", a hard boundary at which positive sample fractions below 0.1% render global loss reweighting ineffective and instance-level intervention becomes necessary.
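To make the roles of alpha and gamma concrete, the following is a minimal sketch of a multi-label focal loss in plain Python. It assumes the commonly used sigmoid formulation, in which each attribute is treated as an independent binary problem and the per-label term is -alpha_t (1 - p_t)^gamma log(p_t); the abstract does not spell out the exact variant used, so this form, along with the hypothetical names `focal_term`, `multilabel_focal_loss`, and `multilabel_bce`, should be read as illustrative assumptions rather than the authors' implementation.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def focal_term(z, y, alpha=0.50, gamma=2.0, eps=1e-12):
    """Focal loss for one (logit, binary label) pair (assumed sigmoid form)."""
    p = _sigmoid(z)
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma down-weights labels the model already gets right.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


def multilabel_focal_loss(logits, targets, alpha=0.50, gamma=2.0):
    """Mean focal loss over every (image, attribute) pair."""
    terms = [
        focal_term(z, y, alpha, gamma)
        for row_z, row_y in zip(logits, targets)
        for z, y in zip(row_z, row_y)
    ]
    return sum(terms) / len(terms)


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy, for comparison with the focal loss."""
    # With alpha=0.5 and gamma=0 the focal term is exactly half the BCE term.
    return 2.0 * multilabel_focal_loss(logits, targets, alpha=0.5, gamma=0.0)


if __name__ == "__main__":
    # Toy batch: 2 images x 2 attributes (illustrative values, not real data).
    logits = [[-4.0, 2.0], [-3.0, -1.0]]
    targets = [[0, 1], [0, 1]]
    print("BCE:  ", multilabel_bce(logits, targets))
    print("Focal:", multilabel_focal_loss(logits, targets))
```

In this sketch, a confidently rejected negative (logit -4.0, label 0) contributes far less than a missed positive (logit -1.0, label 1), which is the hard-example emphasis referred to above. Because alpha and gamma are global scalars, an attribute with almost no positives still produces very few positive terms per batch, which is consistent with the limitation described as the Sparsity Wall.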