Damage Vision Mining Opportunity for Imbalanced Anomaly Detection

Over the past decade, balanced datasets have been used to advance algorithms for classification, object detection, semantic segmentation, and anomaly detection in industrial applications. In condition-based maintenance specifically, automating visual inspection is crucial to ensure high quality. Deterioration prognostics aims to optimize the fine-grained decision process for predictive maintenance and proactive repair. In civil infrastructure and the living environment, damage data mining cannot avoid the imbalanced data issue: damage events are rare and often unseen, and improved operations keep most assets in a high-quality state. For visual inspection, the deteriorated class acquired from the surfaces of concrete and steel components is occasionally imbalanced. From numerous related surveys, we summarize that imbalanced data problems fall into four types: 1) a missing range of target and label variables, 2) majority-minority class imbalance, 3) foreground-background spatial imbalance, and 4) long-tailed, pixel-wise class imbalance. Since 2015, there have been many studies of imbalanced data using deep learning approaches, covering regression, image classification, object detection, and semantic segmentation. However, anomaly detection for imbalanced data is not yet well established. In this study, we highlight one-class anomaly detection, which decides whether a sample belongs to the anomalous class or not, and demonstrate clear examples on imbalanced vision datasets: blood smear, lung infection, wood, concrete deterioration, and disaster damage. We provide key results on the advantage of damage vision mining, hypothesizing that the more effective the range of the positive ratio, the higher the accuracy gain of the anomaly detection application. Finally, we discuss the applicability of the damage learning methods, their limitations, and future work.
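
To make two terms from the abstract concrete, the positive ratio of an imbalanced dataset and the one-class setting (a model fit only on normal samples that flags anything far from them), here is a minimal Python sketch. It is a toy stand-in, not the method used in this study: the function names `positive_ratio`, `fit_one_class`, and `is_anomalous`, the 1-D feature scores, and the mean plus or minus k standard deviations threshold (k = 3) are hypothetical assumptions chosen only for illustration.

```python
from statistics import mean, pstdev


def positive_ratio(labels):
    """Fraction of samples labelled anomalous (1) in a binary 0/1 label list."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(labels) / len(labels)


def fit_one_class(normal_scores):
    """Hypothetical toy one-class 'model': mean and population std of 1-D scores.

    Only normal samples are used for fitting, which is the defining trait of
    the one-class setting; a real detector would learn richer features.
    """
    return mean(normal_scores), pstdev(normal_scores)


def is_anomalous(x, model, k=3.0):
    """Flag x if it lies more than k standard deviations from the normal mean (assumed rule)."""
    mu, sigma = model
    return abs(x - mu) > k * sigma


if __name__ == "__main__":
    # 5 anomalous samples out of 100: a majority-minority class imbalance.
    labels = [0] * 95 + [1] * 5
    print(positive_ratio(labels))  # 0.05

    # Fit on normal-only scores, then score one far and one near sample.
    model = fit_one_class([1.0, 1.2, 0.9, 1.1, 0.8])
    print(is_anomalous(2.0, model), is_anomalous(1.1, model))  # True False
```

A deep one-class detector would replace the mean and standard deviation with learned representations and an anomaly score, but the decision rule keeps the same shape: fit on normal data only, then threshold the score for new samples.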
