Over the past decade, balanced datasets have been used to advance algorithms for classification, object detection, semantic segmentation, and anomaly detection in industrial applications. In condition-based maintenance specifically, automating visual inspection is crucial to ensure high quality. Deterioration prognostics aims to optimize fine-grained decision-making for predictive maintenance and proactive repair. In civil infrastructure and living environments, damage data mining cannot avoid the imbalanced data issue, because damage events are rare or previously unseen and improved operations keep most assets in a high-quality state. For visual inspection, data on the deteriorated class, acquired from the surfaces of concrete and steel components, are occasionally imbalanced. From numerous related surveys, we summarize imbalanced data problems into four types: 1) a missing range of target and label variables, 2) majority-minority class imbalance, 3) foreground-background spatial imbalance, and 4) long-tailed classes with pixel-wise imbalance. Since 2015, there have been many studies of imbalanced data using deep learning approaches, including regression, image classification, object detection, and semantic segmentation. However, anomaly detection for imbalanced data is not yet well explored. In this study, we highlight one-class anomaly detection applications, which decide whether or not a sample belongs to the anomalous class, and demonstrate clear examples on imbalanced vision datasets: blood smear, lung infection, hazardous driving, wood, concrete deterioration, river sludge, and disaster damage. As illustrated in Fig. 1, we provide key results on the advantages of damage vision mining, hypothesizing that the wider the effective range of positive ratios, the higher the accuracy gain from the anomaly detection application. In our imbalanced studies, compared with the balanced case at a positive ratio of 1/1, we find that there is an applicable range of positive ratios in which the accuracy is consistently high.
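The studies summarized above compare training sets at different positive ratios against the balanced 1/1 case. The following is a minimal sketch of how such imbalanced sets might be constructed. It assumes that "positive" denotes the deteriorated (anomalous) class, that the ratio counts positives per negative, and that all negatives are kept while positives are randomly subsampled. The function name `subsample_to_positive_ratio` and the toy sample names are hypothetical illustrations, not the authors' pipeline.

```python
import random
from fractions import Fraction


def subsample_to_positive_ratio(positives, negatives, ratio, seed=0):
    """Hypothetical helper: build a shuffled labeled set with the given
    positive:negative count ratio (e.g. Fraction(1, 4) = 1 positive per 4 negatives).

    Assumption: every negative is kept and positives are randomly subsampled.
    Labels: 1 = positive (deteriorated/anomalous), 0 = negative (intact).
    """
    ratio = Fraction(ratio)
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # Number of positives needed, rounded down to a whole sample.
    n_pos = int(len(negatives) * ratio)
    if n_pos > len(positives):
        raise ValueError("not enough positive samples for this ratio")
    rng = random.Random(seed)  # seeded so each ratio setting is reproducible
    chosen = rng.sample(positives, n_pos)
    data = [(x, 1) for x in chosen] + [(x, 0) for x in negatives]
    rng.shuffle(data)
    return data


# Toy sweep from the balanced case (1/1) toward stronger imbalance.
positives = [f"damaged_{i}" for i in range(200)]
negatives = [f"intact_{i}" for i in range(200)]
for r in [Fraction(1, 1), Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]:
    data = subsample_to_positive_ratio(positives, negatives, r)
    n_pos = sum(label for _, label in data)
    print(r, n_pos, len(data) - n_pos)
```

Keeping the negatives fixed means each setting differs only in how many positives the model sees, so accuracy changes across the sweep can be attributed to the positive ratio rather than to a change in the normal-class data.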