Over the past decade, balanced datasets have been used to advance deep learning algorithms for industrial applications. In urban infrastructure and living environments, however, damage data mining cannot avoid imbalanced data. Damage events are rare and often unseen, and improved operations keep most components in a high-quality state. In visual inspection, images of the deteriorated class acquired from the surfaces of concrete and steel components are occasionally imbalanced. From numerous related surveys, we conclude that imbalanced data problems fall into four types: 1) missing ranges of target and label variables, 2) majority-minority class imbalance, 3) foreground-background spatial imbalance, and 4) long-tailed, pixel-wise class imbalance. Since 2015, many imbalanced-data studies have used deep learning approaches, including regression, image classification, object detection, and semantic segmentation. However, anomaly detection on imbalanced data has received comparatively little attention. In this study, we highlight one-class anomaly detection, which decides whether or not a sample belongs to the anomalous class. We demonstrate clear examples on imbalanced vision datasets: medical disease, hazardous behaviour, material deterioration, plant disease, river sludge, and disaster damage. We report key results on the advantage of damage-vision mining, hypothesising that the more effective the range of the positive ratio, the higher the accuracy gain from anomaly feedback. In our imbalanced studies, compared with the balanced case with a positive ratio of $1/1$, we find an applicable positive ratio $1/a$ at which the accuracy is consistently high. In the extremely imbalanced range, from one shot up to $1/(2a)$, the accuracy is inferior to that at the applicable ratio. In contrast, when the positive ratio exceeds $2/a$, training shifts into an over-mining phase without an effective gain in accuracy.
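The positive-ratio settings can be made concrete with a short sketch. It assumes that "positive ratio" means the number of anomalous (positive) samples per normal (negative) sample, so that $1/1$ is the balanced case, and it reads $1/2a$ as $1/(2a)$. Under those assumptions, the Python below does two things. It computes how many positives each setting in a ratio sweep would use, and it draws a training subset at a chosen ratio by random subsampling of positives. The helper names (`n_positives`, `positive_counts`, `sample_imbalanced`), the example value $a = 10$, and the subsampling strategy are hypothetical illustrations, not the authors' procedure.

```python
from __future__ import annotations

import random


def n_positives(n_negative: int, ratio: float) -> int:
    """Positives needed so that n_pos / n_neg is approximately `ratio`.

    Assumption (not from the paper): "positive ratio" = anomalous count / normal count,
    so 1/1 is the balanced case. At least one positive is kept (the one-shot extreme).
    """
    return max(1, round(n_negative * ratio))


def positive_counts(n_negative: int, a: int) -> dict[str, int]:
    """Positive counts for a hypothetical sweep over the ratios named in the abstract."""
    ratios = {"1/(2a)": 1 / (2 * a), "1/a": 1 / a, "2/a": 2 / a, "1/1": 1.0}
    counts = {"one-shot": 1}
    counts.update({name: n_positives(n_negative, r) for name, r in ratios.items()})
    return counts


def sample_imbalanced(negatives: list, positives: list, ratio: float, seed: int = 0) -> list:
    """Keep every negative, randomly draw positives to reach `ratio`, return shuffled (item, label) pairs.

    Label 1 marks the anomalous (positive) class, 0 the normal class.
    Random subsampling of positives is an illustrative choice, not the paper's protocol.
    """
    k = n_positives(len(negatives), ratio)
    if k > len(positives):
        raise ValueError(f"ratio {ratio} needs {k} positives, only {len(positives)} available")
    rng = random.Random(seed)
    chosen = rng.sample(positives, k)
    data = [(x, 0) for x in negatives] + [(x, 1) for x in chosen]
    rng.shuffle(data)
    return data


if __name__ == "__main__":
    print(positive_counts(1000, 10))
    # → {'one-shot': 1, '1/(2a)': 50, '1/a': 100, '2/a': 200, '1/1': 1000}
    subset = sample_imbalanced(list(range(1000)), list(range(1000, 1300)), 1 / 10)
    print(len(subset), sum(label for _, label in subset))  # → 1100 100
```

With 1,000 normal samples and the hypothetical $a = 10$, the sweep runs from a single positive up to the balanced 1,000. The sketch keeps all negatives fixed and varies only the positives, so each setting differs only in how much anomalous data the detector sees. A different protocol, such as fixing the total set size, would give different counts.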