Anomaly detection (AD) is a crucial machine learning task with applications such as detecting emerging diseases, identifying financial fraud, and detecting fake news. However, obtaining complete, accurate, and precise labels for AD tasks is often impractical because data annotation is costly and difficult. To address this issue, researchers have developed AD methods that can work with incomplete, inexact, and inaccurate supervision, collectively referred to as weakly supervised anomaly detection (WSAD) methods. In this study, we present the first comprehensive survey of WSAD methods, categorizing them into these three weak supervision settings across four data modalities (i.e., tabular, graph, time-series, and image/video data). For each setting, we provide formal definitions, key algorithms, and potential future directions. To support future research, we conduct experiments on a selected setting and release the source code, along with a collection of WSAD methods and data.
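To make the three weak supervision settings more concrete, the following is a minimal sketch that simulates each one from fully labeled binary data (1 = anomaly, 0 = normal). It is illustrative only and is not the survey's experimental protocol. Several choices here are assumptions: the helper names `make_incomplete`, `make_inexact`, and `make_inaccurate` are hypothetical, unlabeled points are marked with -1, bags are formed from consecutive instances (as with video-level labels over frames), and label noise is uniform random flipping.

```python
import numpy as np


def make_incomplete(y, n_labeled, rng):
    """Incomplete supervision: reveal labels for only n_labeled random points.

    Unlabeled points are marked with -1 (a convention assumed for this sketch).
    """
    y_weak = np.full_like(y, -1)
    idx = rng.choice(len(y), size=n_labeled, replace=False)
    y_weak[idx] = y[idx]
    return y_weak


def make_inexact(y, bag_size):
    """Inexact supervision: one coarse label per bag of consecutive instances.

    A bag is labeled anomalous (1) if it contains at least one anomaly,
    similar to a video-level label covering frame-level anomalies.
    """
    if len(y) % bag_size != 0:
        raise ValueError("len(y) must be a multiple of bag_size")
    return y.reshape(-1, bag_size).max(axis=1)


def make_inaccurate(y, noise_rate, rng):
    """Inaccurate supervision: every point is labeled, but each label is
    flipped independently with probability noise_rate (assumed noise model)."""
    flip = rng.random(len(y)) < noise_rate
    return np.where(flip, 1 - y, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: 1,000 points with roughly 5% anomalies.
    y_true = (rng.random(1000) < 0.05).astype(int)
    y_inc = make_incomplete(y_true, n_labeled=50, rng=rng)
    y_bag = make_inexact(y_true, bag_size=20)
    y_noisy = make_inaccurate(y_true, noise_rate=0.1, rng=rng)
    print("labeled points:", int((y_inc != -1).sum()))
    print("bags:", len(y_bag), "positive bags:", int(y_bag.sum()))
    print("flipped labels:", int((y_noisy != y_true).sum()))
```

Each helper degrades the same ground-truth vector in a different way, which mirrors how the three settings differ: in how many labels are available (incomplete), in how coarse they are (inexact), and in how trustworthy they are (inaccurate).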