The Dark Energy Survey (DES) collects image data of an extremely large number of extragalactic objects, and it is reasonable to assume that many unusual objects of high scientific interest are hidden in these data. Because of the size of the DES data, identifying such objects among many millions of other celestial objects is a challenging task. Noisy or saturated images make outlier detection harder still. When the number of tested objects is extremely high, even a small rate of noise or false positives leads to a very large number of false detections, which makes an automatic system impractical. This study applies an automatic method for detecting outlier objects in the first data release of the Dark Energy Survey. Using machine learning-based outlier detection, the algorithm identifies objects that are visually different from the majority of the other objects in the database. An important feature of the algorithm is that it allows the false-positive rate to be controlled, so it can be used for practical outlier detection. The algorithm does not detect outlier objects with perfect accuracy, but it reduces the data enough to make outlier detection practical. For instance, selecting the top 250 objects after applying the algorithm to more than $2\cdot10^6$ DES images yields a collection of uncommon galaxies. Compiling such a collection by manual inspection of the data would have been extremely time-consuming.
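The two quantitative points above can be illustrated with a small sketch. The first is that a per-object false-positive rate scales with survey size: an illustrative 1% rate applied to $2\cdot10^6$ images would give about $2\cdot10^4$ false detections. The second is that ranking objects by an outlier score and keeping only the top $N$ fixes the size of the candidate list regardless of survey size. The code below is *not* the algorithm used in this study. It swaps in a generic k-nearest-neighbour distance score, and it assumes each image has already been reduced to a numeric feature vector. The function names (`expected_false_detections`, `knn_outlier_scores`, `top_n_outliers`) and the toy data are hypothetical.

```python
import numpy as np


def expected_false_detections(n_objects: int, false_positive_rate: float) -> float:
    """Expected false detections if each object is flagged independently
    with probability `false_positive_rate` (illustrative model only)."""
    return n_objects * false_positive_rate


def knn_outlier_scores(features: np.ndarray, k: int = 3) -> np.ndarray:
    """Mean Euclidean distance from each row to its k nearest other rows.

    A larger score means the object lies farther from the bulk of the data.
    This is a generic stand-in scoring rule, not the method of the study.
    Brute force with O(n^2) memory: fine for a toy example, not for 2e6 images.
    """
    n = features.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of objects")
    # Pairwise distances via broadcasting: shape (n, n)
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # an object is not its own neighbour
    # np.partition places the k smallest distances of each row in the first k columns
    nearest = np.partition(dist, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


def top_n_outliers(scores: np.ndarray, n: int) -> np.ndarray:
    """Indices of the n highest-scoring objects, most unusual first."""
    return np.argsort(scores)[::-1][:n]


# Hypothetical toy features: 500 ordinary objects plus 5 far-away ones.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(500, 8))
anomalies = 20.0 * np.eye(8)[:5]
features = np.vstack([inliers, anomalies])

scores = knn_outlier_scores(features, k=3)
picked = top_n_outliers(scores, n=5)
print(sorted(picked.tolist()))  # expected: [500, 501, 502, 503, 504]
```

The top-$N$ cut is one simple way to bound the manual-inspection workload. Keeping 250 of $2\cdot10^6$ objects means reviewing a fraction of $1.25\cdot10^{-4}$ of the data, whatever the underlying score's accuracy.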