Serial femtosecond crystallography at X-ray free-electron laser facilities has opened a new era in crystal structure determination. However, data processing for these experiments faces an unprecedented challenge, because the total number of diffraction patterns needed to determine a high-resolution structure is huge. Machine learning methods are very likely to play an important role in handling such a large volume of data. Convolutional neural networks have achieved great success in pattern classification; however, training these networks requires very large labeled datasets. This heavy dependence on labeled data seriously restricts their application, because annotating a large number of diffraction patterns is very costly. In this article we present our work on classifying diffraction patterns with weakly supervised algorithms, with the aim of reducing as much as possible the size of the labeled dataset required for training. Our results show that weakly supervised methods can significantly reduce the number of labeled patterns needed while achieving accuracy comparable to that of fully supervised methods.