Wafer manufacturing is an intricate process involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which in turn can provide insight for yield improvement in wafer foundries. During manufacturing, defects may appear on a wafer individually or in various combinations, and identifying multiple defects on a wafer is generally harder than identifying a single one. Recently, deep learning methods have gained significant traction in mixed-type DPR. However, the complexity of the defects calls for large, complex models, which are very difficult to run on the low-memory embedded devices typically used in fabrication labs. Another common issue is the lack of labeled data for training complex networks. In this work, we propose an unsupervised training routine that distills the knowledge of complex pre-trained models into lightweight, deployment-ready models. We empirically show that this type of training compresses the model without sacrificing accuracy, even though the compressed model is up to 10 times smaller than the teacher model. The compressed model also outperforms contemporary state-of-the-art models.
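As a rough illustration of label-free distillation of the kind described above, the following minimal sketch computes a loss that pushes a student's per-defect outputs toward a teacher's softened outputs on an unlabeled wafer map. The abstract does not specify the loss, so everything here is an assumption made for illustration: the multi-label sigmoid formulation, the `temperature` parameter, and the names `sigmoid` and `distillation_loss` are hypothetical and are not the authors' method.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean binary cross-entropy between softened teacher and student outputs.

    Assumption: each logit corresponds to one defect type (multi-label),
    so several defects can be active on the same wafer map.
    No ground-truth labels are used; the teacher's outputs are the targets.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for t, s in zip(teacher_logits, student_logits):
        p = sigmoid(t / temperature)           # soft target from the teacher
        q = sigmoid(s / temperature)           # student prediction
        q = min(max(q, eps), 1.0 - eps)
        total += -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))
    return total / len(teacher_logits)


# Hypothetical logits for one unlabeled wafer map with three defect types.
teacher = [2.0, -1.0, 0.5]
close_student = [1.8, -0.9, 0.4]
far_student = [-2.0, 1.0, -0.5]

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
```

In this sketch a student whose outputs are close to the teacher's receives a lower loss than one whose outputs disagree, so minimizing the loss over unlabeled wafer maps transfers the teacher's behavior without requiring defect labels.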