This paper replicates MaskFormer, a universal image segmentation model originally developed in PyTorch, within the TensorFlow ecosystem, optimized for execution on Tensor Processing Units (TPUs). Our implementation builds on the modular constructs of the TensorFlow Model Garden (TFMG), including the data loader, training orchestrator, and various architectural components, which we tailor and adapt to the specifications of MaskFormer. We address key challenges encountered during the replication: non-convergence, slow training, adaptation of loss functions, and integration of TPU-specific functionality. We verify our reproduced implementation and present qualitative results on the COCO dataset. Although our implementation meets some of the objectives for end-to-end reproducibility, we encountered challenges in replicating the PyTorch version of MaskFormer in TensorFlow. The replication process is not straightforward and requires substantial engineering effort: it necessitates customizing various TFMG components, alongside thorough verification and hyper-parameter tuning. The replication is available at: https://github.com/PurdueDualityLab/tf-maskformer/tree/main/official/projects/maskformer