Two Independent Teachers are Better Role Model

From arXiv. This manuscript contains 14 pages and 7 figures. It was submitted to IEEE Transactions on Medical Imaging (TMI) in June 2023.

Recent deep learning models have attracted substantial attention in infant brain analysis. These models have achieved state-of-the-art performance, including semi-supervised techniques such as Temporal Ensembling and mean teacher. However, they rely on an encoder-decoder structure with stacked local operators to gather long-range information, and these local operators limit both efficiency and effectiveness. In addition, MRI data contain different tissue properties (TPs), such as T1 and T2. A major limitation of these models is that they feed both types of data into the segmentation process together, i.e., the models are trained on the dataset only once, which demands substantial computation and memory during inference. In this work, we address these limitations by designing a new deep learning model, called 3D-DenseUNet, which uses adaptable global aggregation blocks in down-sampling to mitigate spatial information loss. A self-attention module connects the down-sampling blocks to the up-sampling blocks and integrates 3D feature maps across the spatial and channel dimensions, improving the model's representational power and discriminative ability. Additionally, we propose a new method called Two Independent Teachers (2IT), which aggregates model weights rather than label predictions. Each teacher model is trained on a different type of brain data, T1 and T2 respectively. A fusion model is then added to improve test accuracy and to enable training with fewer parameters and labels than the Temporal Ensembling method, without modifying the network architecture. Empirical results demonstrate the effectiveness of the proposed method.
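The abstract's central distinction is between averaging model weights (the mean-teacher idea that 2IT builds on) and averaging label predictions (Temporal Ensembling). The following is a minimal sketch of that distinction, assuming plain Python lists stand in for network parameters and per-voxel class probabilities. The names `ema_teacher_update`, `temporal_ensembling_target`, and `fuse_teacher_probs` are hypothetical. The EMA weight update follows the generic mean-teacher recipe and is not necessarily the exact update used in 2IT. The fusion rule (a weighted average of the T1 and T2 teachers' probabilities) is purely an assumption, since the abstract does not say how the fusion model combines the two teachers.

```python
from typing import Dict, List, Tuple

# Toy stand-ins (assumption): a "model" is a dict of named parameter vectors,
# and a prediction is a list of class probabilities for a single voxel.
Params = Dict[str, List[float]]


def ema_teacher_update(teacher: Params, student: Params, alpha: float = 0.99) -> Params:
    """Generic mean-teacher update of the teacher's *weights*:
    teacher <- alpha * teacher + (1 - alpha) * student."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        name: [alpha * t + (1.0 - alpha) * s
               for t, s in zip(teacher[name], student[name])]
        for name in teacher
    }


def temporal_ensembling_target(
    z_acc: List[float], z_new: List[float], alpha: float, epoch: int
) -> Tuple[List[float], List[float]]:
    """Temporal Ensembling averages *predictions* instead:
    Z <- alpha * Z + (1 - alpha) * z, and the training target is the
    bias-corrected Z / (1 - alpha ** epoch), with epoch counted from 1."""
    if epoch < 1:
        raise ValueError("epoch is counted from 1")
    z_acc = [alpha * a + (1.0 - alpha) * n for a, n in zip(z_acc, z_new)]
    target = [a / (1.0 - alpha ** epoch) for a in z_acc]
    return z_acc, target


def fuse_teacher_probs(p_t1: List[float], p_t2: List[float], w_t1: float = 0.5) -> List[float]:
    """HYPOTHETICAL fusion step: weighted average of the class probabilities
    predicted by the T1-trained teacher and the T2-trained teacher."""
    if not 0.0 <= w_t1 <= 1.0:
        raise ValueError("w_t1 must lie in [0, 1]")
    return [w_t1 * a + (1.0 - w_t1) * b for a, b in zip(p_t1, p_t2)]


if __name__ == "__main__":
    # Two independent teachers, each tracking a student trained on one modality.
    teacher_t1 = ema_teacher_update({"w": [0.0, 0.0]}, {"w": [1.0, 2.0]}, alpha=0.5)
    teacher_t2 = ema_teacher_update({"w": [0.0, 0.0]}, {"w": [3.0, 4.0]}, alpha=0.5)
    print(teacher_t1, teacher_t2)  # {'w': [0.5, 1.0]} {'w': [1.5, 2.0]}
    print(fuse_teacher_probs([0.2, 0.8], [0.6, 0.4]))
```

Keeping the two teachers separate means each one only ever averages weights from a student that saw a single modality. This matches the abstract's motivation of not feeding T1 and T2 into one model at once. A real implementation would apply the same update to framework tensors rather than Python lists.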

