Over the last decade, computer vision has witnessed the establishment of diverse training and learning approaches. Techniques such as adversarial learning, contrastive learning, diffusion denoising learning, and ordinary reconstruction learning have become standard, representing state-of-the-art methods widely employed for fully training or pre-training networks across a range of vision tasks. The exploration of fine-tuning approaches has emerged as a current focal point, addressing the need for efficient model tuning with reduced GPU memory usage and time cost while improving overall performance, as exemplified by methods such as low-rank adaptation (LoRA). Two key questions arise: which pre-training technique yields the best results (adversarial, contrastive, reconstruction, or diffusion denoising)? And how does the performance of these approaches vary as the complexity of fine-tuning is adjusted? This study aims to elucidate the advantages of pre-training techniques and fine-tuning strategies for enhancing the learning process of neural networks on independent and identically distributed (IID) cohorts. We underscore the significance of fine-tuning by examining several settings, including full tuning, decoder tuning, top-level tuning, and fine-tuning of linear parameters with LoRA. Systematic summaries of model performance and efficiency are presented using metrics such as accuracy, time cost, and memory efficiency. To empirically demonstrate our findings, we focus on a multi-task segmentation-classification challenge involving the paracingulate sulcus (PCS), evaluating different 3D Convolutional Neural Network (CNN) architectures on the TOP-OSLO cohort of 596 subjects.
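The LoRA strategy mentioned above can be illustrated with a minimal sketch. This is a hypothetical, NumPy-only toy (not the study's implementation): a frozen pre-trained weight W is augmented with a trainable low-rank correction scaled by alpha / r, which is why LoRA trains far fewer parameters than full tuning.

```python
import numpy as np

# Hypothetical LoRA sketch: the frozen base weight W is augmented with a
# low-rank update (alpha / r) * B @ A; only A and B would be trained.
rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 64, 32, 4, 8
W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                    # trainable, zero init

def lora_forward(x):
    """Base projection plus scaled low-rank correction."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)

# With B initialized to zero, the adapted layer reproduces the frozen
# base exactly at the start of fine-tuning.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters: r * (d_in + d_out) instead of d_in * d_out.
print(r * (d_in + d_out), "vs", d_in * d_out)
```

The zero initialization of B is the standard trick that makes the adapted network start from the pre-trained behavior, so fine-tuning perturbs rather than replaces what pre-training learned.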