Rapid and accurate building damage assessment in the immediate aftermath of tornadoes is critical for coordinating life-saving search and rescue operations, optimizing emergency resource allocation, and accelerating community recovery. However, current automated methods struggle with the unique visual complexity of tornado-induced wreckage, primarily due to severe domain shift from standard pre-training datasets and extreme class imbalance in real-world disaster data. To address these challenges, we introduce a systematic experimental framework evaluating 79 open-source deep learning models, encompassing both Convolutional Neural Networks (CNNs) and Vision Transformers, across over 2,300 controlled experiments on our newly curated Quad-State Tornado Damage (QSTD) benchmark dataset. Our findings reveal that achieving operational-grade performance hinges on a complex interaction between architecture and optimization, rather than on architectural selection alone. Most strikingly, we demonstrate that optimizer choice can be more consequential than architecture: switching from Adam to SGD provided dramatic F1 gains of +25 to +38 points for the Vision Transformer and Swin Transformer families, fundamentally reversing their ranking from bottom-tier to competitive with top-performing CNNs. Furthermore, a low learning rate of 1×10⁻⁴ proved universally critical, boosting average F1 performance by +10.2 points across all architectures. Our champion model, ConvNeXt-Base trained with these optimized settings, demonstrated strong cross-event generalization on the held-out Tuscaloosa-Moore Tornado Damage (TMTD) dataset, achieving 46.4% Macro F1 (+34.6 points over baseline) and retaining 85.5% Ordinal Top-1 Accuracy despite temporal and sensor domain shifts.