Memory-Efficient Sparse Pyramid Attention Networks for Whole Slide Image Analysis

Whole Slide Images (WSIs) are crucial for modern pathological diagnosis, yet their gigapixel-scale resolution and sparse informative regions pose significant computational challenges. Dense attention mechanisms, widely used in computer vision and natural language processing, are impractical for WSI analysis because of the sheer data scale and the redundant processing of uninformative areas. To address these challenges, we propose Memory-Efficient Sparse Pyramid Attention Networks with Shifted Windows (SPAN), drawing inspiration from state-of-the-art sparse attention techniques in other domains. SPAN introduces a sparse pyramid attention architecture that hierarchically focuses on informative regions within the WSI, aiming to reduce memory overhead while preserving critical features. In addition, shifted windows enable the model to capture the long-range contextual dependencies essential for accurate classification. We evaluate SPAN on multiple public WSI datasets and observe competitive performance. Whereas existing methods often struggle to model spatial and contextual information under memory constraints, our approach enables accurate modeling of these crucial features. Our study also highlights the importance of key design elements in attention mechanisms, such as the shifted-window scheme and the hierarchical structure, which contribute substantially to the effectiveness of SPAN in WSI analysis. These results demonstrate the potential of SPAN for memory-efficient and effective analysis of WSI data. The code will be made publicly available following the publication of this work.
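To make the idea of sparse, shifted-window attention concrete, the following is a minimal sketch and not the SPAN implementation. It assumes that informative patches are given as integer (row, col) grid positions and that attention is restricted to patches sharing a window. The function names `window_ids` and `attention_pairs`, the window size of 4, and the shift of 2 are all hypothetical choices made for illustration.

```python
from collections import Counter

def window_ids(coords, window, shift=0):
    """Assign each occupied patch position to a window index.

    Only informative (occupied) positions are passed in, so empty
    background regions never receive an id and cost no memory.
    `shift` offsets the window grid, as in a shifted-window scheme.
    """
    cells = [((r + shift) // window, (c + shift) // window) for r, c in coords]
    # Relabel the distinct occupied window cells as 0..k-1 (sorted order).
    index = {cell: i for i, cell in enumerate(sorted(set(cells)))}
    return [index[cell] for cell in cells]

def attention_pairs(ids):
    """Count query-key pairs when attention stays inside each window."""
    return sum(n * n for n in Counter(ids).values())

# Six informative patches on a sparse grid (illustrative values only).
coords = [(0, 0), (0, 1), (1, 1), (3, 3), (3, 4), (9, 9)]

regular = window_ids(coords, window=4)           # unshifted window grid
shifted = window_ids(coords, window=4, shift=2)  # grid offset by half a window
dense_pairs = len(coords) ** 2                   # cost of dense attention
```

In this toy layout the patches at (3, 3) and (3, 4) fall into different windows on the unshifted grid but share a window once the grid is shifted, which is the mechanism by which alternating regular and shifted windows lets information cross window boundaries. Both windowed variants also need fewer query-key pairs than dense attention over the same six patches.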
