High-resolution (HR) medical videos are vital for accurate diagnosis, yet they are hard to acquire due to hardware limitations and physiological constraints. Clinically, the collected low-resolution (LR) medical videos present unique challenges for video super-resolution (VSR) models, including camera shake, noise, and abrupt frame transitions, which result in significant optical flow errors and alignment difficulties. Additionally, tissues and organs exhibit continuous and nuanced structures, but current VSR models are prone to introducing artifacts and distorted features that can mislead doctors. To this end, we propose MedVSR, a tailored framework for medical VSR. It first employs Cross State-Space Propagation (CSSP) to address imprecise alignment by projecting distant frames as control matrices within state-space models, enabling the selective propagation of consistent and informative features to neighboring frames for effective alignment. Moreover, we design an Inner State-Space Reconstruction (ISSR) module that enhances tissue structures and reduces artifacts through joint long-range spatial feature learning and large-kernel short-range information aggregation. Experiments on four datasets spanning diverse medical scenarios, including endoscopy and cataract surgery, show that MedVSR significantly outperforms existing VSR models in both reconstruction performance and efficiency. Code is released at https://github.com/CUHK-AIM-Group/MedVSR.
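The core idea behind CSSP can be illustrated with a minimal selective state-space scan in which the input (control) matrix B and readout C are projected from a distant frame's features rather than from the current frame, so the recurrence only admits state updates consistent with that reference frame. This is a simplified single-channel sketch under assumed shapes and projections (`f_far`, `W_B`, `W_C`, and the discretization step are hypothetical; the paper's actual formulation is not reproduced here):

```python
import numpy as np

def cross_state_space_scan(x_near, f_far, A, W_B, W_C, dt=0.1):
    """Selective scan over one feature channel of the frame being aligned.

    x_near : (T,)   scalar token sequence from the current (nearby) frame
    f_far  : (T, d) features from a distant frame (hypothetical input)
    A      : (N,)   diagonal state matrix (negative entries for stability)
    W_B    : (d, N) projects distant-frame features into the control matrix B_t
    W_C    : (d, N) projects distant-frame features into the readout C_t
    """
    T = x_near.shape[0]
    N = A.shape[0]
    A_bar = np.exp(dt * A)                # zero-order-hold discretization (diagonal A)
    h = np.zeros(N)
    y = np.zeros(T)
    for t in range(T):
        B_t = f_far[t] @ W_B              # distant frame acts as the control matrix:
        C_t = f_far[t] @ W_C              # it gates what enters and leaves the state
        h = A_bar * h + dt * B_t * x_near[t]
        y[t] = C_t @ h                    # propagated feature for the current frame
    return y

# Toy usage with random features (illustration only)
rng = np.random.default_rng(0)
T, d, N = 16, 4, 8
y = cross_state_space_scan(
    rng.standard_normal(T),
    rng.standard_normal((T, d)),
    -np.abs(rng.standard_normal(N)),
    0.1 * rng.standard_normal((d, N)),
    0.1 * rng.standard_normal((d, N)),
)
```

Because A is diagonal with negative entries, the recurrence is stable, and making B_t and C_t input-dependent is what makes the scan "selective" in the Mamba sense.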