High-speed optical-resolution photoacoustic microscopy (OR-PAM) with bidirectional raster scanning doubles imaging speed but introduces coupled domain shift and geometric misalignment between forward and backward scan lines. Existing registration methods, constrained by brightness constancy assumptions, achieve limited alignment quality, while recent generative approaches address domain shift through complex architectures that lack temporal awareness across frames. We propose GPEReg-Net, a scene-appearance disentanglement framework that separates domain-invariant scene features from domain-specific appearance codes via Adaptive Instance Normalization (AdaIN), enabling direct image-to-image registration without explicit deformation field estimation. To exploit temporal structure in sequential acquisitions, we introduce a Global Position Encoding (GPE) module that combines learnable position embeddings with sinusoidal encoding and cross-frame attention, allowing the network to leverage context from neighboring frames for improved temporal coherence. On the OR-PAM-Reg-4K benchmark (432 test samples), GPEReg-Net achieves an NCC of 0.953, an SSIM of 0.932, and a PSNR of 34.49 dB, surpassing the state of the art by 3.8% in SSIM and 1.99 dB in PSNR while maintaining competitive NCC. Code is available at https://github.com/JiahaoQin/GPEReg-Net.
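The two building blocks named in the abstract, AdaIN-based appearance transfer and sinusoidal position encoding, can be sketched as follows. This is a minimal NumPy illustration of the standard formulations of these operations; the function names, tensor shapes, and `eps` default are assumptions for exposition, not the released GPEReg-Net implementation.

```python
import numpy as np

def adain(content, style_mean, style_std, eps=1e-5):
    """Re-style a (C, H, W) feature map with a per-channel appearance code.

    Normalizes each channel of `content` to zero mean / unit std, then
    applies the target statistics (the 'appearance code' in AdaIN terms).
    """
    mu = content.mean(axis=(1, 2), keepdims=True)       # per-channel mean
    sigma = content.std(axis=(1, 2), keepdims=True)     # per-channel std
    normalized = (content - mu) / (sigma + eps)
    return style_std[:, None, None] * normalized + style_mean[:, None, None]

def sinusoidal_encoding(positions, d_model):
    """Fixed sinusoidal encoding for a sequence of frame indices.

    Returns an (N, d_model) array; even dimensions carry sin, odd carry cos,
    with geometrically spaced frequencies as in the Transformer encoding.
    """
    pos = np.asarray(positions, dtype=np.float64)[:, None]    # (N, 1)
    i = np.arange(d_model // 2, dtype=np.float64)[None, :]    # (1, d/2)
    angles = pos / (10000.0 ** (2.0 * i / d_model))           # (N, d/2)
    enc = np.zeros((len(positions), d_model))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc
```

In a GPE-style module, an encoding like this would be summed with a learnable per-frame embedding before cross-frame attention, so the network sees both a fixed temporal prior and a trainable offset.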