Implicit neural representations have emerged as a promising paradigm for video compression, with recent methods achieving competitive performance on natural video. However, screen content video -- common in remote desktop, online education, and cloud gaming -- exhibits distinct statistics: sharp edges, limited color palettes, and strong temporal redundancy. Existing neural representation methods, designed for natural scenes, lack mechanisms to exploit these properties, leaving substantial room for improvement. In this paper, we propose NeR-SC, a neural representation framework tailored for screen content video. Building on the SNeRV backbone, NeR-SC introduces three screen-content-specific modules: (i) a learnable color palette that models the discrete color structure of screen content by restricting the low-frequency sub-band to a learned color set; (ii) a multi-gate dense fusion module that replaces sequential feature fusion with dense, attention-gated cross-stage interaction; and (iii) an embedding-level frame skip strategy that bypasses redundant decoder invocations for static frames, with zero training overhead. Experiments on DSCVC and VCD show that NeR-SC achieves average PSNRs of 40.32~dB and 41.73~dB, respectively, outperforming representative neural video representation methods and, at low bitrates, surpassing H.264 and H.265. The frame skip strategy enables real-time decoding with no loss in quality.