This paper proposes a Generative Face Video Compression (GFVC) approach using Supplemental Enhancement Information (SEI), in which a series of compact spatial and temporal representations of a face video signal (i.e., 2D/3D key-points, facial semantics and compact features) are coded as SEI messages and inserted into the coded video bitstream. At the time of writing, the proposed GFVC approach using SEI messages has been adopted into the official working draft of the Versatile Supplemental Enhancement Information (VSEI) standard by the Joint Video Experts Team (JVET) of ISO/IEC JTC 1/SC 29 and ITU-T SG16, and will be standardized as a new version of "ITU-T H.274 | ISO/IEC 23002-7". To the best of the authors' knowledge, the JVET work on the proposed SEI-based GFVC approach is the first standardization activity for generative video compression. The proposed SEI approach not only improves on the reconstruction quality of early Model-Based Coding (MBC) by using state-of-the-art generative techniques, but also establishes a new SEI definition for future GFVC applications and deployment. Experimental results show that the proposed SEI-based GFVC approach can achieve remarkable rate-distortion performance compared with the latest Versatile Video Coding (VVC) standard, while also potentially enabling a wide variety of functionalities, including user-specified animation/filtering and metaverse-related applications.