The rapid expansion of uplink-intensive applications necessitates video coding solutions that balance high Rate-Distortion (RD) efficiency with ultra-low latency. This paper presents a longitudinal performance analysis of NVIDIA hardware encoding (NVENC) across GPU generations, from Pascal to the emerging Blackwell. We specifically evaluate the operational viability of the new "Ultra High Quality" (UHQ) tuning mode against standard low-latency configurations. Our results demonstrate that the Blackwell architecture breaks historical efficiency plateaus, achieving a 5.94% BD-Rate gain in standard modes and up to 22.79% in UHQ modes, but that these gains incur severe system-level penalties. We reveal that UHQ operates as a hybrid pipeline, offloading complexity to CUDA cores and enforcing aggressive temporal structures (up to 7 B-frames) that increase end-to-end latency by over 400% and GPU board power consumption by up to 40%. Consequently, while UHQ successfully bridges the quality gap with software encoders, its prohibitive serialization delay renders it unsuitable for interactive real-time communications, positioning it instead as a specialized solution for Video-on-Demand (VoD) transcoding.
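To make the serialization-delay argument concrete, the following minimal sketch estimates the reordering delay that a run of consecutive B-frames imposes. A B-frame references a later frame in display order, so the encoder cannot emit the first B-frame of a run until the following anchor frame has been captured. The function name `reorder_delay_ms` and the 60 fps frame rate are illustrative assumptions, not values taken from the measurements. The result is a structural lower bound only: it ignores lookahead, encode time, and transport, so it does not reproduce the measured latency increase.

```python
def reorder_delay_ms(num_b_frames: int, fps: float) -> float:
    """Lower bound on the capture-to-encode delay (ms) added by B-frame reordering.

    With N consecutive B-frames, the first B-frame of the run waits for the
    next anchor frame, which is captured N frame intervals later.
    Hypothetical helper for illustration; not part of any NVENC API.
    """
    if num_b_frames < 0:
        raise ValueError("num_b_frames must be non-negative")
    if fps <= 0:
        raise ValueError("fps must be positive")
    frame_interval_ms = 1000.0 / fps
    return num_b_frames * frame_interval_ms


if __name__ == "__main__":
    ASSUMED_FPS = 60.0  # assumption: frame rate is not stated in the abstract
    for b_frames in (0, 3, 7):
        delay = reorder_delay_ms(b_frames, ASSUMED_FPS)
        print(f"{b_frames} B-frames at {ASSUMED_FPS:.0f} fps: >= {delay:.1f} ms reordering delay")
```

Under the assumed 60 fps, seven B-frames alone add at least seven frame intervals of buffering before any encoding work begins. A configuration without B-frames incurs none of this delay.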