Neural enhancement through super-resolution deep neural networks opens up new possibilities for ultra-high-definition live streaming over existing encoding and networking infrastructure. Yet, the heavy SR DNN inference overhead leads to severe deployment challenges. To reduce the overhead, existing systems propose to apply DNN-based SR only on selected anchor frames while upscaling non-anchor frames via the lightweight reusing-based SR approach. However, frame-level scheduling is coarse-grained and fails to deliver optimal efficiency. In this work, we propose Palantir, the first neural-enhanced UHD live streaming system with fine-grained patch-level scheduling. In the presented solutions, two novel techniques are incorporated to make good scheduling decisions for inference overhead optimization and reduce the scheduling latency. Firstly, under the guidance of our pioneering and theoretical analysis, Palantir constructs a directed acyclic graph (DAG) for lightweight yet accurate quality estimation under any possible anchor patch set. Secondly, to further optimize the scheduling latency, Palantir improves parallelizability by refactoring the computation subprocedure of the estimation process into a sparse matrix-matrix multiplication operation. The evaluation results suggest that Palantir incurs a negligible scheduling latency accounting for less than 5.7% of the end-to-end latency requirement. When compared to the state-of-the-art real-time frame-level scheduling strategy, Palantir reduces the energy overhead of SR-integrated mobile clients by 38.1% at most (and 22.4% on average) and the monetary costs of cloud-based SR by 80.1% at most (and 38.4% on average).
翻译:基于超分辨率深度神经网络的神经增强技术,为在现有编码和网络基础设施上实现超高清直播流提供了新的可能性。然而,沉重的超分辨率深度神经网络推理开销带来了严峻的部署挑战。为降低开销,现有系统提出仅对选定的关键帧应用基于深度神经网络的超分辨率处理,而对非关键帧则通过轻量级的基于重用的超分辨率方法进行上采样。然而,帧级调度粒度较粗,无法实现最优效率。本文提出Palantir,首个采用细粒度块级调度的神经增强超高清直播流系统。在所提出的解决方案中,融入了两项新颖技术,以做出良好的调度决策来优化推理开销,并降低调度延迟。首先,在我们开创性理论分析的指导下,Palantir构建了一个有向无环图,用于在任何可能的关键块集合下进行轻量级且准确的质量估计。其次,为进一步优化调度延迟,Palantir通过将估计过程中的计算子过程重构为稀疏矩阵-矩阵乘法操作,提高了并行性。评估结果表明,Palantir引入的调度延迟可忽略不计,仅占端到端延迟要求的5.7%以下。与最先进的实时帧级调度策略相比,Palantir最多可将集成超分辨率的移动客户端的能耗开销降低38.1%(平均降低22.4%),并将基于云的超分辨率的货币成本最多降低80.1%(平均降低38.4%)。