Video streaming often requires transcoding content into different resolutions and bitrates to match the recipient's internet speed and screen capabilities. Video encoders such as x264 offer several presets, each with a different tradeoff between transcoding time and rate-distortion performance. Choosing the best preset is difficult, especially for live streaming, where trying every preset and keeping the best result is not feasible. One solution is to predict each preset's transcoding time and select the preset that yields the highest quality while meeting the live-streaming time constraint. Transcoding time prediction is also critical for minimizing streaming delays, deploying resource management algorithms, and load balancing. We propose a learning-based framework for predicting the transcoding time of videos across presets. The predictor's features are derived directly from the ingested stream, primarily from its header or metadata. As a result, feature extraction adds only minimal delay, which makes the approach well suited to live-streaming applications. We evaluate the framework on a dataset of videos. The results show that it accurately predicts the transcoding time for different presets, with a mean absolute percentage error (MAPE) of nearly 5.0%. Leveraging these predictions, we then select the most suitable transcoding preset for live video streaming. Preset selection based on our transcoding time predictions improved Peak Signal-to-Noise Ratio (PSNR) by up to 5 dB.
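The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `safety_margin` parameter, the fallback to the fastest preset, and the example timings are all hypothetical, and the sketch assumes that a slower x264 preset gives equal or better quality at the same bitrate. The `mape` helper shows how the reported error metric is conventionally computed.

```python
# x264 preset names, ordered fastest to slowest. Assumption: a slower
# preset gives equal or better quality at the same bitrate.
X264_PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
                "medium", "slow", "slower", "veryslow"]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def select_preset(predicted_seconds, budget_seconds, safety_margin=0.05):
    """Return the slowest preset whose predicted time fits the budget.

    predicted_seconds: dict mapping preset name -> predicted transcoding time.
    safety_margin: hypothetical headroom for prediction error (5% here).
    Falls back to the fastest preset if no prediction fits.
    """
    limit = budget_seconds * (1.0 - safety_margin)
    chosen = X264_PRESETS[0]
    for preset in X264_PRESETS:
        t = predicted_seconds.get(preset)
        if t is not None and t <= limit:
            chosen = preset  # later entries are slower, so keep the last fit
    return chosen


# Hypothetical predictions (seconds) for a 2-second segment.
predictions = {"ultrafast": 0.4, "veryfast": 0.9, "medium": 1.7, "slow": 2.6}
print(select_preset(predictions, budget_seconds=2.0))  # medium
```

In this example the budget after the margin is 1.9 s, so `medium` (1.7 s) is the slowest preset that fits and `slow` (2.6 s) is rejected. A real deployment would take `predicted_seconds` from the trained predictor instead of hard-coded values.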