Severe bandwidth depletion on consumer and constrained networks can destabilize real-time video conferencing: encoder rate control saturates, packet loss escalates, frame rates drop, and end-to-end latency grows markedly. This work presents an adaptive conferencing system that combines WebRTC media delivery with a supplementary audio-driven talking-head reconstruction path and telemetry-driven mode regulation. The system comprises a WebSocket signaling service, an optional SFU for multi-party transmission, a browser client that extracts real-time WebRTC statistics and exports CSV telemetry, and an AI REST service that takes a reference face image and recorded audio and produces a synthesized MP4; the browser can replace its outbound camera track with the synthesized stream at a median bandwidth of 32.80 kbps. The solution incorporates a bandwidth-mode switching strategy and a client-side mode-state logger.
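The bandwidth-mode switching strategy described above can be sketched as a small hysteresis controller. This is a minimal illustrative sketch, not the paper's implementation: the class name, the thresholds, and the mode labels are assumptions. The controller would be fed periodically with bandwidth telemetry (e.g. an `availableOutgoingBitrate`-style figure derived from `RTCPeerConnection.getStats()`), and the hysteresis gap between the two thresholds prevents rapid oscillation between modes when the measured bitrate hovers near a single cutoff.

```typescript
// Hypothetical mode controller for the bandwidth-mode switching strategy.
// Names and threshold values are illustrative assumptions.
type Mode = "camera" | "synthesized";

class BandwidthModeController {
  private mode: Mode = "camera";

  constructor(
    // Switch to the synthesized talking-head stream below this bitrate (kbps).
    private readonly lowKbps = 150,
    // Return to the live camera only after recovering above this bitrate;
    // the gap between lowKbps and highKbps provides hysteresis.
    private readonly highKbps = 400,
  ) {}

  // Called with each telemetry sample; returns the mode the client should use.
  update(kbps: number): Mode {
    if (this.mode === "camera" && kbps < this.lowKbps) {
      this.mode = "synthesized";
    } else if (this.mode === "synthesized" && kbps > this.highKbps) {
      this.mode = "camera";
    }
    return this.mode;
  }
}
```

In the browser, acting on a mode change would plausibly use the standard `RTCRtpSender.replaceTrack()` API to swap the outbound camera track for a track captured from the synthesized MP4, with telemetry sampled via `RTCPeerConnection.getStats()`.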