Dynamic and Super-Personalized Media Ecosystem Driven by Generative AI: Unpredictable Plays Never Repeating The Same

This paper introduces a media service model that exploits artificial intelligence (AI) video generators at the receiving end. The proposal departs from the traditional multimedia ecosystem, which relies entirely on in-house production, by shifting part of the content creation onto the receiver. We bring a semantic process into the framework, allowing the distribution network to provide service elements that prompt the content generator rather than distributing encoded data of fully finished programs. The service elements, collectively referred to as semantic sources, include finely tailored text descriptions, lightweight image data of some objects, or application programming interfaces; the user terminal translates the received semantic data into video frames. Because generative AI is inherently random, users can then experience super-personalized services. The proposed idea also covers situations in which the user receives element packages from different service providers, either as a sequence of packages over time or as multiple packages at the same time. Provided that in-context coherence and content integrity are ensured, these combinatorial dynamics will amplify service diversity, allowing users to continually encounter new experiences. This work particularly targets short-form videos and advertisements, where users easily become fatigued by seeing the same frame sequence every time. In those use cases, the content provider's role is recast from producing fully finished content to scripting semantic sources. Overall, this work explores a new form of media ecosystem facilitated by receiver-embedded generative models, featuring both random content dynamics and enhanced delivery efficiency.
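
As a purely illustrative sketch of the receiver-side flow described above, the following shows how a terminal might hold semantic sources and merge packages from several providers into a single prompt. The names `SemanticPackage` and `compose_prompt` and all field names are hypothetical and not taken from the paper; the video generator itself is omitted, and joining text descriptions is only an assumed stand-in for whatever combination rule the framework actually uses.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SemanticPackage:
    """Hypothetical container for one provider's service elements."""
    provider: str
    text_descriptions: List[str]                             # candidate text prompts
    image_refs: List[str] = field(default_factory=list)      # lightweight object images
    api_endpoints: List[str] = field(default_factory=list)   # APIs the terminal may call


def compose_prompt(packages: List[SemanticPackage],
                   seed: Optional[int] = None) -> str:
    """Pick one description per package at random and join them in order.

    With seed=None each call may yield a different prompt, which mimics
    the never-repeating behaviour; a fixed seed makes the choice repeatable.
    """
    rng = random.Random(seed)
    return " ".join(rng.choice(p.text_descriptions) for p in packages)


# Two packages received at the same time from different (made-up) providers.
packages = [
    SemanticPackage("provider_a", ["A cat naps on a sofa.",
                                   "A cat chases a ball."]),
    SemanticPackage("provider_b", ["A soda can sits on the table."]),
]
prompt = compose_prompt(packages)
# The prompt would then be handed to the terminal's video generator (not shown).
```

Sampling per package keeps each provider's contribution intact while still varying the overall result; checking coherence between packages would need an additional step that this sketch does not attempt.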
