Videos for mobile devices have recently become the most popular medium for sharing and acquiring information. To make video creation more convenient for users, in this paper we present MobileVidFactory, a system that automatically generates vertical mobile videos from little more than simple text input. The system consists of two parts: basic generation and customized generation. In basic generation, we take a pretrained image diffusion model and adapt it into a high-quality, open-domain vertical video generator for mobile devices. For audio, the system retrieves a suitable background sound for the video from our large database. In addition, to produce customized content, the system lets users add specified on-screen text to the video to enrich its visual expression, and specify text to be read aloud automatically in a voice of their choice.