In an era when micro-videos dominate platforms like TikTok and YouTube, AI-generated content is nearing cinematic quality. The next frontier is using large language models (LLMs) to autonomously create viral micro-videos, a largely untapped capability that could shape the future of AI-driven content creation. To address this gap, this paper presents the first exploration of LLM-assisted popular micro-video generation (LLMPopcorn). We chose popcorn as the icon for this paper because it symbolizes leisure and entertainment, which fits a study of LLMs as assistants for generating popular micro-videos that are often watched during leisure time. Specifically, we empirically study the following research questions: (i) How can LLMs be effectively used to assist popular micro-video generation? (ii) To what extent can prompt-based enhancements optimize LLM-generated content for higher popularity? (iii) How well do various LLMs and video generators perform on the popular micro-video generation task? In exploring these questions, we show that advanced LLMs such as DeepSeek-V3 can generate micro-videos whose popularity rivals that of human-created content. Prompt enhancement further boosts popularity, and benchmarking highlights DeepSeek-V3 and DeepSeek-R1 among LLMs, and LTX-Video and HunyuanVideo among video generators. This work advances AI-assisted micro-video creation and opens new research directions. The code is publicly available at https://github.com/GAIR-Lab/LLMPopcorn.