Artificial intelligence has made impressive progress and is advancing toward artificial general intelligence. Sora, developed by OpenAI, demonstrates minute-level world-simulation abilities and can be considered a milestone on this path. Despite its notable successes, however, Sora still faces obstacles that remain to be resolved. In this survey, we take the perspective of disassembling Sora in text-to-video generation and conduct a comprehensive review of the literature, trying to answer the question \textit{From Sora What We Can See}. Specifically, after introducing basic preliminaries on the general algorithms, we categorize the literature along three mutually orthogonal dimensions: evolutionary generators, excellent pursuit, and realistic panorama. We then organize the widely used datasets and metrics in detail. Finally, and most importantly, we identify several challenges and open problems in this domain and propose potential directions for future research and development.