While 3D generation is progressing rapidly, recent work has often focused on obtaining high-resolution assets, leaving user experience and deployability as afterthoughts. We present AssetGen, a 3D generator that focuses instead on these two aspects. Given a single reference image, it produces in 30 seconds a high-quality mesh with baked normals, a color texture, and a controlled polygon budget suitable for real-time rendering, including mobile use cases. The AssetGen Flash variant further reduces latency to 14 seconds for interactive and agentic creation loops. Our model generates the object geometry with a coarse-to-refine VecSet framework, complemented by GPU implementations of mesh simplification, cleaning, and normal baking, and by fast parallel UV unwrapping. It then generates textures in a multi-view fashion, followed by backprojection and 3D inpainting. Model distillation, kernel optimization, and pipeline parallelization are co-designed to accelerate the system end-to-end. We introduce a range of automated and blind human evaluations and demonstrate visual quality competitive with leading commercial solutions at the 30-second setting, as well as preview-quality results in less than 15 seconds. The final result is a system that supports AI-assisted, deployable 3D content creation in interactive workflows.
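To make the texturing step more concrete, the following is a minimal sketch of multi-view backprojection, not AssetGen's actual implementation. It rests on several assumptions of ours: colors are blended per vertex rather than in UV texture space, each view is reduced to a unit direction toward the camera plus a hypothetical `sample` callback, occlusion is ignored, and the weights are plain cosine terms. Vertices that no view observes are returned as holes, which is where a 3D inpainting stage would take over.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Color = Tuple[float, float, float]
# Hypothetical view description: (unit vector from object toward camera, color sampler).
View = Tuple[Vec3, Callable[[Vec3], Color]]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def backproject(
    positions: Sequence[Vec3],
    normals: Sequence[Vec3],
    views: Sequence[View],
    min_weight: float = 1e-3,
) -> Tuple[List[Optional[Color]], List[int]]:
    """Blend per-view colors onto vertices with cosine weights.

    Returns the blended colors (None where no view contributes) and the
    indices of those unobserved vertices, which are left for inpainting.
    Occlusion is not handled in this sketch.
    """
    colors: List[Optional[Color]] = []
    holes: List[int] = []
    for i, (p, n) in enumerate(zip(positions, normals)):
        acc = [0.0, 0.0, 0.0]
        total = 0.0
        for to_camera, sample in views:
            # Cosine between the surface normal and the direction to the camera;
            # grazing or back-facing views are skipped.
            w = dot(n, to_camera)
            if w <= min_weight:
                continue
            c = sample(p)
            for k in range(3):
                acc[k] += w * c[k]
            total += w
        if total > 0.0:
            colors.append((acc[0] / total, acc[1] / total, acc[2] / total))
        else:
            colors.append(None)
            holes.append(i)
    return colors, holes


# Toy usage: a front view that sees red and a right-side view that sees blue.
views: List[View] = [
    ((0.0, 0.0, 1.0), lambda p: (1.0, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), lambda p: (0.0, 0.0, 1.0)),
]
positions = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.6, 0.0, 0.8)]
normals = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.6, 0.0, 0.8)]
colors, holes = backproject(positions, normals, views)
```

In this toy setup the vertex facing downward is seen by neither view, so it is the only entry in `holes`; the last vertex faces between the two cameras and receives a weighted mix of both colors.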