3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, called DreamReward, to learn and improve text-to-3D models from human preference feedback. First, we collect 25k expert comparisons using a systematic annotation pipeline that includes rating and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model, to effectively encode human preferences. Finally, building upon this 3D reward model, we perform a theoretical analysis and present Reward3D Feedback Learning (DreamFL), a direct tuning algorithm that optimizes multi-view diffusion models with a redefined scorer. Grounded in theoretical proof and extensive experimental comparisons, DreamReward generates high-fidelity, 3D-consistent results with significant improvements in prompt alignment with human intention. Our results demonstrate the great potential of learning from human feedback to improve text-to-3D models.
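The abstract does not state Reward3D's training objective. Reward models learned from expert comparisons are commonly trained with a pairwise Bradley-Terry loss. The sketch below assumes that objective. The function names `pairwise_preference_loss` and `ranking_to_pairs` are hypothetical and are not taken from the paper. It also shows how one annotated ranking of k generations expands into k(k-1)/2 (preferred, rejected) training pairs.

```python
import math
from itertools import combinations


def pairwise_preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood: -log(sigmoid(r_w - r_l)).

    Assumption: a standard pairwise reward-model loss, not the paper's confirmed one.
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) == log1p(exp(-m)); split by sign so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def ranking_to_pairs(ranked_scores):
    """Expand scores listed best-first into (better, worse) pairs.

    combinations() keeps input order, so the first element of each pair
    always comes from the higher-ranked generation.
    """
    return list(combinations(ranked_scores, 2))


def batch_loss(ranked_scores):
    """Mean pairwise loss over every pair implied by one ranking."""
    pairs = ranking_to_pairs(ranked_scores)
    return sum(pairwise_preference_loss(w, l) for w, l in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward scores for four generations, listed in annotator rank order.
    scores = [2.1, 1.4, 0.3, -0.5]
    print(len(ranking_to_pairs(scores)))  # 6 pairs from a ranking of 4
    print(batch_loss(scores))
```

Minimizing this loss pushes the reward of each preferred generation above that of the rejected one. A tied score gives a loss of log 2. Ranking-based annotation is attractive here because a single ranking yields many comparisons at once.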