Fine-tuning adapts text-to-image generative models to novel concepts (e.g., styles and portraits), enabling users to create customized content. Recent work on fine-tuning focuses on reducing the amount of training data and the computational overhead, but neglects alignment with user intentions, particularly in the manual curation of multi-modal training data and in intent-oriented evaluation. Informed by a formative study with fine-tuning practitioners aimed at understanding user intentions, we propose IntentTuner, an interactive framework that intelligently incorporates human intentions throughout each phase of the fine-tuning workflow. IntentTuner lets users articulate training intentions with image exemplars and textual descriptions, and automatically converts them into effective data augmentation strategies. Furthermore, IntentTuner introduces novel metrics that measure user intent alignment, allowing intent-aware monitoring and evaluation of model training. Application exemplars and user studies demonstrate that IntentTuner streamlines fine-tuning, reducing cognitive effort and yielding superior models compared to the common baseline tool.