Recent advances in diffusion models and parameter-efficient fine-tuning (PEFT) have made text-to-image generation and customization widely accessible; in particular, Low-Rank Adaptation (LoRA) can replicate an artist's style or subject using minimal data and computation. In this paper, we examine the relationship between LoRA weights and artistic styles, demonstrating that LoRA weights alone serve as an effective descriptor of style, without additional image generation or knowledge of the original training set. Our findings show that LoRA weights outperform traditional pre-trained features, such as CLIP and DINO, at clustering artistic styles, and that LoRA-based embeddings exhibit strong structural similarities to conventional image-based embeddings, both qualitatively and quantitatively. We identify various retrieval scenarios for the growing collection of customized models and show that our approach enables more accurate retrieval in real-world settings, where knowledge of the training images is unavailable and additional generation would otherwise be required. We conclude with a discussion of potential future applications, such as zero-shot LoRA fine-tuning and model attribution.