We propose viewing large pretrained models as search engines, which makes it possible to repurpose techniques originally developed to improve search engine performance. As an illustration, we apply personalized query rewriting to text-to-image generation. Despite significant progress in the field, it remains challenging to create personalized visual representations that align closely with the desires and preferences of individual users. Users must articulate their ideas in words that are both comprehensible to the model and faithful to their vision, which many find difficult. In this paper, we tackle this challenge by leveraging historical user interactions with the system to enhance user prompts. We propose a novel approach that rewrites user prompts based on a new large-scale text-to-image dataset with over 300k prompts from 3115 users. Our rewriting model enhances the expressiveness of user prompts and their alignment with the intended visual outputs. Experimental results show that our methods outperform baseline approaches in both our new offline evaluation method and online tests. Our approach opens up exciting possibilities for applying more search engine techniques to build truly personalized large pretrained models.