This study explores efficient tuning methods for the screenshot captioning task. Image captioning has advanced significantly in recent years, but research on captioning mobile screens remains relatively scarce, and current datasets and use cases that describe user behaviors within product screenshots are notably limited. Consequently, we sought to fine-tune pre-existing models for the screenshot captioning task. However, fine-tuning large pre-trained models can be resource-intensive, requiring considerable time, computational power, and storage because of the vast number of parameters in image captioning models. To tackle this challenge, this study proposes a combination of adapter methods, which requires tuning only the additional modules inserted into the model. These methods were originally designed for vision or language tasks, and we apply them to address similar challenges in screenshot captioning. By freezing the parameters of the image captioning model and training only the weights associated with these methods, performance comparable to fine-tuning the entire model can be achieved while significantly reducing the number of trainable parameters. This study represents the first comprehensive investigation into the effectiveness of combining adapters for the screenshot captioning task. Through our experiments and analyses, this study aims to provide valuable insights into the application of adapters in vision-language models and to contribute to the development of efficient tuning techniques for the screenshot captioning task. Our code is available at https://github.com/RainYuGG/BLIP-Adapter
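To make the parameter-saving argument concrete, the following minimal sketch counts the weights that would be trained when a bottleneck adapter is attached to each frozen Transformer block. It is an illustration under stated assumptions, not the configuration used in this study: the hidden size, feed-forward size, bottleneck width, and block count are hypothetical values, and the helper names are introduced here only for the example.

```python
# Illustrative sketch only: all sizes below are hypothetical, not taken from the study.

def linear_params(d_in: int, d_out: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return d_in * d_out + d_out


def bottleneck_adapter_params(d_model: int, bottleneck: int) -> int:
    """Down-projection followed by up-projection, a common adapter design."""
    return linear_params(d_model, bottleneck) + linear_params(bottleneck, d_model)


def frozen_block_params(d_model: int, d_ff: int) -> int:
    """Rough count for one Transformer block: four attention projections
    plus a two-layer MLP (layer norms omitted for simplicity)."""
    attention = 4 * linear_params(d_model, d_model)
    mlp = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    return attention + mlp


# Assumed sizes, chosen only to show the order of magnitude.
D_MODEL, D_FF, BOTTLENECK, NUM_BLOCKS = 768, 3072, 64, 12

frozen = NUM_BLOCKS * frozen_block_params(D_MODEL, D_FF)             # kept fixed
trainable = NUM_BLOCKS * bottleneck_adapter_params(D_MODEL, BOTTLENECK)  # updated
trainable_fraction = trainable / (frozen + trainable)

print(f"frozen={frozen:,} trainable={trainable:,} fraction={trainable_fraction:.2%}")
```

Under these assumed sizes, only the adapter weights would be updated and stored per task, which is the source of the savings in training cost and storage described above. The actual ratio depends on the backbone and on which adapter methods are combined.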