Image Captioning (IC) models can benefit greatly from human feedback during training, especially when data is limited. We present work in progress on adapting an IC system to integrate human feedback, with the goal of making it easily adaptable to user-specific data. Our approach builds on a base IC model pre-trained on the MS COCO dataset, which generates captions for unseen images. The user can then give feedback on the image and the generated (predicted) caption, and this feedback is augmented to create additional training instances for adapting the model. The additional instances are integrated into the model through step-wise updates, and a sparse memory replay component is used to avoid catastrophic forgetting. We hope that this approach will both improve results and yield customizable IC models.
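The combination of step-wise updates and sparse memory replay can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the update rule, the replay frequency, or the replay sample size, so `update`, `replay_every`, `replay_size`, and the `(image_id, caption)` example format are all hypothetical placeholders chosen for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation of a training instance: (image_id, caption).
Example = Tuple[str, str]


def adapt_with_sparse_replay(
    update: Callable[[Sequence[Example]], None],
    feedback_batches: Sequence[Sequence[Example]],
    memory: Sequence[Example],
    replay_every: int = 4,   # assumed value, not taken from the paper
    replay_size: int = 2,    # assumed value, not taken from the paper
    seed: int = 0,
) -> List[str]:
    """Apply step-wise updates on feedback batches with occasional replay.

    `update` stands in for one incremental training step of the IC model.
    Returns a log of the batch type used for each update, in order.
    """
    rng = random.Random(seed)
    log: List[str] = []
    for step, batch in enumerate(feedback_batches, start=1):
        # One incremental update on new, user-specific instances.
        update(batch)
        log.append("feedback")
        # Sparse replay: only every `replay_every` steps, revisit a small
        # random sample of the original training data to limit forgetting.
        if memory and step % replay_every == 0:
            k = min(replay_size, len(memory))
            update(rng.sample(list(memory), k))
            log.append("replay")
    return log
```

In a real system, `update` would wrap a gradient step on the captioning model, `memory` would hold a subset of the pre-training data (for example MS COCO image-caption pairs), and `feedback_batches` would hold the augmented instances derived from user feedback. Replaying only occasionally keeps the extra cost low while still exposing the model to the original data distribution.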