In response to the limitations of manual online ad production, significant research has been conducted in the field of automatic ad text generation (ATG). However, comparing different methods has been challenging because the field lacks both a benchmark that covers it as a whole and well-defined problem settings with clear model inputs and outputs. To address these challenges, this paper advances the field of ATG by redesigning the task and constructing a benchmark. Specifically, we define ATG as a cross-application task encompassing various aspects of Internet advertising. As part of our contribution, we propose the first benchmark dataset, CA Multimodal Evaluation for Ad Text GeneRAtion (CAMERA), carefully designed so that ATG models can leverage multimodal information and be evaluated industry-wise. Furthermore, we demonstrate the usefulness of the proposed benchmark through evaluation experiments with multiple baseline models, which vary in the type of pre-trained language model used and in whether they incorporate multimodal information. We also discuss the current state of the task and its future challenges.