NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA

Marlon Tobaben, Mohamed Ali Souibgui, Rubèn Tito, Khanh Nguyen, Raouf Kerkouche, Kangsoo Jung, Joonas Jälkö, Lei Kang, Andrey Barsky, Vincent Poulain d'Andecy, Aurélie Joseph, Aashiq Muhamed, Kevin Kuo, Virginia Smith, Yusuke Yamasaki, Takumi Fukami, Kenta Niwa, Iifan Tyou, Hiro Ishii, Rio Yokota, Ragul N, Rintu Kutum, Josep Llados, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas

From arXiv, 27 pages, 6 figures

The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. The competition introduced a dataset of real invoice documents, along with associated questions and answers requiring information extraction and reasoning over the document images. It thereby brought together researchers and expertise from the document analysis, privacy, and federated learning communities. Participants fine-tuned a pre-trained, state-of-the-art Document Visual Question Answering model provided by the organizers for this new domain, mimicking a typical federated invoice processing setup. The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or the textual input modality. In track 1, participants proposed solutions that reduce communication costs while maintaining a minimum utility threshold; in track 2, they proposed solutions that protect all information from each document provider using differential privacy. The competition served as a new testbed for developing and testing private federated learning methods, while also raising awareness of privacy within the document image analysis and recognition community. Finally, the competition analysis provides best practices and recommendations for successfully running privacy-focused federated learning challenges in the future.
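To make the track 2 goal concrete, the following is a minimal sketch of one common way to protect each provider's contribution in federated training: clip every client update to a fixed L2 norm, sum the clipped updates, and add Gaussian noise before averaging. It is an illustration under stated assumptions, not the competition baseline or any participant's method. The function names `clip_update` and `private_aggregate` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and the sketch does not perform the privacy accounting needed to report an actual privacy budget.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def private_aggregate(client_updates, clip_norm, noise_multiplier, rng=None):
    """Average clipped client updates after adding Gaussian noise to their sum.

    Assumption: each client (document provider) sends one flat update vector.
    The noise standard deviation is noise_multiplier * clip_norm, which matches
    the sensitivity of the clipped sum to adding or removing one client.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    rng = rng or random.Random()
    dim = len(client_updates[0])
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    total = [sum(u[i] for u in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(client_updates) for v in noisy]


if __name__ == "__main__":
    # Toy example: three clients, two-dimensional updates.
    updates = [[3.0, 4.0], [0.0, 0.5], [-1.0, 2.0]]
    print(private_aggregate(updates, clip_norm=1.0, noise_multiplier=0.5,
                            rng=random.Random(0)))
```

Clipping bounds how much any single provider can change the aggregate, and the noise is scaled to that bound. The privacy level that results depends on details this sketch leaves out, such as the number of rounds, client sampling, and the accounting method.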
