Collaborative Chinese Text Recognition with Personalized Federated Learning

In Chinese text recognition, an organization often needs to collect large amounts of data from similar organizations to compensate for insufficient local data and to improve local few-shot character recognition. However, text data naturally contains private information such as addresses and phone numbers, so organizations are unwilling to share it. Designing a privacy-preserving collaborative training framework for Chinese text recognition is therefore increasingly important. In this paper, we introduce personalized federated learning (pFL) into the Chinese text recognition task and propose the pFedCR algorithm, which significantly improves the model performance of each client (organization) without sharing private data. Specifically, pFedCR comprises two stages: a multi-round global model training stage and a local personalization stage. In the first stage, an attention mechanism is incorporated into the CRNN model so that it can adapt to the differing data distributions of the clients, and a balanced dataset is built on the server from the inherent characteristics of character data to mitigate character imbalance. In the personalization stage, the global model is first fine-tuned for one epoch on local data to obtain a local model. The parameters of the local and global models are then averaged to combine personalized and global feature extraction capabilities. Finally, only the attention layers are fine-tuned to strengthen the model's focus on local personalized features. Experimental results on three real-world industrial datasets show that pFedCR improves the performance of local personalized models by about 20\% while also improving their generalization to the data domains of other clients. Compared with other state-of-the-art personalized federated learning methods, pFedCR improves performance by 6\% $\sim$ 8\%.
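
The parameter-averaging step of the personalization stage can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the averaging weights, so the equal weighting (`alpha = 0.5`) is an assumption, and the parameter names, the `attention` keyword, and the representation of weights as flat lists of floats (instead of framework tensors) are hypothetical choices made to keep the example self-contained.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened weights.
Params = Dict[str, List[float]]


def average_params(local: Params, global_: Params, alpha: float = 0.5) -> Params:
    """Return alpha * local + (1 - alpha) * global for every parameter.

    alpha = 0.5 (a plain average) is an assumption; the abstract only
    says that local and global parameters are averaged.
    """
    if local.keys() != global_.keys():
        raise ValueError("local and global models must share parameter names")
    merged: Params = {}
    for name, local_w in local.items():
        global_w = global_[name]
        if len(local_w) != len(global_w):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [alpha * l + (1 - alpha) * g
                        for l, g in zip(local_w, global_w)]
    return merged


def trainable_names(params: Params, keyword: str = "attention") -> List[str]:
    """Names of the parameters that the final step would fine-tune.

    Selecting layers by a substring of the name is an illustrative
    convention, not something the abstract specifies.
    """
    return [name for name in params if keyword in name]


# Toy models with made-up parameter names and values.
global_model = {"cnn.weight": [1.0, 2.0], "attention.weight": [0.0, 4.0]}
local_model = {"cnn.weight": [3.0, 2.0], "attention.weight": [2.0, 0.0]}

personalized = average_params(local_model, global_model)
attention_only = trainable_names(personalized)
```

In this sketch `local_model` stands for the result of the one-epoch fine-tuning, `personalized` is the merged model, and `attention_only` lists the parameters that would remain trainable in the final step while all others stay frozen.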
