Named Entity Recognition (NER) has become a critical component in automating financial transaction processing, particularly for extracting structured information from unstructured payment data. This paper presents a comprehensive analysis of state-of-the-art NER algorithms applied to payment data extraction, including Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory networks with a CRF output layer (BiLSTM-CRF), and transformer-based models such as BERT and FinBERT. We conduct extensive experiments on a dataset of 50,000 annotated payment transactions spanning multiple payment formats, including SWIFT MT103, ISO 20022, and domestic payment systems. Fine-tuned BERT models achieve an entity-extraction F1-score of 94.2%, outperforming traditional CRF-based approaches by 12.8 percentage points. We further introduce PaymentBERT, a novel hybrid architecture that combines domain-specific financial embeddings with contextual representations; it achieves state-of-the-art performance with an F1-score of 95.7% while retaining real-time processing capability. We also analyze cross-format generalization, report ablation studies, and discuss deployment considerations. This research offers practical guidance for financial institutions implementing automated sanctions screening, anti-money laundering (AML) compliance, and payment processing systems.
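The abstract reports entity-level F1-scores but does not state the exact matching criterion. The following is a minimal sketch of one common convention, assuming BIO tags and exact span matching in the style of CoNLL evaluation: an entity counts as correct only if both its type and its boundaries match the gold annotation. The entity labels `BENEFICIARY` and `INVOICE_REF`, as well as the helper names, are hypothetical and are not taken from the paper's annotation schema.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans."""
    spans: Set[Span] = set()
    ent_type: Optional[str] = None
    start = 0
    # Append a sentinel "O" so an entity ending at the last token is closed.
    for i, tag in enumerate(tags + ["O"]):
        # An I- tag of the same type extends the current entity.
        if tag.startswith("I-") and tag[2:] == ent_type:
            continue
        # Any other tag closes the open entity, if there is one.
        if ent_type is not None:
            spans.add((ent_type, start, i))
            ent_type = None
        # B- (or a stray I-, treated leniently) opens a new entity.
        if tag != "O":
            ent_type, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over a corpus of tagged sequences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # exact type and boundary match
        fp += len(pred_spans - gold_spans)  # predicted but not in gold
        fn += len(gold_spans - pred_spans)  # gold entity that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical remittance snippet: "ACME CORP REF INV-4521"
gold = [["B-BENEFICIARY", "I-BENEFICIARY", "O", "B-INVOICE_REF"]]
pred = [["B-BENEFICIARY", "O", "O", "B-INVOICE_REF"]]

print(entity_f1(gold, pred))  # 0.5
```

Here the truncated beneficiary prediction (`ACME` instead of `ACME CORP`) counts as both a false positive and a false negative, so precision and recall are each 0.5. This boundary strictness is one reason entity-level scores usually come out lower than token-level accuracy on the same predictions.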