Cyber mapping gives insight into the relationships among financial entities and their service providers. Focusing on the outsourcing practices of companies as described in fund prospectuses in Germany, we introduce a dataset designed specifically for named entity recognition and relation extraction. Three experts labeled 948 sentences, yielding 5,969 annotations for four entity types (Outsourcing, Company, Location and Software) and 4,102 relation annotations (Outsourcing-Company, Company-Location). State-of-the-art deep learning models trained to recognize these entities and extract these relations show promising initial results. An anonymized version of the dataset, along with the annotation guidelines and the code used for model training, is publicly available at https://www.dfki.uni-kl.de/cybermapping/data/CO-Fun-1.0-anonymized.zip.