In the investment industry, it is often essential to carry out fine-grained company similarity quantification for a range of purposes, including market mapping, competitor analysis, and mergers and acquisitions. We propose and publish a knowledge graph, named CompanyKG, to represent and learn diverse company features and relations. Specifically, 1.17 million companies are represented as nodes enriched with company description embeddings, and 15 different inter-company relations result in 51.06 million weighted edges. To enable a comprehensive assessment of methods for company similarity quantification, we have devised and compiled three evaluation tasks with annotated test sets: similarity prediction, competitor retrieval, and similarity ranking. We present extensive benchmarking results for 11 reproducible predictive methods categorized into three groups: node-only, edge-only, and node+edge. To the best of our knowledge, CompanyKG is the first large-scale heterogeneous graph dataset originating from a real-world investment platform, tailored for quantifying inter-company similarity.
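To make the distinction between node-only and edge-only methods concrete, the sketch below builds a tiny toy graph in the shape the abstract describes: company nodes carrying description embeddings and weighted edges tagged with a relation type. Everything in it is an illustrative assumption rather than the CompanyKG data or the benchmarked methods. The embeddings and edges are made up, relation ids are arbitrary integers, and the helpers `cosine`, `edge_score`, and `rank_pair` are hypothetical. The node-only score is cosine similarity of embeddings, and the edge-only score is simply the summed weight of edges directly linking two companies (all relation types collapsed).

```python
import math

# Hypothetical toy data (NOT from CompanyKG): company -> description embedding.
embeddings = {
    "A": [0.9, 0.1, 0.0],
    "B": [0.8, 0.2, 0.1],
    "C": [0.0, 0.1, 0.9],
}

# Hypothetical weighted, typed edges: (source, target, relation_type, weight).
# In the real graph, 15 relation types would map to 15 distinct ids.
edges = [
    ("A", "B", 0, 3.0),
    ("A", "C", 4, 1.0),
    ("B", "C", 4, 1.0),
]


def cosine(u, v):
    """Node-only score: cosine similarity of two description embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def edge_score(a, b):
    """Edge-only score: summed weight of all edges linking a and b (undirected,
    every relation type collapsed into one)."""
    return sum(w for s, t, _rel, w in edges if {s, t} == {a, b})


def rank_pair(query, cand1, cand2, score):
    """Similarity-ranking style decision: return the candidate scored closer
    to the query (ties go to cand1)."""
    return cand1 if score(query, cand1) >= score(query, cand2) else cand2


def node_only(a, b):
    """Adapter so the node-only score takes company ids like edge_score does."""
    return cosine(embeddings[a], embeddings[b])


print(rank_pair("A", "B", "C", node_only))   # B
print(rank_pair("A", "B", "C", edge_score))  # B
```

On this toy input both signals agree that B is closer to A than C is. In practice the two can disagree, which is one motivation for node+edge methods that combine description embeddings with graph structure; those are beyond the scope of this sketch.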