The OpenAIRE graph contains a large citation graph dataset with over 200 million publications and over 2 billion citations. The current graph is distributed as a dump with metadata that totals ~TB uncompressed, which makes it hard to process on conventional computers. To make this network more accessible to the community, we provide a processed OpenAIRE graph that is downscaled to 32GB while preserving the full graph structure. We also offer the processed data in a very simple format that allows straightforward further manipulation, and we provide a Python pipeline that can be used to process future releases of the OpenAIRE graph.
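To illustrate how a citation graph can shrink while keeping its full structure, the following is a minimal sketch and not the pipeline described above. It assumes that the long string identifiers are replaced by dense integer indices and that each citation is stored as a pair of 64-bit integers. The function names `build_edge_array` and `estimated_bytes`, the identifiers `pub_a`, `pub_b`, `pub_c`, and the flat edge layout are all hypothetical and chosen only for illustration.

```python
from array import array


def build_edge_array(citations):
    """Map string identifiers to dense integer indices.

    `citations` is an iterable of (citing_id, cited_id) string pairs.
    Returns (ids, edges), where ids[i] is the original identifier of
    node i and edges is a flat array [src0, dst0, src1, dst1, ...].
    This layout is an assumption, not the published format.
    """
    index = {}           # original identifier -> integer index
    edges = array("q")   # signed 64-bit integers
    for src, dst in citations:
        edges.append(index.setdefault(src, len(index)))
        edges.append(index.setdefault(dst, len(index)))
    ids = list(index)    # dicts preserve insertion order
    return ids, edges


def estimated_bytes(num_citations, bytes_per_id=8):
    """Raw size of an edge list storing two integer ids per citation."""
    return num_citations * 2 * bytes_per_id


# Toy example with hypothetical identifiers.
ids, edges = build_edge_array([
    ("pub_a", "pub_b"),
    ("pub_c", "pub_b"),
    ("pub_a", "pub_c"),
])
print(ids)
print(list(edges))
print(estimated_bytes(2_000_000_000))
```

Under these assumptions, an edge list with 2 billion citations occupies 2 × 10^9 × 2 × 8 bytes = 32 × 10^9 bytes. That is the same order of magnitude as the 32GB figure quoted above, although the text does not state that the published file is laid out this way.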