We present Threadle, an open-source, high-performance, memory-efficient network storage and query engine written in C#. Threadle is designed for full-population networks derived from administrative register data: very large, multilayer, mixed-mode networks with millions of nodes and billions of edges. It addresses a fundamental limitation of existing network libraries, namely their inability to handle two-mode (bipartite) data efficiently at scale. Threadle's core innovation is a pseudo-projection approach that allows two-mode layers to be queried as if they had been projected into one-mode form, without ever materializing the memory-prohibitive projection. We demonstrate that a network with 20 million nodes, containing layers equivalent to 8 trillion projected edges, can be stored in approximately 20 GB of RAM, a compression ratio exceeding 2000:1 relative to the materialized projection. In addition, Threadle provides native support for multilayer mixed-mode networks, an integrated node attribute manager, and a CLI frontend with more than 50 commands for the construction, processing, file handling, and management of very large heterogeneous networks. Threadle is freely available at https://www.threadle.dev, either as precompiled binaries for Windows, macOS, and Linux or as source code to compile. It is complemented by threadleR, an R frontend that enables advanced sampling- and traversal-based analyses of very large, heterogeneous, multilayer, mixed-mode, population-scale networks.
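To make the pseudo-projection idea concrete, the following is a minimal sketch of how a two-mode layer can answer one-mode queries directly from its membership lists. It is an illustration only, not Threadle's implementation or API: Threadle is written in C#, Python is used here for brevity, and the class `TwoModeLayer` and all of its methods are hypothetical names introduced for this example.

```python
from collections import defaultdict
from itertools import combinations

EMPTY = frozenset()


class TwoModeLayer:
    """Hypothetical two-mode layer: stores node-to-group memberships only."""

    def __init__(self):
        self.groups_of = defaultdict(set)   # node id -> ids of groups it belongs to
        self.members_of = defaultdict(set)  # group id -> ids of its member nodes

    def add_membership(self, node, group):
        self.groups_of[node].add(group)
        self.members_of[group].add(node)

    def has_edge(self, a, b):
        """True if a and b share a group, i.e. would be tied in the projection."""
        if a == b:
            return False
        shared = self.groups_of.get(a, EMPTY) & self.groups_of.get(b, EMPTY)
        return bool(shared)

    def neighbors(self, a):
        """Projected neighbours of a, computed on demand from its groups."""
        result = set()
        for group in self.groups_of.get(a, EMPTY):
            result |= self.members_of[group]
        result.discard(a)
        return result

    def stored_memberships(self):
        """Number of node-group pairs actually held in memory."""
        return sum(len(members) for members in self.members_of.values())

    def projected_edge_count(self):
        """Materializes the projection; only feasible for small demonstrations."""
        edges = set()
        for members in self.members_of.values():
            edges.update(frozenset(pair) for pair in combinations(members, 2))
        return len(edges)


# Toy data: one school with 100 members, one workplace with 2 members.
layer = TwoModeLayer()
for node in range(100):
    layer.add_membership(node, "school_1")
for node in (99, 100):
    layer.add_membership(node, "workplace_1")

print(layer.has_edge(0, 50))         # both attend school_1
print(layer.has_edge(0, 100))        # no shared group
print(len(layer.neighbors(99)))      # school mates plus one colleague
print(layer.stored_memberships())    # 102 memberships stored
print(layer.projected_edge_count())  # 4951 edges if materialized
```

In this toy case, 102 stored memberships stand in for 4951 projected edges. Because a group of n members implies n(n-1)/2 projected ties, the gap widens quadratically with group size, which is the intuition behind the memory savings reported above.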