Autonomous systems (ASes) play diverse roles in today's Internet, from community and research backbones to hyperscale content providers and submarine-cable operators. However, existing taxonomies based solely on network-level features fail to capture their semantic and operational heterogeneity. In this paper, we present Linnaeus, a hierarchical AS-classification framework that combines network-centric data (e.g., topology, BGP announcements) with rich non-network features and leverages domain-adapted large language models alongside traditional machine-learning techniques. Linnaeus defines a two-level taxonomy with 18 top-level and 38 second-level classes, supports multi-label assignments to reflect hybrid roles (e.g., an AS that is both a research backbone and a transit provider), and offers an end-to-end pipeline from data ingestion to label inference. On a manually annotated dataset of nearly 2,000 ASes, Linnaeus achieves an overall precision of 0.83 and recall of 0.76. We further demonstrate its practical value through case studies, highlighting Linnaeus's potential to reveal both structural and semantic dimensions of Internet infrastructure.