Active measurements can be used to collect server characteristics on a large scale. This kind of metadata can help discover hidden relations and commonalities among server deployments, offering new possibilities to cluster and classify them. For example, identifying previously unknown cybercriminal infrastructure can be a valuable source of cyber-threat intelligence. We propose an active measurement-based methodology for acquiring Transport Layer Security (TLS) metadata from servers and leveraging it to fingerprint them. Our fingerprints capture the characteristic behavior of a server's TLS stack, which is primarily determined by the implementation, configuration, and hardware support of the underlying server. Using an empirical optimization strategy that maximizes the information gained from each handshake in order to minimize measurement costs, we generated 10 general-purpose Client Hellos. These serve as scanning probes to build a large database of TLS configurations used for classifying servers. We fingerprinted 28 million servers from the Alexa and Majestic toplists and two Command and Control (C2) blocklists over a period of 30 weeks, taking weekly snapshots as the foundation for two long-term case studies: the classification of Content Delivery Network (CDN) servers and of C2 servers. The proposed methodology achieves a precision of more than 99% and enables stable identification of new servers over time. This study describes a new opportunity for active measurements to provide valuable insights into the Internet that can be applied in security-relevant use cases.
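The abstract states only that the probes were chosen to maximize the information gained per handshake. As a minimal sketch under our own assumptions, and not the authors' actual procedure, information gain can be read as the increase in Shannon entropy of the partition of servers induced by their responses. A greedy loop then adds whichever candidate Client Hello yields the largest gain, stopping once no remaining candidate splits the servers further. A server's fingerprint is the tuple of its responses to the selected probes. All server names, probe labels, and response strings below are hypothetical toy data.

```python
import math
from collections import Counter


def partition_entropy(responses, servers, probes):
    """Shannon entropy (bits) of grouping `servers` by their joint responses to `probes`."""
    groups = Counter(tuple(responses[s][p] for p in probes) for s in servers)
    n = len(servers)
    return -sum((c / n) * math.log2(c / n) for c in groups.values())


def greedy_probe_selection(responses, candidates, k):
    """Greedily pick up to k probes; each step adds the probe with the largest entropy gain."""
    servers = list(responses)
    chosen = []
    for _ in range(k):
        current = partition_entropy(responses, servers, chosen)
        best, best_gain = None, 0.0
        for p in candidates:
            if p in chosen:
                continue
            gain = partition_entropy(responses, servers, chosen + [p]) - current
            if gain > best_gain:
                best, best_gain = p, gain
        if best is None:  # no remaining probe distinguishes any more servers
            break
        chosen.append(best)
    return chosen


def fingerprint(server_responses, probes):
    """Hypothetical fingerprint: the ordered tuple of responses to the chosen probes."""
    return tuple(server_responses[p] for p in probes)


# Hypothetical toy data: server -> {probe label -> summarized server reply}.
responses = {
    "s1": {"A": "tls13", "B": "x", "C": "r1"},
    "s2": {"A": "tls13", "B": "y", "C": "r1"},
    "s3": {"A": "tls12", "B": "x", "C": "r2"},
    "s4": {"A": "tls12", "B": "y", "C": "r2"},
}

probes = greedy_probe_selection(responses, ["A", "B", "C"], k=3)
print(probes)
print(fingerprint(responses["s1"], probes))
```

In this toy data probe C splits the servers exactly as probe A does, so once A is chosen, C adds no information and the loop stops after selecting A and B. This mirrors the intuition of keeping the probe set small by discarding handshakes that reveal nothing new.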