The cloud computing paradigm underlies data center and telecommunication infrastructure design. Relying heavily on virtualization, it slices hardware and software resources into smaller software units that can be manipulated more flexibly. Given these considerable benefits, several forms of virtualization with varying processing and communication overheads have emerged, including Full Virtualization and OS Virtualization. As a result, predicting packet throughput at the data plane becomes more challenging because of the additional virtualization overhead on CPU, I/O, and network resources. This research presents a dataset of active network measurements collected while varying several parameters: CPU affinity, frequency of echo packet injection, type of virtual network driver, use of CPU, I/O, or network load, and the number of concurrent VMs. The virtualization technologies used in the study are KVM, LXC, and Docker. The work examines their impact on a key network metric, namely end-to-end latency, and builds data models to evaluate how a cloud computing environment affects packet round-trip time. To support data exploration and visualization, the dataset was subjected to pre-processing, correlation analysis, dimensionality reduction, and clustering. In addition, this paper provides a brief analysis of the dataset, demonstrating its use in developing machine learning-based systems that support administrator decision-making.
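The analysis steps named above (pre-processing, correlation analysis, dimensionality reduction, and clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It runs on synthetic data rather than the real dataset. The feature names are hypothetical stand-ins for the measured parameters. The specific techniques (z-score standardization, Pearson correlation, PCA via SVD, and k-means) are assumed choices, since the abstract does not name them. Only NumPy is used.

```python
import numpy as np

# Hypothetical feature names standing in for the measured parameters;
# the dataset's real column names are not given in the abstract.
FEATURES = ["cpu_affinity", "echo_interval_ms", "concurrent_vms", "background_load", "rtt_ms"]

def make_synthetic(n=300, seed=0):
    """Synthetic stand-in for RTT measurements (NOT the real dataset)."""
    rng = np.random.default_rng(seed)
    affinity = rng.integers(0, 2, n)               # 0 = unpinned, 1 = pinned vCPU
    interval = rng.choice([1.0, 10.0, 100.0], n)   # echo injection interval
    vms = rng.integers(1, 9, n)                    # number of concurrent VMs
    load = rng.integers(0, 2, n)                   # 0 = idle, 1 = stress load on
    # Assumed toy relation: more VMs and load raise RTT, pinning lowers it.
    rtt = 0.2 + 0.05 * vms + 0.3 * load - 0.1 * affinity + rng.normal(0, 0.02, n)
    return np.column_stack([affinity, interval, vms, load, rtt]).astype(float)

def standardize(X):
    """Pre-processing: z-score each column (guarding constant columns)."""
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std

def pca(Z, k=2):
    """Dimensionality reduction: project centered data onto top-k principal axes."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt are principal axes
    return Z @ Vt[:k].T

def kmeans(P, k=3, iters=50, seed=0):
    """Clustering: plain Lloyd's k-means on the projected points."""
    rng = np.random.default_rng(seed)
    centers = P[rng.choice(len(P), k, replace=False)]
    labels = np.zeros(len(P), dtype=int)
    for _ in range(iters):
        # Squared distance from every point to every center, shape (n, k).
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

X = make_synthetic()
Z = standardize(X)
corr = np.corrcoef(X, rowvar=False)      # correlation analysis across features
rtt_corr = dict(zip(FEATURES, corr[-1]))  # correlation of each feature with RTT
P = pca(Z, k=2)
labels, centers = kmeans(P, k=3)
print({name: round(float(c), 2) for name, c in rtt_corr.items()})
```

On the real dataset, `make_synthetic` would be replaced by loading the measurement records, and the number of components and clusters would be chosen from the data (for example from explained variance or a silhouette score) rather than fixed at 2 and 3.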