Monitoring software systems at runtime is key to understanding workloads, debugging, and self-adaptation. It typically involves collecting and storing observable software data, which can be analyzed online or offline. Although collecting system data is useful, doing so may significantly affect system execution by increasing response times and competing for system resources. The typical way to cope with this is to restrict monitoring to selected portions of the system and to sample data. These approaches are a step towards a desired trade-off between the amount of collected information and the impact on system performance, but they focus on collecting data of a particular type or may capture a sample that does not correspond to the actual system behavior. In response, we propose an adaptive runtime monitoring process that dynamically adapts the sampling rate while monitoring software systems. It includes algorithms with statistical foundations that improve the representativeness of collected samples without compromising system performance. Our evaluation targets five applications of a widely used benchmark. It shows that the error (RMSE) of the samples collected with our approach is 9-54% lower than that of the main alternative strategy (sampling rate inversely proportional to the throughput), at the cost of a 1-6% higher performance impact.
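To make the comparison concrete, the following minimal Python sketch illustrates the two quantities named above: a sampling rate inversely proportional to the throughput (the baseline strategy) and the RMSE between two series. It does not reproduce the proposed adaptive algorithms, which the abstract does not specify. The function names, the `reference_throughput` parameter, and the clamping bounds are assumptions made for illustration only.

```python
import math


def inverse_throughput_rate(throughput, base_rate=1.0,
                            reference_throughput=100.0, min_rate=0.01):
    """Baseline sketch: sampling rate inversely proportional to throughput.

    All parameters except `throughput` are hypothetical tuning knobs.
    At or below `reference_throughput` every event is sampled; above it
    the rate shrinks as 1/throughput, but never drops below `min_rate`.
    """
    if throughput <= 0:
        return base_rate
    rate = base_rate * reference_throughput / throughput
    return max(min_rate, min(base_rate, rate))


def rmse(expected, observed):
    """Root-mean-square error between two equally long numeric sequences."""
    if len(expected) != len(observed) or len(expected) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((e - o) ** 2 for e, o in zip(expected, observed))
    return math.sqrt(squared / len(expected))


# Doubling the throughput relative to the reference halves the rate.
print(inverse_throughput_rate(200.0))  # 0.5
```

In an evaluation of this kind, `rmse` would be applied to a metric computed from the full trace and the same metric computed from the sample; which metric the authors use is not stated in the abstract.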