Microservices are widely used in modern cloud-native applications to achieve agility. However, the complex service dependencies in large-scale microservice systems can cause anomalies to propagate across services, which makes fault troubleshooting challenging. Distributed tracing systems address this problem by tracing the complete execution path of each request, enabling developers to locate anomalous services. Existing distributed tracing systems, however, suffer from invasive instrumentation, trace loss, or inaccurate trace correlation. To overcome these limitations, we propose Nahida, a new tracing system based on eBPF (extended Berkeley Packet Filter) that tracks complete requests in the kernel without intrusion, regardless of the application's programming language or implementation. Our evaluation shows that Nahida tracks over 92% of requests with stable accuracy, even under highly concurrent user requests, whereas state-of-the-art non-invasive approaches cannot track any of the requests. Importantly, Nahida can also track requests served by multi-threaded applications, which none of the existing invasive tracing systems that instrument tracing code into libraries can handle. Moreover, the overhead introduced by Nahida is negligible: it increases service latency by only 1.55% to 2.1%. Overall, Nahida provides an effective and non-invasive solution for distributed tracing.