Distributed traces contain valuable information but are often massive in volume, which poses a core challenge in tracing framework design: balancing the preservation of essential trace information against the reduction of trace volume. To address this tradeoff, previous approaches typically use a '1 or 0' sampling strategy: retaining sampled traces while completely discarding unsampled ones. However, based on an empirical study of real-world production traces, we find that the '1 or 0' strategy fails to balance this tradeoff effectively. To achieve a more balanced outcome, we shift from the '1 or 0' paradigm to a 'commonality + variability' paradigm. The core of the 'commonality + variability' paradigm is to first parse traces into common patterns and variable parameters, and then aggregate the patterns and filter the parameters. We propose Mint, a cost-efficient tracing framework that implements the 'commonality + variability' paradigm on the agent side so that all requests can be captured. Our experiments show that Mint captures all traces and retains more trace information while reducing trace storage (to an average of 2.7%) and network overhead (to an average of 4.2%). The experiments also demonstrate that Mint is lightweight enough for production use.
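To make the 'commonality + variability' idea concrete, the following is a minimal sketch of the parse, aggregate, and filter steps. It is not Mint's actual implementation: the rule that any run of digits counts as a variable parameter, the `<*>` placeholder, the string representation of spans, and the names `parse_span`, `compress`, and `keep_params` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption for illustration: any run of digits is a variable parameter.
_VARIABLE = re.compile(r"\d+")


def parse_span(text):
    """Split one span description into (common pattern, variable parameters)."""
    params = _VARIABLE.findall(text)
    pattern = _VARIABLE.sub("<*>", text)
    return pattern, params


def compress(traces, keep_params):
    """Aggregate the pattern of every trace; keep parameters only for
    traces selected by the hypothetical predicate keep_params."""
    pattern_counts = Counter()
    kept = {}
    for trace_id, spans in traces.items():
        parsed = [parse_span(s) for s in spans]
        # The trace-level pattern is the ordered tuple of span patterns.
        trace_pattern = tuple(p for p, _ in parsed)
        pattern_counts[trace_pattern] += 1
        if keep_params(trace_id, spans):
            kept[trace_id] = [params for _, params in parsed]
    return pattern_counts, kept


# Toy input: each trace is a list of span descriptions.
traces = {
    "t1": ["GET /users/17", "SELECT name FROM users WHERE id = 17"],
    "t2": ["GET /users/42", "SELECT name FROM users WHERE id = 42"],
    "t3": ["GET /orders/9", "ERROR timeout after 3000 ms"],
}

# Keep full parameters only for traces that contain an error span.
patterns, kept = compress(
    traces, lambda trace_id, spans: any("ERROR" in s for s in spans)
)
```

In this toy run, every trace contributes to a pattern count, so no request disappears entirely, while the variable parameters are stored only for the trace flagged by the filter. This is the contrast with '1 or 0' sampling, where an unsampled trace would leave no record at all.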