io_uring moves I/O submission and completion into shared-memory rings. This makes it fast, and it also makes it invisible to syscall-level tracing. strace sees only the ring setup, and the kernel tracepoints that expose the request flow are not stable ABI, so the few tools built on them work only on narrow kernel ranges. We present uringscope, a single-binary, language-agnostic observability tool for io_uring built on CO-RE (Compile Once, Run Everywhere) eBPF. uringscope makes four contributions. The first is a precise model of the request lifecycle and a method to reconstruct per-request flows from kernel events. The second is a technique for attaching portably to an unstable tracepoint surface, using BTF-probed program variants, CO-RE field flavors, and position-independent reads. The third is an evaluation of the tradeoff between overhead and fidelity: on device-bound NVMe workloads, uringscope's aggregate mode costs 0.7% to 9.9% of throughput, which is cheaper than every full-fidelity alternative we measured. The fourth is a lightweight correctness mode that reuses the same reconstruction to detect submission-boundary hazards, together with a built-in doctor that turns the measurements into named pathologies with evidence, for operators who are debugging a tail-latency incident rather than browsing histograms.