Icicle: Scalable Metadata Indexing and Real-Time Monitoring for HPC File Systems

Haochen Pan, Ryan Chard, Song Young Oh, Maxime Gonthier, Valérie Hayot-Sasson, Geoffrey Lentner, Joe Bottigliero, Rachana Ananthakrishnan, Kyle Chard, Ian Foster

From arXiv; ISC High Performance 2026 research paper, camera-ready

Modern HPC file systems can contain billions of files and hundreds of petabytes of data, making even simple questions increasingly intractable to answer. Traditional file system utilities such as find and du fail to scale to these sizes. While external indexing tools like GUFI and Brindexer improve query performance, they remain batch-oriented and unsuitable for heterogeneous, rapidly evolving environments. We present Icicle, a scalable framework for continuous file system metadata indexing and monitoring. Icicle maintains a unified, up-to-date, and queryable view of file system state while supporting both periodic snapshot-based ingestion for bulk metadata updates and event-based ingestion for real-time synchronization from production systems such as Lustre and IBM Storage Scale. Built on Apache Kafka and Apache Flink, Icicle provides high-throughput, fault-tolerant, and horizontally scalable ingestion of metadata events into two complementary search indexes, enabling both individual file discovery and aggregate summary statistics by user, group, and directory. This architecture enables efficient support for both coarse-grained administrative queries and interactive analytics over billions of objects. Our experimental evaluation on production-scale HPC datasets demonstrates order-of-magnitude throughput improvements over existing monitoring and indexing approaches, with tunable options for balancing consistency, latency, and metadata freshness.
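The idea of feeding metadata events into two complementary indexes (one for per-file lookup, one for aggregates by user, group, and directory) can be illustrated with a minimal in-memory sketch. This is not Icicle's implementation: it uses no Kafka or Flink, and the `MetadataEvent` fields, the `TwoIndexView` class, and the sample paths are all hypothetical names chosen for illustration. It only shows, under those assumptions, how create, update, and delete events might keep a file index and summary counters consistent with each other.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MetadataEvent:
    """Hypothetical event record; field names are illustrative only."""
    op: str          # "create", "update", or "delete"
    path: str
    user: str = ""
    group: str = ""
    size: int = 0


class TwoIndexView:
    """Keeps a per-file index and aggregate counters in sync with events."""

    def __init__(self):
        # Per-file index: path -> latest event seen for that path.
        self.files = {}
        # Aggregate index: (kind, key) -> [file_count, total_bytes].
        self.summary = defaultdict(lambda: [0, 0])

    def _keys(self, ev):
        parent = str(PurePosixPath(ev.path).parent)
        return [("user", ev.user), ("group", ev.group), ("dir", parent)]

    def _adjust(self, ev, sign):
        for key in self._keys(ev):
            entry = self.summary[key]
            entry[0] += sign
            entry[1] += sign * ev.size

    def apply(self, ev):
        # Retract the previous contribution of this path, if any.
        old = self.files.pop(ev.path, None)
        if old is not None:
            self._adjust(old, -1)
        # Creates and updates register the new state; deletes do not.
        if ev.op != "delete":
            self.files[ev.path] = ev
            self._adjust(ev, +1)


events = [
    MetadataEvent("create", "/proj/a/x.dat", "alice", "physics", 100),
    MetadataEvent("create", "/proj/a/y.dat", "bob", "physics", 50),
    MetadataEvent("update", "/proj/a/x.dat", "alice", "physics", 300),
    MetadataEvent("delete", "/proj/a/y.dat"),
]

view = TwoIndexView()
for ev in events:
    view.apply(ev)

print(sorted(view.files))
print(view.summary[("group", "physics")])
```

Retracting the old record before applying the new one keeps the aggregates correct when a file is updated or deleted. A production pipeline would additionally have to handle out-of-order and duplicate events, which this sketch ignores.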
