Modern large-scale monitoring systems have to process and store vast amounts of log data in near real time. At query time, these systems have to find relevant logs based on the content of the log message, using support structures that scale to this volume of data while remaining efficient to use. We present DynaWarp, a novel membership sketch capable of answering Multi-Set Multi-Membership-Queries, which can be used as an alternative to existing indexing structures for streamed log data. In our experiments, DynaWarp required up to 93% less storage space than the tested state-of-the-art inverted index and produced up to four orders of magnitude fewer false positives than the tested state-of-the-art membership sketch. Additionally, DynaWarp achieved up to 250 times higher query throughput than the tested inverted index and up to 240 times higher query throughput than the tested membership sketch.