Efficiently querying data on embedded sensor and IoT devices is challenging because their memory and CPU resources are very limited. As the volume of collected data grows, it becomes critical to process, filter, and manipulate data on the edge devices where it is collected, both to improve efficiency and to reduce network transmissions. Existing embedded index structures do not adapt to the distribution and characteristics of the data. This paper demonstrates how learned indexes, which build space-efficient summaries of the data, can dramatically improve query performance and predictability. Learned indexes based on linear approximations can reduce query I/O by 50% to 90% and improve query throughput by a factor of 2 to 5 while requiring only a few kilobytes of RAM. Experimental results on a variety of time series data sets show that learned indexes improve considerably on state-of-the-art index algorithms.
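To make the idea of a learned index based on linear approximations concrete, the following is a minimal sketch and not the paper's algorithm. It assumes sorted, strictly increasing keys (for example, timestamps) and uses a greedy "shrinking cone" fit, a common textbook technique swapped in here for illustration. The names `LinearIndex`, `build_segments`, and `eps` are hypothetical. Each segment stores only a start key, a start position, and a slope, and every lookup is confined to a window of at most about `2 * eps + 1` positions around the predicted location, which is what would bound the I/O per query.

```python
import math
from bisect import bisect_right


def build_segments(keys, eps):
    """Greedy piecewise-linear fit of key -> position with error <= eps.

    Assumes `keys` is sorted and strictly increasing (hypothetical setup).
    Returns a list of (first_key, first_pos, slope) tuples.
    """
    segments = []
    k0, p0 = keys[0], 0
    lo, hi = -math.inf, math.inf  # feasible slope range for current segment
    for pos in range(1, len(keys)):
        dx = keys[pos] - k0
        new_lo = max(lo, (pos - p0 - eps) / dx)
        new_hi = min(hi, (pos - p0 + eps) / dx)
        if new_lo > new_hi:
            # No single slope fits this point too: close the segment here.
            segments.append((k0, p0, (lo + hi) / 2))
            k0, p0 = keys[pos], pos
            lo, hi = -math.inf, math.inf
        else:
            lo, hi = new_lo, new_hi
    # A final segment holding a single point has no slope constraint.
    slope = 0.0 if math.isinf(lo) else (lo + hi) / 2
    segments.append((k0, p0, slope))
    return segments


class LinearIndex:
    """Illustrative learned index: a few linear segments over sorted keys."""

    def __init__(self, keys, eps):
        self.keys = keys
        self.eps = eps
        self.segments = build_segments(keys, eps)
        self.starts = [s[0] for s in self.segments]

    def find(self, key):
        """Return the position of `key`, or None if it is absent."""
        i = bisect_right(self.starts, key) - 1
        if i < 0:
            return None  # key is smaller than every indexed key
        k0, p0, slope = self.segments[i]
        pred = p0 + slope * (key - k0)
        first = max(0, math.floor(pred - self.eps))
        last = min(len(self.keys) - 1, math.ceil(pred + self.eps))
        # Only this small window is scanned; on a device it would be
        # the only range of records (or pages) read from storage.
        for pos in range(first, last + 1):
            if self.keys[pos] == key:
                return pos
        return None
```

In this sketch, a series sampled at a regular rate collapses into a single segment, and a change in sampling rate or a gap in the data starts a new one, so the index size follows the shape of the data rather than the number of records. On a real device the positions would more likely be page numbers in flash storage than indexes into an in-memory list.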