Hybrid storage systems (HSS) combine multiple storage devices with diverse characteristics to achieve high performance and capacity at low cost. The performance of an HSS depends heavily on the effectiveness of two key policies: (1) the data-placement policy, which determines the best-fit storage device for incoming data, and (2) the data-migration policy, which rearranges stored data across the devices to sustain high HSS performance. Prior works improve either data placement or data migration in isolation, which leads to sub-optimal HSS performance; no prior work optimizes both policies together. Our goal is to design a holistic data-management technique for HSS that optimizes both the data-placement and data-migration policies to fully exploit the potential of an HSS. We propose Harmonia, a multi-agent reinforcement learning (RL)-based data-management technique that employs two lightweight autonomous RL agents, a data-placement agent and a data-migration agent, which adapt their policies to the current workload and HSS configuration and coordinate with each other to improve overall HSS performance. We evaluate Harmonia on a real HSS with up to four heterogeneous storage devices. Our evaluation using 17 data-intensive workloads on a performance-optimized (cost-optimized) HSS with two storage devices shows that, on average, Harmonia (1) outperforms the best-performing prior approach by 49.5% (31.7%), and (2) bridges the performance gap between the best-performing prior work and Oracle by 64.2% (64.3%). On an HSS with three (four) devices, Harmonia outperforms the best-performing prior work by 37.0% (42.0%). Harmonia's performance benefits come with low latency overhead (240 ns for inference) and low storage overhead (206 KiB for both RL agents together). We plan to open-source Harmonia's implementation to aid future research on HSS.
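To make the two-agent design concrete, the following is a minimal sketch, not Harmonia's implementation. It assumes tabular epsilon-greedy Q-learning (the abstract does not name the RL algorithm), a toy two-device HSS, and coordination through a shared occupancy observation and a shared latency-based reward. All names and constants (`QAgent`, `simulate`, `LATENCY`, `FAST_CAPACITY`, `MIGRATION_COST`) are hypothetical.

```python
import random
from collections import defaultdict

FAST, SLOW = 0, 1
LATENCY = {FAST: 1.0, SLOW: 10.0}  # hypothetical per-access latencies
FAST_CAPACITY = 4                  # hypothetical fast-device capacity, in pages
MIGRATION_COST = 5.0               # hypothetical penalty for moving a page


class QAgent:
    """Tabular epsilon-greedy Q-learning agent (an assumed stand-in algorithm)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def evict_coldest(location, hits):
    """Demote the least-accessed page on the fast device to the slow device."""
    fast_pages = [p for p, d in location.items() if d == FAST]
    victim = min(fast_pages, key=lambda p: hits[p])
    location[victim] = SLOW


def simulate(n_requests=5000, seed=0):
    """Run the toy HSS and return the average cost per request."""
    rng = random.Random(seed)
    placement = QAgent(n_actions=2, seed=seed)      # action: target device
    migration = QAgent(n_actions=2, seed=seed + 1)  # action: 0 keep, 1 promote
    location, hits = {}, defaultdict(int)
    pending = {}  # agent -> (state, action, reward) awaiting its next state
    total = 0.0
    for _ in range(n_requests):
        page = min(int(rng.expovariate(0.3)), 31)   # skewed page popularity
        hits[page] += 1
        # Both agents observe the same occupancy flag (assumed coordination).
        full = sum(d == FAST for d in location.values()) >= FAST_CAPACITY
        is_new = page not in location
        agent = placement if is_new else migration
        state = full if is_new else (location[page], min(hits[page] // 50, 3), full)
        if agent in pending:
            agent.update(*pending[agent], state)    # finish previous transition
        action = agent.act(state)
        cost = 0.0
        if is_new:
            if action == FAST and full:
                evict_coldest(location, hits)
                cost += MIGRATION_COST
            location[page] = action
        elif action == 1 and location[page] == SLOW:
            if full:
                evict_coldest(location, hits)
            location[page] = FAST
            cost += MIGRATION_COST
        cost += LATENCY[location[page]]
        pending[agent] = (state, action, -cost)     # shared latency-based reward
        total += cost
    return total / n_requests
```

Calling `simulate()` returns the average per-request cost of the toy system. The value depends entirely on the hypothetical constants above and says nothing about the results reported in the abstract.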