Live migration of an application or VM is a well-known technique for load balancing, performance optimization, and resource management. To minimize the total downtime during migration, two popular methods, pre-copy and post-copy, are used in practice. These methods scale to large VMs and applications because the downtime is independent of the application's memory footprint. However, in a secure, trusted execution environment (TEE) such as Intel's Scalable SGX, the state of the art still uses the decade-old stop-and-copy method, where the total downtime is proportional to the application's memory footprint. This is primarily because TEEs like Intel SGX, unlike non-secure applications, do not expose memory and page table accesses to the OS. Now that modern TEE solutions such as Intel's Scalable SGX and AMD's EPYC efficiently support large applications, TEE migration methods must also evolve to enable live migration of large TEE applications with minimal downtime; stop-and-copy is no longer viable at this scale. We present OptMig, an end-to-end solution for live migrating TEE-enabled applications with large memory footprints. Our approach does not require a developer to modify the application; however, it needs a short, separate compilation pass and specialized software library support. Our optimizations reduce the total downtime by 98% for a representative microbenchmark that uses 20 GB of secure memory and by 90-96% for a suite of Intel SGX applications that have multi-GB memory footprints.
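The scaling argument above (stop-and-copy downtime grows with the memory footprint, while pre-copy downtime does not) can be illustrated with a toy model. The sketch below is not OptMig's algorithm; it models conventional, non-TEE pre-copy, and every name and parameter in it (`bandwidth_gbs`, `dirty_rate_gbs`, `threshold_gb`, `max_rounds`) is a hypothetical assumption chosen only for illustration.

```python
# Toy downtime model; all parameters are illustrative assumptions,
# not measurements or settings taken from the paper.

def stop_and_copy_downtime(mem_gb, bandwidth_gbs):
    """Seconds of downtime when the application is paused for the whole transfer."""
    return mem_gb / bandwidth_gbs


def pre_copy_downtime(mem_gb, bandwidth_gbs, dirty_rate_gbs,
                      threshold_gb=0.1, max_rounds=30):
    """Seconds of downtime for a simple iterative pre-copy model.

    Each round copies the currently outstanding data while the application
    keeps running; the data dirtied during that round must be re-sent in the
    next one. The application is paused only for the final residue.
    """
    remaining = mem_gb  # round 1 sends the full footprint
    for _ in range(max_rounds):
        if remaining <= threshold_gb:
            break  # residue is small enough to send during the pause
        round_time = remaining / bandwidth_gbs
        # Data dirtied while this round was in flight (capped at the footprint).
        remaining = min(mem_gb, dirty_rate_gbs * round_time)
    return remaining / bandwidth_gbs


if __name__ == "__main__":
    for mem in (20, 200):
        print(mem,
              stop_and_copy_downtime(mem, bandwidth_gbs=1.0),
              pre_copy_downtime(mem, bandwidth_gbs=1.0, dirty_rate_gbs=0.1))
```

In this model, as long as the dirty rate stays below the link bandwidth, the pre-copy pause is bounded by `threshold_gb / bandwidth_gbs` regardless of footprint, whereas the stop-and-copy pause grows linearly with it. The model also assumes that dirty pages can be tracked, which is the information a TEE does not expose to the OS.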