Regulated industries, such as healthcare and finance, are starting to move parts of their data and workloads to the public cloud. However, they are still reluctant to trust the public cloud with their most sensitive records, and hence keep them on their premises, leveraging a hybrid cloud architecture. We address the security and performance challenges of big data analytics in a hybrid cloud, using a real-life use case from a hospital. In this use case, the hospital collects sensitive patient data and wants to run analytics on it in order to reduce antibiotic resistance, a significant challenge in healthcare. We show that it is possible to run large-scale analytics on data that is securely stored in the public cloud, encrypted with Apache Parquet Modular Encryption (PME), without significant performance loss, even when the secret encryption keys are kept on-premises. PME is a standard mechanism for data encryption and key management that is not specific to any public cloud, and therefore helps prevent vendor lock-in. It also provides privacy and integrity guarantees, and enables granular access control to the data. We also present an innovation in PME that lowers the performance cost of calls to the Key Management Service. Our solution therefore enables protecting large amounts of sensitive data in hybrid clouds while still allowing valuable insights to be gained from it efficiently.
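The abstract does not describe how the Key Management Service (KMS) overhead is lowered. As a hedged illustration of one general approach, not necessarily the innovation presented here, the sketch below models double envelope encryption, a scheme that Parquet's key-management tools support. Data encryption keys (DEKs) are wrapped locally with a key-encryption key (KEK), and only the KEK is wrapped by a master key inside the KMS. A writer or reader therefore needs one KMS round-trip per master key instead of one per DEK. `MockKms`, `DoubleWrappingKeyCache`, the master key ID `patients_mk`, and the SHA-256 keystream "cipher" are all hypothetical stand-ins for a real on-premises KMS and AES-GCM; the toy cipher is not secure.

```python
import hashlib
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream (SHA-256 in counter mode). Illustration only: NOT a vetted,
    # authenticated cipher; a real system would use AES-GCM.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_wrap(kek: bytes, key: bytes) -> bytes:
    """Encrypt `key` under `kek`; the output is nonce || ciphertext."""
    nonce = os.urandom(16)
    stream = _keystream(kek, nonce, len(key))
    return nonce + bytes(a ^ b for a, b in zip(key, stream))


def toy_unwrap(kek: bytes, wrapped: bytes) -> bytes:
    """Invert toy_wrap: split off the nonce and XOR with the same keystream."""
    nonce, body = wrapped[:16], wrapped[16:]
    stream = _keystream(kek, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


class MockKms:
    """Hypothetical on-premises KMS holding master keys.

    Each method call models one network round-trip and increments `calls`.
    """

    def __init__(self, master_key_ids):
        self._master_keys = {mk: os.urandom(32) for mk in master_key_ids}
        self.calls = 0

    def wrap_key(self, key: bytes, master_key_id: str) -> bytes:
        self.calls += 1
        return toy_wrap(self._master_keys[master_key_id], key)

    def unwrap_key(self, wrapped: bytes, master_key_id: str) -> bytes:
        self.calls += 1
        return toy_unwrap(self._master_keys[master_key_id], wrapped)


class DoubleWrappingKeyCache:
    """Wraps DEKs locally with a cached KEK; only KEKs are sent to the KMS."""

    def __init__(self, kms: MockKms):
        self.kms = kms
        self._write_keks = {}  # master_key_id -> (plaintext KEK, KMS-wrapped KEK)
        self._read_keks = {}   # KMS-wrapped KEK -> plaintext KEK

    def wrap_dek(self, dek: bytes, master_key_id: str):
        if master_key_id not in self._write_keks:
            kek = os.urandom(32)
            # One KMS round-trip per master key, not per DEK.
            self._write_keks[master_key_id] = (kek, self.kms.wrap_key(kek, master_key_id))
        kek, wrapped_kek = self._write_keks[master_key_id]
        return wrapped_kek, toy_wrap(kek, dek)

    def unwrap_dek(self, wrapped_kek: bytes, wrapped_dek: bytes, master_key_id: str) -> bytes:
        if wrapped_kek not in self._read_keks:
            # Only the first DEK under a given KEK costs a KMS round-trip.
            self._read_keks[wrapped_kek] = self.kms.unwrap_key(wrapped_kek, master_key_id)
        return toy_unwrap(self._read_keks[wrapped_kek], wrapped_dek)


if __name__ == "__main__":
    kms = MockKms(["patients_mk"])
    writer = DoubleWrappingKeyCache(kms)
    deks = [os.urandom(16) for _ in range(100)]
    wrapped = [writer.wrap_dek(d, "patients_mk") for d in deks]
    print(kms.calls)  # 1 (wrapping each DEK directly in the KMS would take 100)
```

Caching the plaintext KEK in memory trades a longer key-exposure window on the compute nodes for far fewer round-trips to the on-premises KMS; a real deployment would typically bound this with a cache lifetime.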