To benefit fully from cloud computing, services are designed following the "multi-tenant" architectural model, which aims to maximize resource sharing among users. However, multi-tenancy introduces challenges in security, performance isolation, scaling, and customization. RStudio Server is an open-source integrated development environment (IDE) for the R programming language that is accessed through a web browser. We present the design and implementation of a multi-user distributed system on Hopsworks, a data-intensive AI platform, that follows the multi-tenant model and provides RStudio as Software as a Service (SaaS). We use two widely adopted cloud-native technologies, Docker and Kubernetes, to address the problems of performance isolation, security, and scaling that arise in a multi-tenant environment. We further enable secure data sharing in RStudio Server instances, which preserves data privacy while allowing collaboration among RStudio users. We integrate our system with Apache Spark so that it can scale to Big Data processing workloads. We also provide a UI in which users can supply custom configurations and retain full control of their own RStudio Server instances. We tested the system on a Google Cloud Platform cluster with four worker nodes, each allocated 30 GB of RAM. On this cluster, 44 RStudio Server instances, each with 2 GB of RAM, ran concurrently. The system can scale out to potentially support hundreds of concurrently running RStudio Server instances by adding more resources (CPUs and RAM) to the cluster.
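The reported test figures can be checked with simple arithmetic. The sketch below is illustrative only: it uses the numbers stated in the abstract (four nodes, 30 GB each, 44 servers at 2 GB each), while the helper `nodes_needed` and its linear-scaling assumption are hypothetical and are not taken from the system itself. Real capacity also depends on CPU and on platform overhead, which the abstract does not quantify.

```python
import math

# Figures stated in the abstract.
NODES = 4
RAM_PER_NODE_GB = 30
SERVERS_OBSERVED = 44
RAM_PER_SERVER_GB = 2

# Total RAM allocated to the worker nodes, and RAM taken by the servers.
cluster_ram_gb = NODES * RAM_PER_NODE_GB
server_ram_gb = SERVERS_OBSERVED * RAM_PER_SERVER_GB

# RAM not attributed to RStudio servers; the abstract does not say how
# this remainder is used, so no cause is assumed here.
remaining_ram_gb = cluster_ram_gb - server_ram_gb

# Average number of servers hosted per node in the reported test.
servers_per_node = SERVERS_OBSERVED / NODES


def nodes_needed(target_servers: int, per_node: float = servers_per_node) -> int:
    """Hypothetical estimate: assumes capacity grows linearly with
    identical 30 GB nodes, which is an assumption, not a measured result."""
    return math.ceil(target_servers / per_node)


if __name__ == "__main__":
    print(f"cluster RAM: {cluster_ram_gb} GB")
    print(f"RAM used by servers: {server_ram_gb} GB")
    print(f"remaining RAM: {remaining_ram_gb} GB")
    print(f"nodes for 200 servers (linear assumption): {nodes_needed(200)}")
```

Under the linear assumption, reaching "hundreds" of servers means adding nodes in proportion to the target count. This is a rough planning estimate rather than a guarantee.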