In this paper, we describe a federated compute platform dedicated to supporting Artificial Intelligence in scientific workloads. With a focus on reproducible deployments, it delivers consistent, transparent access to a federation of physically distributed e-Infrastructures. Through a comprehensive service catalogue, the platform offers an integrated user experience covering the full Machine Learning lifecycle: model development (with dedicated interactive development environments), training (with GPU resources, annotation tools, experiment tracking, and federated learning support), and deployment (covering a wide range of deployment options across the Cloud Continuum). The platform also provides tools for traceability and reproducibility of AI models, and it integrates with different Artificial Intelligence model providers, datasets, and storage resources, allowing users to interact with the broader Machine Learning ecosystem. Finally, it is easily customizable, which lowers the adoption barrier for external communities.