This paper presents the Kafka Slurm Agent (KSA), an open-source (Apache 2.0 license) distributed computing and stream processing engine designed to help researchers distribute Python-based computational tasks across multiple Slurm-managed HPC clusters and workstations. Written entirely in Python, this extensible framework uses an Apache Kafka broker for asynchronous communication between its components. It is intended for non-expert users and requires neither administrative privileges nor additional libraries to run on Slurm. The framework's development was driven by the introduction of the AlphaFold protein structure prediction model: it was first created to facilitate the detection of knots in protein chains within structures predicted by AlphaFold. KSA has since been applied in several structural bioinformatics research projects, leading, among other results, to the discovery of new knotted proteins with previously unknown knot types. These knotted structures are now part of the AlphaKnot 2.0 web server and database, where KSA manages the knot detection process for user-uploaded structures.