Large-scale supercomputing architectures are a hard requirement for scientific Big-Data applications. One example is genomics analytics, where millions of data transformations and tests per patient are needed to find relevant clinical indicators. To ensure open and broad access to high-performance technologies, governments and academia are therefore pushing for the introduction of novel computing architectures in large-scale scientific environments. One such architecture is RISC-V, an open-source and royalty-free instruction-set architecture. To evaluate such technologies, we present the Variant-Interaction Analytics use case, with its benchmarking suite and datasets. In this use case, we search for possible genetic interactions using computational and statistical methods, which makes it a representative case of heavy ETL (Extract, Transform, Load) data processing. Current implementations run on x86-based supercomputers (e.g., MareNostrum-IV at the Barcelona Supercomputing Center (BSC)), and RISC-V is proposed as part of the next MareNostrum generations. Here we describe the Variant Interaction use case, highlight the characteristics that leverage high-performance computing, and report a first comparison between x86 and RISC-V architectures, based on real Variant Interaction executions on real hardware. From this comparison, we indicate the caveats and challenges for upcoming RISC-V developments and designs.