DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL

From arXiv. 3 pages, 1 figure, 1 artifact and reproducibility appendix. Accepted for presentation at EduHPC-24: Workshop on Education for High-Performance Computing, to be held during the Supercomputing 2024 conference.

We present an assignment for a full Parallel Computing course. Since the 2017/2018 academic year, we have proposed a different problem each year to illustrate how the same computational problem can be approached with different parallel programming models. Each problem is designed to be parallelized using shared-memory programming with OpenMP, distributed-memory programming with MPI, and GPU programming with CUDA or OpenCL. The problem chosen for this year implements a brute-force solution for exact DNA sequence alignment of multiple patterns: the program searches for exact matches of multiple nucleotide strings in a long DNA sequence. The sequential implementation is designed to be clear and understandable to students while offering many opportunities for parallelization and optimization. The assignment addresses key concepts that many students find difficult to apply in practical scenarios: race conditions, reductions, collective operations, and point-to-point communications. It also covers the parallel generation of pseudo-random sequences and strategies to notify and stop speculative computations when matches are found. The assignment serves as an exercise that reinforces basic knowledge and prepares students for more complex parallel computing concepts and structures. It has been used successfully as a practical assignment in a Parallel Computing course in the third year of a Computer Engineering degree program. Supporting materials for this and previous assignments in this series are publicly available.
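To make the problem statement concrete, the following is a minimal sketch of brute-force exact matching of several patterns against one sequence, with the loop over patterns parallelized in OpenMP. It is not the code distributed with the assignment: the names `find_first_match`, `align_patterns`, and `NOT_FOUND` are hypothetical, and the sketch assumes that only the first match of each pattern is wanted and that sequences are plain NUL-terminated strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel meaning "pattern does not occur in the sequence". */
#define NOT_FOUND ((size_t)-1)

/* Brute force: try every start position and compare the pattern there.
 * Returns the first matching position, or NOT_FOUND. */
size_t find_first_match(const char *seq, size_t seq_len,
                        const char *pat, size_t pat_len)
{
    if (pat_len == 0 || pat_len > seq_len)
        return NOT_FOUND;
    for (size_t start = 0; start + pat_len <= seq_len; start++) {
        if (memcmp(seq + start, pat, pat_len) == 0)
            return start;   /* stop at the first exact match (an assumption) */
    }
    return NOT_FOUND;
}

/* Searches every pattern in seq. Stores the first match position of
 * pattern p in found_at[p] and returns how many patterns were found. */
int align_patterns(const char *seq, const char *const *pats,
                   int num_pats, size_t *found_at)
{
    assert(seq != NULL && pats != NULL && found_at != NULL);
    size_t seq_len = strlen(seq);
    int found = 0;

    /* Each iteration writes only its own found_at[p], so the only shared
     * update is the counter. The reduction clause gives every thread a
     * private copy of `found` and sums the copies at the end, which
     * avoids a race condition on the counter. Without OpenMP enabled the
     * pragma is ignored and the loop runs sequentially. */
    #pragma omp parallel for reduction(+:found) schedule(dynamic)
    for (int p = 0; p < num_pats; p++) {
        found_at[p] = find_first_match(seq, seq_len, pats[p], strlen(pats[p]));
        if (found_at[p] != NOT_FOUND)
            found++;
    }
    return found;
}
```

Calling `align_patterns` on the sequence `ACGATTACA` with the patterns `GATT`, `TTTT`, and `ACA` reports two patterns found, at positions 2 and 6, with `TTTT` marked `NOT_FOUND`. Parallelizing over patterns is only one possible decomposition; splitting the start positions of a single pattern across threads is another, and it is that variant that raises the question of how to stop the remaining speculative work once a match is found.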
