In most High Performance Computing (HPC) projects today, large volumes of data are collected from different sources, depending on each project's objectives. Some of these datasets are so large that copying them is often impractical. At the same time, scientific practice requires that data used for different purposes remain unaltered, so that different research groups can reproduce results, discuss theories, and validate each other's findings. In this paper, we present a novel approach that uses Blockchain to help research groups validate data integrity in such distributed repositories. Originally developed for cryptocurrencies, Blockchain has since proven useful in a wide range of applications. Our proposal provides 1) secure access to data management, 2) easy validation of data integrity, and 3) a simple way to add new records to the dataset under the same robust integrity policy. We developed a prototype and tested it on a subset of a public dataset from a real scientific collaboration, the Latin American Giant Observatory (LAGO) Project.
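To illustrate the core idea of blockchain-based integrity validation, the following is a minimal sketch, not the authors' LAGO prototype. It assumes each dataset file is registered as a record holding its name and SHA-256 digest, and that each block stores the hash of the previous block. Under that assumption, altering either a file or a past record becomes detectable. The names `IntegrityLedger`, `add_record`, `chain_is_valid`, and `file_is_unaltered` are hypothetical. The sketch also omits distribution, consensus, and access control.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Block:
    index: int
    record: dict      # hypothetical record layout: {"name": ..., "digest": ...}
    prev_hash: str    # hash of the previous block, which links the chain
    block_hash: str = field(init=False)

    def __post_init__(self) -> None:
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        # Canonical JSON (sorted keys) so the same content always hashes the same.
        payload = json.dumps(
            {"index": self.index, "record": self.record, "prev_hash": self.prev_hash},
            sort_keys=True,
        ).encode()
        return sha256_hex(payload)


class IntegrityLedger:
    """Hypothetical single-process hash chain of file digests (illustration only)."""

    def __init__(self) -> None:
        self.chain = [Block(0, {"name": "genesis", "digest": ""}, "0" * 64)]

    def add_record(self, name: str, content: bytes) -> Block:
        # New records follow the same policy: digest the content, link to the tip.
        prev = self.chain[-1]
        block = Block(prev.index + 1, {"name": name, "digest": sha256_hex(content)}, prev.block_hash)
        self.chain.append(block)
        return block

    def chain_is_valid(self) -> bool:
        # Every block must hash to its stored value and point at its predecessor.
        if self.chain[0].block_hash != self.chain[0].compute_hash():
            return False
        for prev, cur in zip(self.chain, self.chain[1:]):
            if cur.prev_hash != prev.block_hash or cur.block_hash != cur.compute_hash():
                return False
        return True

    def file_is_unaltered(self, name: str, content: bytes) -> bool:
        # Compare the file against the most recent digest registered under its name.
        digest = sha256_hex(content)
        for block in reversed(self.chain[1:]):
            if block.record["name"] == name:
                return block.record["digest"] == digest
        return False
```

For example, after `ledger.add_record("run_001.dat", data)` (a hypothetical file name), a copy of the data stored elsewhere can be checked with `ledger.file_is_unaltered("run_001.dat", copy)` without transferring the original. Any edit to a stored record makes `chain_is_valid()` return `False`.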