A scalable and reliable system is required to analyze National Health and Nutrition Examination Survey (NHANES) data efficiently and to understand risk factors for hospital utilization. This study investigates the integration of continuous integration and continuous deployment (CI/CD) practices into data science workflows, focusing on the analysis of NHANES data to identify the prevalence of diabetes, obesity, and cardiovascular diseases. We propose an end-to-end, cloud-based DevOps framework for data analysis that examines risk factors associated with hospital utilization and evaluates key hospital utilization metrics. We also highlight the modular structure of the framework, which can be generalized to domains beyond healthcare. The framework provides an online data update method that can be extended further using both real and synthetic data. As such, the framework can be especially useful for domains with sparse datasets, such as environmental science, robotics, cybersecurity, and cultural heritage and arts.