Progress in Type 1 Diabetes (T1D) algorithm development is limited by the fragmentation and lack of standardization of existing T1D management datasets. Current datasets differ substantially in structure and are time-consuming to access and process, which impedes data integration and reduces the comparability and generalizability of algorithmic developments. This work aims to establish a unified and accessible data resource for T1D algorithm development. Multiple publicly available T1D datasets were consolidated into a single resource, termed the MetaboNet dataset. Inclusion required the availability of both continuous glucose monitoring (CGM) data and corresponding insulin pump dosing records. Auxiliary information, such as reported carbohydrate intake and physical activity, was retained when present. The MetaboNet dataset comprises 3135 subjects and 1228 patient-years of overlapping CGM and insulin data, making it substantially larger than existing standalone benchmark datasets. The resource is distributed in two parts: a fully public subset available for immediate download at https://metabo-net.org/ , and a Data Use Agreement (DUA)-restricted subset whose constituent datasets are accessible through their respective application processes. For the datasets in the latter subset, processing pipelines are provided that automatically convert the data into the standardized MetaboNet format. A consolidated public dataset for T1D research is presented, and the access pathways for both its unrestricted and DUA-governed components are described. The resulting dataset covers a broad range of glycemic profiles and demographics and can therefore support more generalizable algorithmic performance than individual datasets.
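To make the inclusion criterion and the "patient-years of overlapping CGM and insulin data" figure concrete, the following is a minimal sketch of how such an overlap could be counted. It is not the MetaboNet pipeline: the function names, the in-memory data layout, and the choice of calendar-day granularity (a day counts only if it contains at least one CGM record and at least one insulin record) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def overlap_days(cgm_times, insulin_times):
    """Count calendar days on which both CGM and insulin records exist.

    Day-level granularity is an illustrative assumption, not the
    definition used by the dataset authors.
    """
    cgm_days = {t.date() for t in cgm_times}
    insulin_days = {t.date() for t in insulin_times}
    return len(cgm_days & insulin_days)


def patient_years(subjects):
    """Return (number of included subjects, total overlapping patient-years).

    `subjects` maps a hypothetical subject ID to a pair of timestamp lists:
    (cgm_times, insulin_times). A subject is included only if at least one
    day has both kinds of data.
    """
    included = 0
    total_days = 0
    for cgm_times, insulin_times in subjects.values():
        days = overlap_days(cgm_times, insulin_times)
        if days > 0:
            included += 1
            total_days += days
    return included, total_days / 365.25


# Toy data: subject "A" has 5 overlapping days; "B" has no insulin records.
start = datetime(2024, 1, 1)
subjects = {
    "A": (
        [start + timedelta(days=d) for d in range(10)],      # CGM on days 0-9
        [start + timedelta(days=d) for d in range(5, 15)],   # insulin on days 5-14
    ),
    "B": (
        [start + timedelta(days=d) for d in range(3)],       # CGM only
        [],
    ),
}

included, years = patient_years(subjects)
print(included, years)
```

In the toy data, only subject "A" satisfies the inclusion criterion. For scale, the reported totals of 1228 patient-years across 3135 subjects correspond to roughly 0.39 patient-years (about 143 days) of overlapping data per subject on average.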