Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable, which hampers clinical adoption. In this paper, we propose a hybrid variational autoencoder to learn interpretable representations of continuous glucose monitoring (CGM) and meal data. Our method grounds the latent space in the inputs of a mechanistic differential equation, producing embeddings that reflect physiological quantities such as insulin sensitivity, glucose effectiveness, and basal glucose levels. Moreover, we introduce a novel method to infer the glucose appearance rate, making the mechanistic model robust to unreliable meal logs. On a dataset of CGM and self-reported meals from individuals with type-2 diabetes and pre-diabetes, our unsupervised representation discovers a separation between individuals proportional to their disease severity. Our embeddings produce clusters that are up to 4x better than those obtained from naive, expert, black-box, and pure mechanistic features. Our method provides a nuanced yet interpretable embedding space for comparing glycemic control within and across individuals, and it can be learned directly from in-the-wild data.
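To make the idea of a mechanistic differential equation parameterized by insulin sensitivity, glucose effectiveness, and basal glucose more concrete, the following is a minimal sketch of one well-known model of that kind, a Bergman-style minimal model, integrated with forward Euler. It is an assumption that this form resembles the equation used in the paper; the abstract does not specify it. The names `MinimalModelParams` and `simulate_glucose`, the default values of `p2` and `i_b`, and the units are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class MinimalModelParams:
    """Illustrative parameters (hypothetical names, not from the paper)."""
    s_g: float          # glucose effectiveness (1/min)
    s_i: float          # insulin sensitivity (1/min per uU/mL)
    g_b: float          # basal glucose (mg/dL)
    p2: float = 0.025   # decay rate of insulin action (1/min), assumed value
    i_b: float = 10.0   # basal insulin (uU/mL), assumed value


def simulate_glucose(
    params: MinimalModelParams,
    ra: Sequence[float],       # glucose appearance rate per step (mg/dL/min)
    insulin: Sequence[float],  # plasma insulin per step (uU/mL)
    g0: Optional[float] = None,
    dt: float = 1.0,           # step size in minutes
) -> List[float]:
    """Forward-Euler integration of a Bergman-style minimal model.

    dG/dt = -(S_G + X) * G + S_G * G_b + Ra(t)
    dX/dt = -p2 * X + p2 * S_I * (I(t) - I_b)
    """
    g = params.g_b if g0 is None else g0
    x = 0.0  # remote insulin action, starts at rest
    trace = [g]
    for ra_t, ins_t in zip(ra, insulin):
        dg = -(params.s_g + x) * g + params.s_g * params.g_b + ra_t
        dx = -params.p2 * x + params.p2 * params.s_i * (ins_t - params.i_b)
        g += dt * dg
        x += dt * dx
        trace.append(g)
    return trace


if __name__ == "__main__":
    params = MinimalModelParams(s_g=0.02, s_i=5e-4, g_b=100.0)
    # A 30-minute meal-like input followed by 150 minutes of no appearance.
    ra = [2.0] * 30 + [0.0] * 150
    insulin = [params.i_b] * len(ra)  # insulin held at basal for simplicity
    trace = simulate_glucose(params, ra, insulin)
    print(f"peak glucose: {max(trace):.1f} mg/dL, final: {trace[-1]:.1f} mg/dL")
```

In a hybrid model of the kind the abstract describes, quantities such as `s_g`, `s_i`, `g_b`, and the `ra` curve would be produced by a learned encoder rather than fixed by hand, so that a simulator of this sort acts as the decoder; that wiring is not shown here.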