Many biological processes display oscillatory behavior based on an approximately 24-hour internal timing system specific to each individual. One process of particular interest is gene expression, for which several circadian transcriptomic studies have identified associations between gene expression over a 24-hour period and an individual's health. A challenge in analyzing data from these studies is that each individual's internal timing system is offset relative to the 24-hour day-night cycle, while the time recorded for each collected sample is day-night cycle time rather than internal time. Laboratory procedures can accurately determine each individual's offset, and hence the internal time of sample collection, but they are labor-intensive and expensive. In this paper, we propose a corrected score function framework to obtain a regression model of gene expression given internal time when the offset of each individual is too burdensome to determine. A feature of this framework is that it does not require the probability distribution generating the offsets to be symmetric with a mean of zero. Simulation studies validate the use of this corrected score function framework for cosinor regression, which is prevalent in circadian transcriptomic studies. Illustrations with three real circadian transcriptomic data sets further demonstrate that the proposed framework consistently mitigates bias relative to using a score function that does not account for this offset.
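To make the source of bias concrete, the following is a minimal simulation sketch, not the corrected score function framework proposed in the paper. It fits an ordinary least-squares cosinor model once against recorded day-night cycle time and once against the true internal time. All specifics are assumptions chosen for illustration: one sample per individual, normally distributed offsets with mean 1 hour and standard deviation 2 hours, the parameter values, and the hypothetical helper `fit_cosinor`.

```python
import numpy as np

def fit_cosinor(t, y, period=24.0):
    """Least-squares cosinor fit of y = mesor + a*cos(w t) + b*sin(w t).

    Returns (mesor, amplitude, acrophase in radians), where the fitted curve
    is mesor + amplitude * cos(w t - acrophase).
    """
    w = 2 * np.pi / period
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    mesor, a, b = coef
    return mesor, np.hypot(a, b), np.arctan2(b, a)

rng = np.random.default_rng(0)
n = 5000                                  # one sample per individual (assumed)
w = 2 * np.pi / 24.0

clock_time = rng.uniform(0.0, 24.0, n)    # recorded day-night cycle time
offset = rng.normal(1.0, 2.0, n)          # assumed offsets: nonzero mean
internal_time = clock_time + offset       # unobserved in practice

# Assumed true model in internal time: mesor 5, amplitude 2, acrophase 1 rad.
y = 5.0 + 2.0 * np.cos(w * internal_time - 1.0) + rng.normal(0.0, 0.5, n)

naive = fit_cosinor(clock_time, y)        # ignores the offset
oracle = fit_cosinor(internal_time, y)    # uses the true internal time

print("naive  (mesor, amplitude, acrophase):", np.round(naive, 3))
print("oracle (mesor, amplitude, acrophase):", np.round(oracle, 3))
```

Under these assumptions, the fit against day-night cycle time underestimates the amplitude and shifts the acrophase, whereas the fit against internal time recovers values close to the simulated ones. The nonzero mean of the offsets is what drives the shift in acrophase, which is the situation the abstract highlights.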