As contemporary data sets become too large to analyze directly, various forms of aggregated data are becoming common. The original individual observations are points, but after aggregation, the observations are interval-valued (e.g.). While some researchers simply analyze the averages of the observations within each aggregated class, it is easily established that this approach ignores much of the information in the original data set. The initial theoretical work on interval-valued data was that of Le-Rademacher and Billard (2011), but those results were limited to estimation of the mean and variance of a single variable. This article seeks to redress that limitation by deriving the maximum likelihood estimator for the all-important covariance statistic, a basic requirement for numerous methodologies such as regression, principal components, and canonical analyses. Asymptotic properties of the proposed estimators are established. The Le-Rademacher and Billard results emerge as special cases of our wider derivations.
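As a rough illustration of why class averages alone lose information, the sketch below aggregates point observations into intervals and compares two variance calculations. It assumes that points are spread uniformly within each interval, a common working assumption for interval-valued data that the abstract itself does not state. The function names are hypothetical, and the calculation is a descriptive moment computation, not the maximum likelihood estimator derived in the article.

```python
from statistics import pvariance


def aggregate_to_intervals(groups):
    """Collapse each group of point observations to its (min, max) interval."""
    return [(min(g), max(g)) for g in groups]


def midpoint_variance(intervals):
    """Variance computed from interval midpoints only (the 'averages' shortcut)."""
    mids = [(a + b) / 2 for a, b in intervals]
    return pvariance(mids)


def interval_variance(intervals):
    """Variance that also counts spread inside each interval.

    Assumption: values are uniform on [a, b], so E[X^2] = (a^2 + ab + b^2) / 3.
    """
    n = len(intervals)
    second_moment = sum(a * a + a * b + b * b for a, b in intervals) / (3 * n)
    mean = sum(a + b for a, b in intervals) / (2 * n)
    return second_moment - mean ** 2


def within_interval_term(intervals):
    """Average internal variance (b - a)^2 / 12 that the midpoints discard."""
    return sum((b - a) ** 2 / 12 for a, b in intervals) / len(intervals)


if __name__ == "__main__":
    # Hypothetical point data, grouped into two aggregation classes.
    groups = [[0.0, 1.0, 2.0], [4.0, 5.0, 8.0]]
    intervals = aggregate_to_intervals(groups)
    print("intervals:", intervals)
    print("midpoint-only variance:", midpoint_variance(intervals))
    print("interval variance:", interval_variance(intervals))
    print("discarded within-interval term:", within_interval_term(intervals))
```

Under the uniform assumption, the interval variance equals the midpoint variance plus the average of (b - a)^2 / 12, so the midpoint-only calculation understates the variance whenever any interval has nonzero width.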