It is well known that the relationship between variables at the individual level can differ from the relationship between those same variables aggregated over individuals. This aggregation problem becomes relevant when the researcher wants to learn individual-level relationships but only has access to data that has been aggregated. In this paper, I develop a methodology to partially identify linear combinations of conditional average outcomes from aggregate data when the outcome of interest is binary, while imposing as few restrictions on the underlying data generating process as possible. I construct identified sets using an optimization program that allows researchers to impose additional shape restrictions. I also provide consistency results and construct an inference procedure that is valid with aggregate data, which provides only marginal information about each variable. I apply the methodology to simulated and real-world data sets and find that the estimated identified sets are too wide to be useful. This suggests that, to obtain useful information about individual-level relationships from aggregate data sets, researchers must impose further assumptions that are carefully justified.
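To illustrate why marginal information alone tends to produce wide identified sets, consider the simplest possible case: a binary outcome Y and a single binary covariate X, where only P(Y=1) and P(X=1) are observed. This is a sketch under that assumption, not the optimization program developed in the paper. It uses the standard Fréchet-Hoeffding bounds on the unobserved joint probability P(Y=1, X=1), and the function names are hypothetical, chosen here for illustration.

```python
# Illustrative sketch only: the 2x2 case with known marginals, not the
# paper's general procedure. Function names are hypothetical.

def joint_bounds(p_y: float, p_x: float) -> tuple[float, float]:
    """Frechet-Hoeffding bounds on P(Y=1, X=1) given the two marginals."""
    if not (0.0 <= p_y <= 1.0 and 0.0 < p_x < 1.0):
        raise ValueError("need 0 <= p_y <= 1 and 0 < p_x < 1")
    lower = max(0.0, p_y + p_x - 1.0)
    upper = min(p_y, p_x)
    return lower, upper


def conditional_mean_bounds(p_y: float, p_x: float) -> tuple[float, float]:
    """Bounds on E[Y | X=1] = P(Y=1, X=1) / P(X=1)."""
    lower, upper = joint_bounds(p_y, p_x)
    return lower / p_x, upper / p_x


def difference_bounds(p_y: float, p_x: float) -> tuple[float, float]:
    """Bounds on the linear combination E[Y | X=1] - E[Y | X=0].

    Writing t = P(Y=1, X=1), the target is t/p_x - (p_y - t)/(1 - p_x),
    which is increasing in t, so the extremes sit at the joint bounds.
    """
    lower_t, upper_t = joint_bounds(p_y, p_x)

    def target(t: float) -> float:
        return t / p_x - (p_y - t) / (1.0 - p_x)

    return target(lower_t), target(upper_t)


if __name__ == "__main__":
    print(conditional_mean_bounds(0.5, 0.5))
    print(difference_bounds(0.5, 0.5))
```

With both marginals equal to 0.5, the bounds on E[Y | X=1] are the whole unit interval and the bounds on the difference are [-1, 1], so the marginals carry no information about the individual-level relationship in that case. Further covariates or shape restrictions would be added as extra constraints, which no longer have a closed form and call for a numerical optimization.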