The paper extends in two directions the work of \cite{Plackett77}, who studied how, in a $2\times 2$ table, the likelihood of the column totals depends on the odds ratio. First, we study the marginal likelihood of a single $R\times C$ frequency table when only the marginal frequencies are observed. We then consider a collection of, say, $s$ $R\times C$ tables for which only the row and column totals can be observed, which is the basic framework in applications of Ecological Inference. In the simpler, single-table context, we derive the likelihood equations and show that the likelihood has a collection of local maxima which, after a suitable rearrangement of the row and column categories, exhibit the strongest positive association compatible with the marginals. This is a kind of paradox, considering that the available data are so poor. Next, we derive the likelihood equations for the marginal likelihood of a collection of two-way tables, under the assumption that they share the same row conditional distributions, and derive a necessary condition for the information matrix to be well defined. We also describe a Fisher-scoring algorithm for maximizing the marginal likelihood which, however, can be used only if the number of available replications reaches a given threshold.