In this paper, we study the identifiability and estimation of the parameters of a copula-based multivariate model when the margins are unknown and arbitrary, meaning that they can be continuous, discrete, or mixtures of continuous and discrete. When at least one margin is not continuous, the range of values determining the copula is not the entire unit square, and this situation can lead to identifiability issues, which are discussed here. Next, we propose estimation methods when the margins are unknown and arbitrary, using a pseudo log-likelihood adapted to the case of discontinuities. In view of applications to large data sets, we also propose a pairwise composite pseudo log-likelihood. These methodologies can also be easily modified to cover the case of parametric margins. One of the main theoretical results is an extension to arbitrary distributions of known convergence results for rank-based statistics when the margins are continuous. As a by-product, under smoothness assumptions, we obtain that the asymptotic distribution of the estimation errors of our estimators is Gaussian. Finally, numerical experiments are presented to assess the finite-sample performance of the estimators, and the usefulness of the proposed methodologies is illustrated with a copula-based regression model for hydrological data.
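
The identifiability issue can be illustrated with a small numerical sketch. It is not taken from the paper, and every choice in it is an assumption made for illustration: two Bernoulli margins with P(X=0) = P(Y=0) = 0.5, a Farlie-Gumbel-Morgenstern (FGM) copula with parameter 0.5, and a Clayton copula whose parameter is found by bisection. With these margins the joint distribution depends on the copula only through its value at the single point (0.5, 0.5), so two different copulas that agree there yield the same law for (X, Y).

```python
def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula, valid for -1 <= theta <= 1."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def clayton_copula(u, v, theta):
    """Clayton copula, used here only with theta > 0."""
    if u == 0.0 or v == 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def bernoulli_joint_pmf(copula, p0_x, p0_y):
    """Joint pmf of (X, Y) on {0,1}^2 with P(X=0)=p0_x and P(Y=0)=p0_y.

    Only the copula value at (p0_x, p0_y) enters the result.
    """
    p00 = copula(p0_x, p0_y)
    return {
        (0, 0): p00,
        (0, 1): p0_x - p00,
        (1, 0): p0_y - p00,
        (1, 1): 1.0 - p0_x - p0_y + p00,
    }


def match_clayton_theta(target, u, v, lo=1e-6, hi=1.0, tol=1e-12):
    """Bisection for theta such that the Clayton copula at (u, v) equals target.

    Relies on the Clayton copula being increasing in theta at a fixed point.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if clayton_copula(u, v, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative (assumed) settings: symmetric Bernoulli margins, FGM theta = 0.5.
P0 = 0.5
THETA_FGM = 0.5

target = fgm_copula(P0, P0, THETA_FGM)
theta_clayton = match_clayton_theta(target, P0, P0)

pmf_fgm = bernoulli_joint_pmf(lambda u, v: fgm_copula(u, v, THETA_FGM), P0, P0)
pmf_clayton = bernoulli_joint_pmf(
    lambda u, v: clayton_copula(u, v, theta_clayton), P0, P0
)

# Same observable distribution, yet the copulas differ away from (0.5, 0.5).
gap_elsewhere = abs(
    fgm_copula(0.2, 0.7, THETA_FGM) - clayton_copula(0.2, 0.7, theta_clayton)
)

print("matched Clayton theta:", theta_clayton)
print("FGM pmf:    ", pmf_fgm)
print("Clayton pmf:", pmf_clayton)
print("copula gap at (0.2, 0.7):", gap_elsewhere)
```

In this toy setting no amount of data on (X, Y) could distinguish the two copula families, which is the kind of situation the identifiability discussion is concerned with; with continuous margins the whole unit square is informative and the two copulas would be distinguishable.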