In this paper, we study the identifiability and the estimation of the parameters of a copula-based multivariate model when the margins are unknown and arbitrary, meaning that they can be continuous, discrete, or mixtures of continuous and discrete. When at least one margin is not continuous, the range of values determining the copula is not the entire unit square, and this situation can lead to identifiability issues that are discussed here. Next, we propose estimation methods for unknown and arbitrary margins, using a pseudo log-likelihood adapted to the case of discontinuities. In view of applications to large data sets, we also propose a pairwise composite pseudo log-likelihood. These methodologies can also be easily modified to cover the case of parametric margins. One of the main theoretical results is an extension to arbitrary distributions of known convergence results for rank-based statistics when the margins are continuous. As a by-product, under smoothness assumptions, we obtain that the asymptotic distribution of the estimation errors of our estimators is Gaussian. Finally, numerical experiments are presented to assess the finite-sample performance of the estimators, and the usefulness of the proposed methodologies is illustrated with a copula-based regression model for hydrological data.
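To make the remark about discontinuous margins concrete, the following minimal sketch computes, for each observation, the left limit and the value of the empirical distribution function. It is an illustration only and not the estimator proposed in the paper: the function name `ecdf_pairs`, the rescaling by n + 1 (a common convention in rank-based inference), and the zero-inflated sample `rain` are all assumptions made for this example. At an atom the two values differ, so the margin never visits the interval between them; this is the sense in which the copula is only determined on part of the unit square.

```python
from bisect import bisect_left, bisect_right


def ecdf_pairs(sample):
    """Return (F_n(x-), F_n(x)) for each observation x.

    F_n is the empirical distribution function rescaled by n + 1
    (an assumed convention, not taken from the paper).
    """
    n = len(sample)
    ordered = sorted(sample)
    pairs = []
    for x in sample:
        below = bisect_left(ordered, x)         # number of observations < x
        at_or_below = bisect_right(ordered, x)  # number of observations <= x
        pairs.append((below / (n + 1), at_or_below / (n + 1)))
    return pairs


# Hypothetical zero-inflated sample: an atom at 0 plus a continuous part.
rain = [0.0, 0.0, 1.3, 0.0, 2.7, 0.4, 0.0, 5.1]
pairs = ecdf_pairs(rain)

for x, (left, right) in zip(rain, pairs):
    # For the tied value 0.0 the interval (left, right] is wide;
    # for untied values it has width 1 / (n + 1).
    print(f"x={x:4.1f}  F_n(x-)={left:.3f}  F_n(x)={right:.3f}")
```

For the tied observations at 0 the pair spans a wide interval, whereas each untied observation yields an interval of width 1/(n + 1). A likelihood adapted to discontinuities would typically use both endpoints at an atom, for instance through differences of the copula, rather than a single pseudo-observation.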