In this paper, we study the identifiability and the estimation of the parameters of a copula-based multivariate model when the margins are unknown and arbitrary, meaning that they can be continuous, discrete, or mixtures of continuous and discrete. When at least one margin is not continuous, the range of values determining the copula is not the entire unit square, and this situation can lead to identifiability issues, which are discussed here. Next, we propose estimation methods for unknown and arbitrary margins, using a pseudo log-likelihood adapted to the case of discontinuities. In view of applications to large data sets, we also propose a pairwise composite pseudo log-likelihood. These methodologies can also be easily modified to cover the case of parametric margins. One of the main theoretical results is an extension, to arbitrary distributions, of known convergence results for rank-based statistics that were established for continuous margins. As a by-product, under smoothness assumptions, we obtain that the asymptotic distribution of the estimation errors of our estimators is Gaussian. Finally, numerical experiments are presented to assess the finite-sample performance of the estimators, and the usefulness of the proposed methodologies is illustrated with a copula-based regression model for hydrological data. The proposed estimation methods are implemented in the R package CopulaInference, together with a function for checking identifiability.
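The identifiability issue can be made concrete with a small numerical sketch. It is an illustration under our own assumptions, not the procedure implemented in CopulaInference: the two Bernoulli margins, the choice of the Farlie-Gumbel-Morgenstern (FGM) and Clayton families, and all function names below are hypothetical. With Bernoulli margins, the joint distribution depends on the copula only through its value at the single point (P(X=0), P(Y=0)), so two different families can produce exactly the same joint law.

```python
# Illustrative sketch: two different copula families yield the same joint
# distribution for a pair of Bernoulli variables, so the copula family
# cannot be recovered from such data. All names here are hypothetical.

def fgm(u, v, theta):
    """FGM copula, valid for theta in [-1, 1]."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def clayton(u, v, theta):
    """Clayton copula, valid here for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def bernoulli_joint_pmf(c_ab, a, b):
    """Joint pmf (p00, p01, p10, p11) of (X, Y) with P(X=0)=a, P(Y=0)=b,
    given only the copula value c_ab = C(a, b)."""
    p00 = c_ab
    p01 = a - p00          # X = 0, Y = 1
    p10 = b - p00          # X = 1, Y = 0
    p11 = 1.0 - a - b + p00
    return (p00, p01, p10, p11)

a, b = 0.5, 0.5            # assumed margins: P(X=0) and P(Y=0)
theta_fgm = 0.5            # assumed FGM parameter
target = fgm(a, b, theta_fgm)

# Bisection for the Clayton parameter matching the FGM value at (a, b).
# C_theta(a, b) increases with theta, from a*b toward min(a, b).
lo, hi = 1e-3, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if clayton(a, b, mid) < target:
        lo = mid
    else:
        hi = mid
theta_clayton = 0.5 * (lo + hi)

pmf_fgm = bernoulli_joint_pmf(fgm(a, b, theta_fgm), a, b)
pmf_clayton = bernoulli_joint_pmf(clayton(a, b, theta_clayton), a, b)

print("FGM joint pmf:    ", pmf_fgm)
print("Clayton joint pmf:", pmf_clayton)
```

The two printed probability vectors coincide up to the bisection tolerance, although the copulas differ everywhere else on the unit square. This is the kind of situation an identifiability check is meant to detect before estimation is attempted.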