Testing Truncation Dependence: The Gumbel Copula

In the analysis of left- and double-truncated durations, it is often assumed that the age at truncation is independent of the duration. When truncation results from collecting data in a restricted time period, the truncation age is equivalent to the date of birth. The independence assumption is then at odds with any demographic progress in which life expectancy increases over time, as is evident, for example, in human demography in western civilisations. We model the dependence with a Gumbel copula. Marginally, the duration of interest is assumed to be exponentially distributed, and births are assumed to stem from a homogeneous Poisson process. The log-likelihood of the data, considered as a truncated sample, is derived from standard results for point processes. A test for positive dependence must take into account that the hypothesis of independence lies on the boundary of the parameter space. By non-standard theory, the maximum likelihood estimator of the exponential and Gumbel parameters is asymptotically distributed as a mixture of a two- and a one-dimensional normal distribution. For the proof, the third parameter, the unobserved sample size, is profiled out. Furthermore, verifying identification is simplified by noting that the score of the profile model for the truncated sample equals the score for a simple sample from the truncated population. In an application to 55 thousand double-truncated lifetimes of German businesses that closed down over the period 2014 to 2016, the test does not find an increase in business life expectancy for later foundation years. The $p$-value is $0.5$ because the likelihood attains its maximum in the Gumbel parameter at the boundary of the parameter space. A simulation under the conditions of the application suggests that the test retains the nominal level and has good power.
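
To illustrate why independence sits on the boundary of the parameter space, the following minimal sketch evaluates the Gumbel copula in its standard parametrisation, $C_\theta(u,v) = \exp\{-[(-\log u)^\theta + (-\log v)^\theta]^{1/\theta}\}$ with $\theta \ge 1$, where $\theta = 1$ gives the independence copula $uv$. The function names are hypothetical and not taken from the paper. The helper `one_sided_p_value` assumes a one-sided test based on a standardised, asymptotically normal statistic; this is an illustrative assumption, not necessarily the test statistic used by the author. It only shows how an estimate located exactly at the boundary (statistic equal to zero) leads to a $p$-value of $0.5$.

```python
import math


def gumbel_copula_cdf(u: float, v: float, theta: float) -> float:
    """Gumbel copula C(u, v) for u, v in (0, 1] and theta >= 1."""
    if theta < 1.0:
        raise ValueError("theta must be >= 1; theta = 1 is independence")
    if not (0.0 < u <= 1.0 and 0.0 < v <= 1.0):
        raise ValueError("u and v must lie in (0, 1]")
    a = (-math.log(u)) ** theta
    b = (-math.log(v)) ** theta
    return math.exp(-((a + b) ** (1.0 / theta)))


def kendall_tau(theta: float) -> float:
    """Kendall's tau implied by the Gumbel parameter: 1 - 1/theta."""
    return 1.0 - 1.0 / theta


def one_sided_p_value(z: float) -> float:
    """Upper-tail standard normal probability P(Z >= z).

    Assumption (not from the paper): z is a standardised statistic
    for H0: theta = 1 against H1: theta > 1.
    """
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # At the boundary theta = 1 the copula reduces to the product u * v.
    print(gumbel_copula_cdf(0.3, 0.7, 1.0))
    # For theta > 1 the copula lies above u * v (positive dependence).
    print(gumbel_copula_cdf(0.3, 0.7, 2.0))
    # An estimate at the boundary corresponds to a statistic of zero.
    print(one_sided_p_value(0.0))
```

Under this parametrisation only values $\theta \ge 1$ are admissible, so a test of independence against positive dependence is necessarily one-sided, which is why standard two-sided asymptotics do not apply.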
