To characterize US airline profit cycles from 1995 to 2020, Renold et al. (2023) combine k-means clustering, principal component analysis (PCA), and system dynamics modelling. Using the dataset the authors kindly include in their paper, we replicate their clustering experiment in three spaces: the original 7-dimensional raw-variable space, a 3-dimensional PC score space, and a 4-dimensional PC score space. We show that the six-cluster taxonomy is geometrically robust: k-means in the 3-PC space produces exactly the same cluster assignments as k-means in the 7D raw space. As a nonlinearity check, we apply kernel PCA under six kernels spanning three families plus a linear baseline. All six kernels preserve the six-cluster assignment in 2D. A 1D diagnostic sharpens this result: the linear kernel conflates the COVID year C_3 with the peak-profit cluster C_0, whereas all five non-baseline kernels shift C_3 so that it overlaps only the post-financial-crisis cluster C_5. Agreement across the kernel families confirms an intrinsically linear manifold with no hidden curvature. The silhouette criterion, however, reveals that the dataset structurally supports only three clusters, not six; collinearity in the raw 7D space suppresses the silhouette signal that would otherwise identify k=3 as the structurally motivated choice.
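Two checks recur in the comparison above: whether two k-means runs yield the same partition, and how well a given labelling is separated according to the silhouette criterion. The following is a minimal sketch of both in plain Python, not the code used in this study. The function names `same_partition` and `mean_silhouette` and the toy points are hypothetical and chosen only for illustration. Because k-means label numbers are arbitrary, the partition check looks for a one-to-one relabelling rather than comparing label values directly. The silhouette function assumes Euclidean distance and scores singleton clusters as 0, a common convention that matters when one cluster holds a single year.

```python
from collections import defaultdict
from math import dist


def same_partition(labels_a, labels_b):
    """True if both label vectors group the points identically, ignoring label names."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have equal length")
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault stores the first pairing seen; a later conflict means
        # the two labellings are not related by a one-to-one relabelling.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


def mean_silhouette(points, labels):
    """Mean silhouette width (Euclidean); points in singleton clusters score 0."""
    clusters = defaultdict(list)
    for i, lab in enumerate(labels):
        clusters[lab].append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # assumed convention: s(i) = 0 for a singleton cluster
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Hypothetical toy data: two well-separated pairs of points.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels_space_1 = [0, 0, 1, 1]
labels_space_2 = [1, 1, 0, 0]  # same grouping, different label names
labels_mixed = [0, 1, 0, 1]    # a different grouping

print(same_partition(labels_space_1, labels_space_2))
print(mean_silhouette(toy_points, labels_space_1))
print(mean_silhouette(toy_points, labels_mixed))
```

In this toy case the first two labellings count as the same partition even though no label value matches, and the labelling that follows the geometry scores a higher mean silhouette than the mixed one. Selecting k by silhouette amounts to repeating the second computation for each candidate k and comparing the means.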