Indexes are useful for summarizing multivariate information into single metrics for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, more attention needs to be directed towards understanding how indexes behave under different data conditions, and how their structure affects their values and the variation in those values. Here we recommend a modular data pipeline for assembling indexes. It is universally applicable to index computation and allows index behavior to be investigated as part of the development procedure. One can compute indexes with different parameter choices; adjust the index definition by adding, removing, and swapping steps to experiment with various index designs; calculate uncertainty measures; and assess index robustness. The paper presents three examples to illustrate the use of the pipeline framework: a comparison of two indexes designed to monitor the spatio-temporal distribution of drought in Queensland, Australia; the effect of dimension reduction choices in the Global Gender Gap Index (GGGI) on country rankings; and the calculation of bootstrap confidence intervals for the Standardized Precipitation Index (SPI). The methods are supported by a new R package, tidyindex.
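To make the idea of a modular pipeline concrete, the following is a minimal sketch in Python using only the standard library. It is not the tidyindex API: every name here (`rescale_minmax`, `weighted_sum`, `run_pipeline`, `bootstrap_mean_index`) is hypothetical, and the toy data are invented purely for illustration. The sketch assumes each pipeline step maps a table of columns to a new table, so that steps can be added, removed, or swapped, and it uses a simple percentile bootstrap over rows as one possible uncertainty measure.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

Table = Dict[str, List[float]]   # column name -> values, one entry per unit
Step = Callable[[Table], Table]  # every module maps a table to a new table


def rescale_minmax(columns: List[str]) -> Step:
    """Module (hypothetical): rescale each named column to the range [0, 1]."""
    def step(table: Table) -> Table:
        out = dict(table)
        for name in columns:
            lo, hi = min(table[name]), max(table[name])
            span = (hi - lo) or 1.0  # guard against constant columns
            out[name] = [(v - lo) / span for v in table[name]]
        return out
    return step


def weighted_sum(weights: Dict[str, float], new_name: str = "index") -> Step:
    """Module (hypothetical): reduce several columns to one by a linear combination."""
    def step(table: Table) -> Table:
        n = len(next(iter(table.values())))
        out = dict(table)
        out[new_name] = [
            sum(w * table[name][i] for name, w in weights.items())
            for i in range(n)
        ]
        return out
    return step


def run_pipeline(table: Table, steps: List[Step]) -> Table:
    """Apply the modules in order; swapping a module changes the index design."""
    for step in steps:
        table = step(table)
    return table


def bootstrap_mean_index(table: Table, steps: List[Step], n_boot: int = 200,
                         level: float = 0.9, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean index, resampling rows."""
    rng = random.Random(seed)
    n = len(next(iter(table.values())))
    means = []
    for _ in range(n_boot):
        rows = [rng.randrange(n) for _ in range(n)]
        resampled = {name: [col[i] for i in rows] for name, col in table.items()}
        means.append(statistics.mean(run_pipeline(resampled, steps)["index"]))
    means.sort()
    tail = (1 - level) / 2
    return means[int(tail * n_boot)], means[int((1 - tail) * n_boot) - 1]


# Invented toy data: two variables measured on five units.
toy = {"a": [1.0, 4.0, 2.0, 8.0, 5.0], "b": [30.0, 10.0, 20.0, 40.0, 25.0]}

# Two index designs that differ only in the dimension reduction weights.
equal = [rescale_minmax(["a", "b"]), weighted_sum({"a": 0.5, "b": 0.5})]
a_heavy = [rescale_minmax(["a", "b"]), weighted_sum({"a": 0.9, "b": 0.1})]

index_equal = run_pipeline(toy, equal)["index"]
index_a_heavy = run_pipeline(toy, a_heavy)["index"]
low, high = bootstrap_mean_index(toy, equal)
```

Because the two designs share every step except the weights, any difference between `index_equal` and `index_a_heavy` can be attributed to that one choice. This is the kind of comparison the pipeline view is meant to make easy. The bootstrap here resamples units, which is only one of several reasonable schemes.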