Indexes are useful for summarizing multivariate information into single metrics for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, more attention is needed on understanding how indexes behave under different data conditions and on how their structure affects their values and the variability of those values. Here we discuss a recommended modular data pipeline for assembling indexes. It is universally applicable to index computation and allows index behavior to be investigated as part of the development procedure. One can compute indexes with different parameter choices; adjust the index definition by adding, removing, and swapping steps to experiment with various index designs; calculate uncertainty measures; and assess index robustness. The paper presents three examples to illustrate use of the pipeline framework: a comparison of two different indexes designed to monitor the spatio-temporal distribution of drought in Queensland, Australia; the effect of dimension reduction choices in the Global Gender Gap Index (GGGI) on country rankings; and the calculation of bootstrap confidence intervals for the Standardized Precipitation Index (SPI). The methods are supported by a new R package called tidyindex.
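The idea of assembling an index from interchangeable steps, and of attaching an uncertainty measure to it, can be sketched in a few lines. The sketch below is written in Python rather than R and does not use the tidyindex API; every function name (`rolling_sum`, `standardize`, `rescale_minmax`, `run_pipeline`, `bootstrap_interval`) is hypothetical, the rainfall values are made up for illustration, and the bootstrap uses naive i.i.d. resampling with a plain mean and standard deviation standardization, which is an assumption and not necessarily the procedure used in the paper for the SPI.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# A pipeline step maps one list of values to another (hypothetical convention).
Step = Callable[[List[float]], List[float]]


def rolling_sum(window: int) -> Step:
    """Build a temporal-aggregation step: trailing sums over `window` periods."""
    def step(values: List[float]) -> List[float]:
        return [sum(values[i - window + 1:i + 1])
                for i in range(window - 1, len(values))]
    return step


def standardize(values: List[float]) -> List[float]:
    """Normalizing step: centre on the mean, scale by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def rescale_minmax(values: List[float]) -> List[float]:
    """Alternative normalizing step: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def run_pipeline(values: Sequence[float], steps: Sequence[Step]) -> List[float]:
    """Apply each step in order; swapping or dropping a step changes the index design."""
    out = list(values)
    for step in steps:
        out = step(out)
    return out


def bootstrap_interval(reference: Sequence[float], target: float,
                       n_boot: int = 500, level: float = 0.9,
                       seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap interval for the standardized value of `target`.

    Each replicate resamples `reference` with replacement (naive i.i.d.
    resampling, an assumption) and re-estimates the mean and standard deviation.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        sample = rng.choices(reference, k=len(reference))
        sd = statistics.stdev(sample)
        if sd == 0:  # degenerate resample: skip it
            continue
        replicates.append((target - statistics.fmean(sample)) / sd)
    replicates.sort()
    tail = (1 - level) / 2
    lower = replicates[int(tail * (len(replicates) - 1))]
    upper = replicates[int((1 - tail) * (len(replicates) - 1))]
    return lower, upper


# Made-up monthly rainfall totals (mm), for illustration only.
rainfall = [12.0, 30.5, 8.2, 0.0, 45.1, 22.3, 5.4, 60.0, 18.9, 3.3, 27.6, 14.8]

# Two index designs that differ only in the final normalizing step.
index_a = run_pipeline(rainfall, [rolling_sum(3), standardize])
index_b = run_pipeline(rainfall, [rolling_sum(3), rescale_minmax])

# Uncertainty for the most recent standardized value.
aggregated = run_pipeline(rainfall, [rolling_sum(3)])
ci_low, ci_high = bootstrap_interval(aggregated, aggregated[-1])
```

Because each step is an ordinary function with the same input and output shape, comparing index designs amounts to changing the list passed to `run_pipeline`, and the same aggregated series can be reused to compute an uncertainty measure.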