growclusters for R is a package that estimates a partition structure for multivariate data. It implements a hierarchical version of k-means clustering that accounts for possible known dependencies in a collection of data sets, where each data set draws its cluster means from a single, global partition. Each component data set in the collection corresponds to a known group in the data. This paper focuses on R Shiny applications that implement the clustering methodology and simulate data sets with known group structures. The applications also provide novel visualizations of the clustering results, including scatterplots of individual data sets in the context of the entire collection and plots of cluster distributions versus component (or sub-domain) data sets. Data obtained from a collection of articles published from 2000 to 2013 in the Bureau of Labor Statistics (BLS) Monthly Labor Review (MLR) are used to illustrate the R Shiny applications. Here, the known grouping in the collection is the year of publication.