Large language models have seen extraordinary growth in popularity due to their human-like content generation capabilities. We show that these models can also be used to successfully cluster human-generated content, with success defined through the measures of distinctiveness and interpretability. This success is validated by both human reviewers and ChatGPT, providing an automated means to close the 'validation gap' that has challenged short-text clustering. Comparing the machine and human approaches, we identify the biases inherent in each and question the reliance on human coding as the 'gold standard'. We apply our methodology to Twitter bios and find characteristic ways in which humans describe themselves, agreeing well with prior specialist work but with interesting differences characteristic of the medium used to express identity.