This article presents the first systematic review of unsupervised and semi-supervised computational text-based ideal point estimation (CT-IPE) algorithms, methods designed to infer latent political positions from textual data. These algorithms are widely used in political science, communication, computational social science, and computer science to estimate ideological preferences from parliamentary speeches, party manifestos, and social media. Over the past two decades, their development has closely followed broader trends in natural language processing (NLP) -- beginning with word-frequency models and most recently turning to large language models (LLMs). While this trajectory has greatly expanded the methodological toolkit, it has also produced a fragmented field that lacks systematic comparison and clear guidance for applied use. To address this gap, we identified 25 CT-IPE algorithms through a systematic literature review and conducted a manual content analysis of their modeling assumptions and development contexts. To compare them meaningfully, we introduce a conceptual framework that distinguishes how algorithms generate, capture, and aggregate textual variance. On this basis, we identify four methodological families -- word-frequency, topic modeling, word embedding, and LLM-based approaches -- and critically assess their assumptions, interpretability, scalability, and limitations. Our review offers three contributions. First, it provides a structured synthesis of two decades of algorithm development, clarifying how diverse methods relate to one another. Second, it translates these insights into practical guidance for applied researchers, highlighting trade-offs in transparency, technical requirements, and validation strategies that shape algorithm choice. Third, it emphasizes that differences in estimation outcomes across algorithms are themselves informative, underscoring the need for systematic benchmarking.