Analyses of legislative behavior often rely on voting records, overlooking the rich semantic and rhetorical content of political speech. In this paper, we ask three complementary questions about parliamentary discourse: how things are said, what is being said, and who is speaking in discursively similar ways. To answer these questions, we introduce a scalable and generalizable computational framework that combines diachronic stylometric analysis, contextual topic modeling, and semantic clustering of deputies' speeches. We apply this framework to a large-scale case study of the Brazilian Chamber of Deputies, using a corpus of over 450,000 speeches from 2003 to 2025. Our results show a long-term stylistic shift toward shorter and more direct speeches, a legislative agenda that reorients sharply in response to national crises, and a granular map of discursive alignments in which regional and gender identities often prove more salient than formal party affiliation. More broadly, this work offers a robust methodology for analyzing parliamentary discourse as a multidimensional phenomenon that complements traditional vote-based approaches.