The TCPD-IPD dataset is a collection of questions and answers discussed in the Lower House of the Parliament of India during the Question Hour between 1999 and 2019. Such a large collection is difficult to analyze manually, but modern text analysis tools provide a powerful means of navigating it. In this paper, we perform an exploratory analysis of the dataset. In particular, we present corpus-level statistics and a detailed analysis of three subsets of the dataset. The latter analysis focuses on understanding the temporal evolution of topics using a dynamic topic model. We observe that the parliamentary conversation mirrors the political and socio-economic tensions of each period.