Text analysis is the process of constructing structured data from unstructured textual content, and it is usually implemented in Python. In principle, basic text analysis requires nothing more than a program that can read a file and match its contents against regular expressions. Stata provides both capabilities, yet few researchers have used it as their main text analysis tool. In this paper, I walk step by step through the practical workflow, giving examples of how text analysis can be performed in Stata and comparing the code and running time with those of Python.
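The claim that reading a file and matching regular expressions is enough for basic text analysis can be illustrated with a minimal Python sketch. It assumes a plain UTF-8 text file and uses a hypothetical keyword pattern (words such as "risk" or "uncertain..."). It is only meant to show how regex matches turn raw text into a structured record, not to reproduce the Stata workflow described in this paper.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical keyword pattern; substitute your own dictionary in practice.
PATTERN = re.compile(r"\b(risk|uncertain\w*)\b", re.IGNORECASE)


def analyze_file(path: Path) -> dict:
    """Read one text file and turn its regex matches into a structured record."""
    text = path.read_text(encoding="utf-8")
    matches = PATTERN.findall(text)  # with one group, findall returns the matched words
    return {
        "file": path.name,
        "n_words": len(re.findall(r"\w+", text)),  # crude word count
        "n_matches": len(matches),
        "matches": [m.lower() for m in matches],
    }


if __name__ == "__main__":
    # Write a small sample file so the example runs on its own.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "report.txt"
        sample.write_text(
            "Risk remains high. Demand is uncertain, and uncertainty persists.",
            encoding="utf-8",
        )
        print(analyze_file(sample))
```

Applied to a folder of files, one such record per file yields a table that can be analyzed like any other dataset.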