Topic modeling and text mining are subsets of Natural Language Processing (NLP) with relevance for conducting meta-analysis (MA) and systematic review (SR). In evidence synthesis, these NLP methods are conventionally used for topic-specific literature searches or for extracting values from reports to automate essential phases of SR and MA. Instead, this work proposes a comparative topic modeling approach to analyze reports of contradictory results on the same general research question. Specifically, the objective is to find topics exhibiting distinct associations with significant results for an outcome of interest by ranking them according to their proportional occurrence and consistency of distribution across reports of significant results. The proposed method was tested on broad-scope studies addressing whether supplemental nutritional compounds significantly benefit macular degeneration (MD). Eight compounds were identified as having a particular association with reports of significant results for benefiting MD. Six of these (omega-3 fatty acids, copper, zeaxanthin, lutein, zinc, and nitrates) were further supported in terms of effectiveness by a follow-up literature search conducted for validation. The two not supported by the follow-up literature search (niacin and molybdenum) also had the lowest scores under the proposed method's ranking system, suggesting that the proposed method's score for a given topic is a viable proxy for its degree of association with the outcome of interest. These results underpin the proposed method's potential to add specificity in understanding effects from broad-scope reports, elucidate topics of interest for future research, and guide evidence synthesis in a systematic and scalable way.
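To make the ranking idea concrete, the following is a minimal sketch of how topics might be scored by proportional occurrence and consistency across reports of significant results. The abstract does not state the actual scoring formula, so everything here is an assumption made for illustration: the function name `rank_topics`, the `presence_threshold` parameter, the definition of proportional occurrence as a topic's share of mean weight in the significant group, the definition of consistency as the fraction of significant reports containing the topic, and the choice to multiply the two. The topic weights are toy values, not data from the study.

```python
from statistics import mean


def rank_topics(significant, nonsignificant, presence_threshold=0.05):
    """Rank topics by a hypothetical association score.

    Each report is a dict mapping a topic label to its weight in that
    report (for example, a topic-model document-topic proportion).
    The scoring rule below is an illustrative assumption, not the
    formula used in the paper.
    """
    if not significant or not nonsignificant:
        raise ValueError("both report groups must be non-empty")

    topics = sorted({t for report in significant + nonsignificant for t in report})
    scores = {}
    for topic in topics:
        sig_weights = [report.get(topic, 0.0) for report in significant]
        non_weights = [report.get(topic, 0.0) for report in nonsignificant]
        sig_mean = mean(sig_weights)
        non_mean = mean(non_weights)
        total = sig_mean + non_mean

        # Assumed "proportional occurrence": the share of the topic's
        # average weight that falls in the significant-result group.
        proportion = sig_mean / total if total > 0 else 0.0

        # Assumed "consistency of distribution": the fraction of
        # significant-result reports in which the topic is present.
        present = sum(w >= presence_threshold for w in sig_weights)
        consistency = present / len(sig_weights)

        scores[topic] = proportion * consistency

    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy topic weights (hypothetical, for illustration only).
significant = [
    {"topic_a": 0.6, "topic_b": 0.4},
    {"topic_a": 0.5, "topic_b": 0.1, "topic_c": 0.4},
    {"topic_a": 0.7, "topic_c": 0.3},
]
nonsignificant = [
    {"topic_b": 0.8, "topic_c": 0.2},
    {"topic_a": 0.1, "topic_b": 0.9},
]

ranking = rank_topics(significant, nonsignificant)
for topic, score in ranking:
    print(f"{topic}: {score:.3f}")
```

In this toy input, `topic_a` appears in every significant-result report and rarely elsewhere, so it ranks first; `topic_b` is dominated by the non-significant group and ranks last. Other reasonable choices, such as a variance-based consistency measure, would fit the same description in the abstract.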