Quantifying Similarity: Text-Mining Approaches to Evaluate ChatGPT and Google Bard Content in Relation to BioMedical Literature

Background: Generative AI tools powered by Large Language Models (LLMs) have shown powerful capabilities in generating content. Assessing the usefulness of such content, which is elicited through prompt engineering, has become an interesting research question. Objectives: By means of prompt engineering, we assess the similarity and closeness of such content to real literature produced by scientists. Methods: In this exploratory analysis, (1) we prompt-engineer ChatGPT and Google Bard to generate clinical content, and (2) we assess the similarity of the generated content to its counterparts from the biomedical literature. We use text-mining approaches to compare documents and their associated bigrams, and network analysis to assess the terms' centrality. Results: The experiments demonstrated that ChatGPT outperformed Google Bard in cosine document similarity (38% to 34%), Jaccard document similarity (23% to 19%), TF-IDF bigram similarity (47% to 41%), and term network centrality (degree and closeness). We also found new links in the ChatGPT bigram networks that did not exist in the literature bigram networks. Conclusions: ChatGPT outperformed Google Bard in document similarity, bigram similarity, and degree and closeness centrality. We also observed that ChatGPT offers linkage to terms that are connected in the literature. Such connections could inspire interesting questions and generate new hypotheses.
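
The measures named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the tokenizer is deliberately naive, and the cosine score here uses raw term counts, whereas the bigram comparison reported above uses TF-IDF weighting. Degree centrality is computed on an undirected graph whose edges are the bigrams, which is one plausible reading of "bigram network" and is an assumption of this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens (naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bigrams(tokens):
    """Return adjacent token pairs in order of appearance."""
    return list(zip(tokens, tokens[1:]))


def cosine_similarity(items_a, items_b):
    """Cosine similarity between the raw count vectors of two item lists."""
    counts_a, counts_b = Counter(items_a), Counter(items_b)
    dot = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(items_a, items_b):
    """Size of the intersection over size of the union of the two item sets."""
    set_a, set_b = set(items_a), set(items_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def degree_centrality(edges):
    """Normalized degree centrality, treating each bigram as an undirected edge."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops such as ("very", "very")
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def new_links(generated_bigrams, literature_bigrams):
    """Bigrams present in the generated text but absent from the literature text."""
    return set(generated_bigrams) - set(literature_bigrams)


if __name__ == "__main__":
    # Toy sentences invented for illustration; they are not data from the study.
    generated = tokenize("heart failure increases the risk of kidney disease")
    literature = tokenize("heart failure is associated with chronic kidney disease")

    print("cosine (documents):", cosine_similarity(generated, literature))
    print("jaccard (documents):", jaccard_similarity(generated, literature))
    print("jaccard (bigrams):", jaccard_similarity(bigrams(generated), bigrams(literature)))
    print("new links:", sorted(new_links(bigrams(generated), bigrams(literature))))
    print("degree centrality:", degree_centrality(bigrams(generated)))
```

Reproducing the reported setup more closely would mean replacing the raw counts with TF-IDF weights over a bigram vocabulary and adding closeness centrality. Libraries such as scikit-learn and NetworkX provide both, but the study's actual tooling is not stated here.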
