Quantifying Similarity: Text-Mining Approaches to Evaluate ChatGPT and Google Bard Content in Relation to BioMedical Literature

Background: The emergence of generative AI tools powered by Large Language Models (LLMs) has demonstrated powerful content-generation capabilities. To date, assessing the usefulness of such content, generated through what is known as prompt engineering, has become an interesting research question. Objectives: Using prompt engineering, we assess the similarity and closeness of such content to real literature produced by scientists. Methods: In this exploratory analysis, (1) we prompt-engineer ChatGPT and Google Bard to generate clinical content to be compared with literature counterparts, and (2) we assess the similarity of the generated content to its counterparts in the biomedical literature. We use text-mining approaches to compare documents and their associated bigrams, and network analysis to assess the centrality of terms. Results: The experiments demonstrated that ChatGPT outperformed Google Bard in cosine document similarity (38% vs. 34%), Jaccard document similarity (23% vs. 19%), TF-IDF bigram similarity (47% vs. 41%), and term network centrality (degree and closeness). We also found new links in the ChatGPT bigram networks that did not exist in the literature bigram networks. Conclusions: The similarity results show that ChatGPT outperformed Google Bard in document similarity, bigram similarity, and degree and closeness centrality. We also observed that ChatGPT offers linkage to terms that are connected in the literature. Such connections could inspire interesting questions and generate new hypotheses.
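
The abstract names its similarity metrics without defining them. The following is a minimal Python sketch of how such metrics are commonly computed, using only the standard library. It is not the authors' implementation. Our assumptions: lowercase alphanumeric tokenization without stop-word removal, cosine similarity over raw term counts, a smoothed TF-IDF weighting for bigrams (idf = ln((1 + n) / (1 + df)) + 1), an undirected bigram network in which adjacent words are linked, and closeness centrality computed within each node's connected component. All function names (`compare`, `bigram_graph`, and so on) are hypothetical.

```python
# Assumed preprocessing and weighting; not the paper's exact pipeline.
import math
import re
from collections import Counter, deque


def tokenize(text):
    """Lowercase and keep alphanumeric runs (no stemming or stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bigrams(tokens):
    """Adjacent token pairs, e.g. ["a", "b", "c"] -> [("a", "b"), ("b", "c")]."""
    return list(zip(tokens, tokens[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Shared distinct tokens divided by all distinct tokens."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_vectors(docs):
    """Smoothed TF-IDF (assumed): idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def bigram_graph(tokens):
    """Undirected word graph with one edge per distinct bigram (self-loops dropped)."""
    graph = {}
    for u, v in bigrams(tokens):
        if u != v:
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph


def edge_set(graph):
    """Undirected edges as frozensets so (u, v) and (v, u) compare equal."""
    return {frozenset((u, v)) for u in graph for v in graph[u]}


def degree_centrality(graph):
    """Degree divided by (n - 1), the maximum possible degree."""
    n = len(graph)
    return {u: len(nbrs) / (n - 1) if n > 1 else 0.0 for u, nbrs in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes - 1) / sum of BFS distances, within each node's component."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def compare(generated, literature):
    """Hypothetical helper: similarity of a generated text to a literature text."""
    gen, lit = tokenize(generated), tokenize(literature)
    gen_vec, lit_vec = tfidf_vectors([bigrams(gen), bigrams(lit)])
    return {
        "cosine": cosine(Counter(gen), Counter(lit)),
        "jaccard": jaccard(gen, lit),
        "tfidf_bigram": cosine(gen_vec, lit_vec),
        # Edges in the generated bigram network that are absent from the literature one.
        "new_links": edge_set(bigram_graph(gen)) - edge_set(bigram_graph(lit)),
    }
```

For the toy sentences "insulin lowers blood glucose" and "insulin reduces blood glucose", `compare` returns a cosine of 0.75 and a Jaccard of 0.6, and flags (insulin, lowers) and (lowers, blood) as links absent from the literature text. The percentages reported above depend on the authors' own prompts, preprocessing, and corpora, which this sketch does not reproduce.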
