Our algorithm, GNN (Graph Neural Network and Large Language Model Based for Data Discovery), inherits the benefits of PLOD (Predictive Learning Optimal Data Discovery)~\cite{hoang2024plod} and BOD (Blindly Optimal Data Discovery)~\cite{Hoang2024BODBO}: it requires neither a predefined utility function nor human input for attribute ranking, which avoids a time-consuming iterative loop. Beyond these previous works, GNN leverages graph neural networks and large language models to interpret text-valued attributes that PLOD and BOD cannot handle, which makes outcome prediction more reliable. GNN can be seen as an extension of PLOD that understands text values and models the user's preferences from both numerical and text values, which makes it promising for data science and analytics applications.