The rapid growth of multi-source, heterogeneous, and multimodal scientific data has increasingly exposed the limitations of traditional data management. Most existing Deep Research (DR) efforts focus primarily on web search and overlook local private data. Consequently, these frameworks retrieve private data inefficiently and fail to comply with the FAIR principles (Findable, Accessible, Interoperable, Reusable), resulting in limited reusability. To address this, we propose IoDResearch (Internet of Data Research), a private-data-centric Deep Research framework that operationalizes the Internet of Data (IoD) paradigm. IoDResearch encapsulates heterogeneous resources as FAIR-compliant digital objects and further refines them into atomic knowledge units and knowledge graphs, forming a heterogeneous graph index for multi-granularity retrieval. On top of this representation, a multi-agent system supports both reliable question answering (QA) and structured scientific report generation. Furthermore, we establish the IoD DeepResearch Benchmark to systematically evaluate both data representation and Deep Research capabilities in IoD scenarios. Experimental results on retrieval, QA, and report-writing tasks show that IoDResearch consistently surpasses representative RAG and Deep Research baselines. Overall, IoDResearch demonstrates the feasibility of private-data-centric Deep Research under the IoD paradigm, paving the way toward more trustworthy, reusable, and automated scientific discovery.
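As an illustration only, the heterogeneous graph index described above might be sketched as follows. This is a toy model under stated assumptions, not the IoDResearch implementation: the class `HeteroGraphIndex`, the node kinds, the identifiers, and the keyword-overlap scoring are all hypothetical and chosen for brevity. The sketch shows three node granularities (digital object, atomic knowledge unit, entity) held in one graph, with retrieval restricted to a chosen granularity and edges used to move between levels.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # hypothetical kinds: "digital_object", "knowledge_unit", "entity"
    text: str


class HeteroGraphIndex:
    """Toy heterogeneous graph: typed nodes plus undirected edges."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(set)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def search(self, query, kind):
        """Rank nodes of one granularity by keyword overlap (assumed scoring)."""
        terms = set(query.lower().split())
        scored = []
        for node in self.nodes.values():
            if node.kind != kind:
                continue
            overlap = len(terms & set(node.text.lower().split()))
            if overlap:
                scored.append((overlap, node.node_id))
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [node_id for _, node_id in scored]

    def neighbors(self, node_id, kind):
        """Move between granularities by following edges to nodes of `kind`."""
        return sorted(n for n in self.edges[node_id] if self.nodes[n].kind == kind)


def build_example():
    # Invented sample data, for illustration only.
    idx = HeteroGraphIndex()
    idx.add_node(Node("do:1", "digital_object", "soil moisture sensor dataset 2021"))
    idx.add_node(Node("ku:1", "knowledge_unit", "soil moisture peaked in july"))
    idx.add_node(Node("ku:2", "knowledge_unit", "sensor calibration drift was corrected"))
    idx.add_node(Node("ent:soil_moisture", "entity", "soil moisture"))
    idx.add_edge("do:1", "ku:1")
    idx.add_edge("do:1", "ku:2")
    idx.add_edge("ku:1", "ent:soil_moisture")
    return idx


if __name__ == "__main__":
    idx = build_example()
    hits = idx.search("soil moisture", "knowledge_unit")
    print(hits)
    print(idx.neighbors(hits[0], "digital_object"))
```

In this sketch a query first matches fine-grained knowledge units and then follows edges back to the enclosing digital object, which is one plausible reading of multi-granularity retrieval; a real system would presumably replace keyword overlap with learned embeddings or graph-aware ranking.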