The Deep Web consists of data that are accessible through Web pages but not readily indexable by search engines, because they are returned in dynamic pages. In this paper we propose a conceptual framework for answering keyword queries on Deep Web sources represented as relational tables with so-called access limitations. We formalize the notion of optimal answer, characterize the queries for which an answer can be found, and present a query processing method based on constructing a query plan that minimizes the number of accesses to the data sources.
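To make the notion of access limitations concrete, the following is a minimal sketch and not the method of the paper. It assumes that each source is a relation whose input attributes must be bound before any tuple is returned (as with a Web form), that all attributes share a single pool of values, and that the query keywords are the only values known at the start. The names `Source`, `access`, and `reachable_tuples`, as well as the sample data, are hypothetical. The sketch extracts every reachable tuple by naive iteration and counts the accesses it makes; many of them return nothing, which is the cost a query plan would aim to reduce.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Set, Tuple


@dataclass
class Source:
    """Hypothetical relation hidden behind a form: input positions must be bound."""
    name: str
    inputs: Tuple[int, ...]      # positions of the attributes that must be supplied
    rows: List[Tuple[str, ...]]  # hidden content, visible only through access()
    accesses: int = 0            # number of accesses made so far

    def access(self, binding: Tuple[str, ...]) -> List[Tuple[str, ...]]:
        """Return the rows whose input attributes match the given binding."""
        self.accesses += 1
        return [r for r in self.rows
                if tuple(r[i] for i in self.inputs) == binding]


def reachable_tuples(sources: List[Source], keywords: Set[str]):
    """Naive extraction: try every known value as input until nothing new appears."""
    known = set(keywords)   # values usable as inputs (assumption: one shared domain)
    tried = set()           # (source name, binding) pairs already accessed
    found = set()           # (source name, row) pairs extracted so far
    changed = True
    while changed:
        changed = False
        for src in sources:
            for binding in product(sorted(known), repeat=len(src.inputs)):
                if (src.name, binding) in tried:
                    continue
                tried.add((src.name, binding))
                for row in src.access(binding):
                    found.add((src.name, row))
                    new_values = set(row) - known
                    if new_values:
                        known |= new_values
                        changed = True
    return found


# Illustrative data only: two sources, each requiring its first attribute as input.
sources = [
    Source("by_author", (0,), [("codd", "relational model"), ("ullman", "datalog")]),
    Source("by_paper", (0,), [("relational model", "cacm"), ("datalog", "pods")]),
]
answers = reachable_tuples(sources, {"codd"})
total_accesses = sum(s.accesses for s in sources)
```

In this toy instance the tuples mentioning `ullman` and `datalog` are never extracted, because no known value can be used to reach them. The naive loop also issues accesses that return no rows, which illustrates why minimizing accesses matters.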