Today, nearly everyone is familiar with the World Wide Web and works over the Internet daily. In this paper, we introduce search engines, which process the keywords entered by users to locate information. A search engine applies different search algorithms to provide relevant results to the net surfer. Net surfers tend to follow the top search results, but how do particular web pages earn higher ranks in search engines, and how does a search engine gather all of these web pages into its database? This paper answers these basic questions. Web crawlers working for search engines, and the Robots Exclusion Protocol rules that govern them, are also addressed in this research paper. Webmasters use different restriction directives in the robots.txt file to instruct web crawlers, and some basic formats of robots.txt are also presented in this paper.
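As a minimal sketch of the kind of robots.txt handling discussed later in the paper (the crawler name "ExampleBot", the example URLs, and the sample directives are illustrative assumptions, not taken from the paper), a crawler written in Python could check the rules with the standard urllib.robotparser module:

from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content a webmaster might publish.
robots_txt = """
User-agent: *
Disallow: /private/
Allow: /public/
"""

# Parse the rules, then ask whether a given user agent may fetch a URL.
rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
print(rp.can_fetch("ExampleBot", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("ExampleBot", "https://example.com/public/page.html"))   # True

A well-behaved crawler performs such a check before downloading any page from a site, which is the essence of the Robots Exclusion Protocol mentioned above.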