TY - JOUR
ID -
TI - Effective Web Page Crawler
AU - Isra’a Tahseen Ali
AU - Hilal Hadi Saleh
PY - 2011
VL - 29
IS - 3
SP - 513
EP - 530
JO - Engineering and Technology Journal مجلة الهندسة والتكنولوجيا
SN - 16816900 24120758
AB - The World Wide Web (WWW) has grown from a few thousand pages in 1993 to more than eight billion pages at present. Due to this explosion in size, web search engines are becoming increasingly important as the primary means of locating relevant information. This research aims to build a crawler that crawls the most important web pages. A crawling system has been built that consists of three main techniques. The first is the Best-First Technique, which is used to select the most important page. The second is the Distributed Crawling Technique, which is based on UbiCrawler and is used to distribute the URLs of the selected web pages to several machines. The third is the Duplicated Pages Detecting Technique, which uses a proposed document fingerprint algorithm.

ER -