Crawl the web!

A web crawler (also known as a web spider or ant) is a program that browses the World Wide Web in an automated manner. In its original design, Google used a distributed crawler to fetch web documents from the Internet. A single URLserver sent lists of URLs to a number of crawlers (typically around three running at the same time). The crawlers downloaded the web pages and sent them to a store server, which then compressed the pages and stored them in the repository.
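The division of labour described above (URLserver, several crawlers, store server, repository) can be sketched as a small producer/consumer pipeline. This is a minimal illustration under stated assumptions, not Google's actual code: threads stand in for separate machines, in-memory queues stand in for the network links between components, zlib stands in for whatever compression was really used, and the hypothetical FAKE_WEB dictionary replaces real HTTP fetches so the example runs offline. All function names (url_server, crawler, store_server, run_pipeline) are illustrative.

```python
import queue
import threading
import zlib

# Hypothetical stand-in for the web (URL -> HTML); a real crawler would fetch over HTTP.
FAKE_WEB = {
    "http://example.com/": "<html>home</html>",
    "http://example.com/a": "<html>page a</html>",
    "http://example.com/b": "<html>page b</html>",
    "http://example.com/c": "<html>page c</html>",
}


def fetch(url):
    """Hypothetical fetcher: looks the URL up in FAKE_WEB instead of using the network."""
    return FAKE_WEB.get(url, "")


def url_server(urls, url_queue, num_crawlers):
    """Hands out URLs to the crawlers, then one stop signal (None) per crawler."""
    for url in urls:
        url_queue.put(url)
    for _ in range(num_crawlers):
        url_queue.put(None)


def crawler(url_queue, store_queue):
    """Takes URLs from the URL server, downloads them and forwards pages to the store server."""
    while True:
        url = url_queue.get()
        if url is None:
            store_queue.put(None)  # tell the store server this crawler has finished
            break
        store_queue.put((url, fetch(url)))


def store_server(store_queue, repository, num_crawlers):
    """Compresses each incoming page and stores it until every crawler has finished."""
    finished = 0
    while finished < num_crawlers:
        item = store_queue.get()
        if item is None:
            finished += 1
            continue
        url, html = item
        repository[url] = zlib.compress(html.encode("utf-8"))


def run_pipeline(urls, num_crawlers=3):
    """Wires the components together and returns the repository (URL -> compressed page)."""
    url_queue = queue.Queue()
    store_queue = queue.Queue()
    repository = {}
    crawlers = [
        threading.Thread(target=crawler, args=(url_queue, store_queue))
        for _ in range(num_crawlers)
    ]
    store = threading.Thread(target=store_server, args=(store_queue, repository, num_crawlers))
    for t in crawlers:
        t.start()
    store.start()
    url_server(urls, url_queue, num_crawlers)
    for t in crawlers:
        t.join()
    store.join()
    return repository


if __name__ == "__main__":
    repo = run_pipeline(list(FAKE_WEB), num_crawlers=3)
    print(zlib.decompress(repo["http://example.com/a"]).decode("utf-8"))  # → <html>page a</html>
```

Because the URL server enqueues one stop signal per crawler after the last URL, every crawler drains the shared URL queue before exiting, and the store server knows it has received every page once it has seen as many stop signals as there are crawlers.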