PositionCare.com major engines



 

How search engines crawl the web

The indexes of crawler-based search engines (such as AltaVista, Infoseek, and Inktomi) are built by sending out crawlers (robot programs) that capture text and bring it back. No human filtering or judgement is involved: whatever the crawler sees on your page is what goes into the index.



The crawler sends out thousands of HTTP requests simultaneously, like thousands of blind users grabbing text, pulling it back, and throwing it into the indexing machines so that the text can be added to the index.
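
As a rough illustration of requests running side by side, here is a minimal Python sketch. It is not how any particular search engine works: `fetch_text`, `fetch_many`, and the `example.com` URLs are hypothetical names made up for this example, and the "web" is a small in-memory dictionary so the code runs without a network connection.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_text(url):
    """Stand-in for an HTTP GET; returns the page text for `url`."""
    # Hypothetical in-memory "web" so the sketch runs without a network.
    fake_web = {
        "http://example.com/": "welcome to example",
        "http://example.com/about": "about this site",
        "http://example.com/news": "latest news",
    }
    return fake_web.get(url, "")


def fetch_many(urls, workers=8):
    """Fetch all `urls` concurrently and return {url: text}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs fetch_text in worker threads and yields results
        # in the same order as the input URLs.
        texts = pool.map(fetch_text, urls)
        return dict(zip(urls, texts))


captured = fetch_many([
    "http://example.com/",
    "http://example.com/about",
    "http://example.com/news",
])
print(len(captured), "pages captured")
```

A real crawler would replace `fetch_text` with an actual HTTP request and hand the captured text to the indexing step.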

The major crawler often has "cousins," other crawlers that do specialized jobs to help keep the index current, such as checking for "dead" links -- pages that have been moved or deleted and should not be in the index.
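
The dead-link check can be pictured with a short Python sketch. The function name `prune_dead_links`, the sample URLs, and the choice of HTTP status codes that count as "dead" (301, 404, 410) are all assumptions made for this example, not a description of any real engine's rules.

```python
def prune_dead_links(index, get_status):
    """Split indexed URLs into (live, dead) using each URL's HTTP status."""
    # Assumption: 301 = moved permanently, 404 = not found, 410 = gone.
    gone = {301, 404, 410}
    live, dead = [], []
    for url in index:
        if get_status(url) in gone:
            dead.append(url)
        else:
            live.append(url)
    return live, dead


# Hypothetical status lookup standing in for real HTTP requests.
STATUSES = {"/home.html": 200, "/old-news.html": 404, "/moved.html": 301}

live, dead = prune_dead_links(list(STATUSES), STATUSES.get)
print("keep:", live)
print("drop:", dead)
```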

How does the crawler know where to go? It follows the links it finds in the pages it retrieves. When a page is captured, the links from that page go into a list of where to go to next. In theory, there is no need to tell a search engine about your site -- it should be found automatically.
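
The link-following process described above can be sketched in Python as a simple breadth-first crawl. This is only an illustration under stated assumptions: `LinkExtractor`, `crawl`, and the four-page `PAGES` site are hypothetical, and pages are read from a dictionary rather than fetched over HTTP.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch):
    """Breadth-first crawl from `seed`; `fetch(url)` returns HTML or None."""
    frontier = deque([seed])  # the "list of where to go next"
    seen = {seed}             # URLs already queued, to avoid revisiting
    order = []
    while frontier:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:      # page could not be retrieved
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


# Hypothetical site held in memory instead of fetched over HTTP.
PAGES = {
    "/index.html": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "/a.html": '<a href="/b.html">B</a>',
    "/b.html": '<a href="/index.html">Home</a>',
    "/orphan.html": "No page links here.",
}

visited = crawl("/index.html", PAGES.get)
print(visited)
```

Note that `/orphan.html` is never visited, because no other page links to it.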

In a typical day a crawler and its cousins visit several million pages. But with hundreds of millions of Web pages, this is a game of chance. Pages with many links pointing to them are likely to be found often. Pages with few links might be found in a week, a month, six months, or even longer. Pages with no links to them at all will never be found by crawling; you have to submit them to the search engine yourself in order for them to be found.
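
To see which of your own pages fall into the "no links at all" group, you could count inbound links across your site. The Python sketch below assumes you already have a map of which page links to which; `inbound_link_counts` and the `LINK_GRAPH` data are hypothetical and only illustrate the idea.

```python
from collections import Counter


def inbound_link_counts(link_graph):
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):   # count each linking page once
            if target != source:      # ignore links from a page to itself
                counts[target] += 1
    return counts


# Hypothetical site map: page -> pages it links to.
LINK_GRAPH = {
    "/index.html": ["/products.html", "/contact.html"],
    "/products.html": ["/index.html", "/contact.html"],
    "/contact.html": ["/index.html"],
    "/new-page.html": ["/index.html"],
}

counts = inbound_link_counts(LINK_GRAPH)
needs_submission = [page for page, n in counts.items() if n == 0]
print(needs_submission)
```

Here `/new-page.html` links out to the home page, but nothing links to it, so a crawler following links would never reach it.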



© 1999-2007 PositionCare.com. All Rights Reserved.