The breakdown of your page - Hit lists

 
After a web page has been crawled into the repository, the indexer parses it and breaks it down into a logical structure called a hit list.

A hit list represents the list of occurrences of a word in a web document fetched from the Internet; each hit records the word's position, font, and capitalization.

There are two main types of hits: fancy hits and plain hits. Fancy hits include hits occurring in a URL, title, anchor text, or meta tag. Plain hits include everything else.
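As a rough illustration of the two hit types, here is a minimal Python sketch. The `Hit` class, its field names, the `source` labels, and the `hits_for` helper are all assumptions made for this example; they are not the format a real search engine uses, which would pack this information into a compact binary encoding.

```python
from dataclasses import dataclass

# Hypothetical labels for the places that produce fancy hits.
FANCY_SOURCES = {"url", "title", "anchor", "meta"}


@dataclass(frozen=True)
class Hit:
    position: int      # word offset within the document
    font_size: int     # relative font size (assumed integer scale)
    capitalized: bool  # True if the word starts with a capital letter
    source: str        # "url", "title", "anchor", "meta", or "body"

    @property
    def is_fancy(self) -> bool:
        # Anything outside the fancy sources counts as a plain hit.
        return self.source in FANCY_SOURCES


def hits_for(word, tokens):
    """Collect the hits for `word` from (token, font_size, source) tuples."""
    word = word.lower()
    return [
        Hit(pos, font_size, token[:1].isupper(), source)
        for pos, (token, font_size, source) in enumerate(tokens)
        if token.lower() == word
    ]


# A tiny made-up document: two title words followed by three body words.
tokens = [
    ("Search", 3, "title"),
    ("engines", 3, "title"),
    ("A", 1, "body"),
    ("search", 1, "body"),
    ("engine", 1, "body"),
]
hits = hits_for("search", tokens)
```

In this toy document the word "search" yields one fancy hit (from the title) and one plain hit (from the body).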

In the barrel, the indexer builds a forward index that stores, for every web page, the words it contains and the list of hits for each word. Given a docID, the search engine can easily look up what that page is composed of.

Example: a list of documents in a barrel, stored as a forward index
| docID |
| wordID | no-of-hit | hit, hit, hit |
| wordID | no-of-hit | hit, hit, hit, hit |
| null wordID |
| docID |
| wordID | no-of-hit | hit, hit, hit |
| wordID | no-of-hit | hit, hit, hit |
| wordID | no-of-hit | hit, hit, hit |
| null wordID |
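The layout above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `build_forward_index` and `serialize`, the use of `0` as the null wordID, and the representation of hits as plain integers are all hypothetical choices for this example, not the actual on-disk format.

```python
NULL_WORD_ID = 0  # assumed sentinel that closes a document's word list


def build_forward_index(docs):
    """Group hits by word for each document.

    `docs` maps docID -> list of (wordID, hit) pairs in document order.
    Returns docID -> {wordID: [hit, hit, ...]}.
    """
    index = {}
    for doc_id, occurrences in docs.items():
        words = index.setdefault(doc_id, {})
        for word_id, hit in occurrences:
            words.setdefault(word_id, []).append(hit)
    return index


def serialize(index):
    """Flatten the index into rows matching the layout shown above."""
    rows = []
    for doc_id, words in index.items():
        rows.append([doc_id])                          # | docID |
        for word_id, hits in words.items():
            rows.append([word_id, len(hits), *hits])   # | wordID | no-of-hit | hits |
        rows.append([NULL_WORD_ID])                    # | null wordID |
    return rows


# Made-up data: hits are represented here by word positions only.
docs = {
    101: [(7, 0), (9, 1), (7, 4)],
    102: [(9, 2)],
}
index = build_forward_index(docs)
rows = serialize(index)
```

Looking up `index[101]` returns every word in that document together with its hits, which is the "query by docID" access pattern described above.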

phpbb_admin, Mon, 2006-03-20 15:46

