The breakdown of your page - Hit lists

After a web page has been crawled into the repository, the indexer parses it and breaks it down into a logical structure called a hit list. A hit list represents the list of occurrences of a word in a web document fetched from the Internet; each hit records position, font, and capitalization information. There are two main types of hits: fancy hits and plain hits. Fancy hits are hits occurring in a URL, title, anchor text, or meta tag. Plain hits include everything else.

In the barrel, the indexer builds a forward index for every web page, with a list of hits associated with each word on that page. Given a docID, the search engine can easily look up which words that page is composed of and where they occur.

Example: a list of documents in the barrel with a forward index

| docID |
| wordID | no-of-hit | hit, hit, hit |
| wordID | no-of-hit | hit, hit, hit, hit |
| null wordID |
| docID |
| wordID | no-of-hit | hit, hit, hit |
| wordID | no-of-hit | hit, hit, hit |
| wordID | no-of-hit | hit, hit, hit |
| null wordID |
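The forward index layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the indexer's actual implementation: the `Hit` fields, the `FANCY_SOURCES` labels, the `lexicon` dictionary that assigns wordIDs, and the function names are all hypothetical, and a production indexer would likely pack each hit far more compactly than a Python object.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed labels for where a token was found; these four count as "fancy".
FANCY_SOURCES = {"url", "title", "anchor", "meta"}


@dataclass(frozen=True)
class Hit:
    """One occurrence of a word in a document (hypothetical field layout)."""
    position: int      # token offset within the document
    font_size: int     # relative font size, an assumed integer scale
    capitalized: bool  # whether the word started with an uppercase letter
    fancy: bool        # True for URL/title/anchor/meta hits, False for plain


def build_forward_index(doc_id, tokens, lexicon, forward_index):
    """Add one document to the forward index.

    tokens is an iterable of (word, source, font_size) tuples.
    lexicon maps a lowercased word to its wordID and grows as needed.
    """
    word_hits = defaultdict(list)
    for position, (word, source, font_size) in enumerate(tokens):
        word_id = lexicon.setdefault(word.lower(), len(lexicon))
        word_hits[word_id].append(
            Hit(position, font_size, word[:1].isupper(), source in FANCY_SOURCES)
        )
    forward_index[doc_id] = dict(word_hits)


def flatten(forward_index):
    """Lay the index out as docID, (wordID, no-of-hit, hits)..., null wordID."""
    records = []
    for doc_id, word_hits in forward_index.items():
        records.append(("docID", doc_id))
        for word_id, hits in word_hits.items():
            records.append(("wordID", word_id, len(hits), hits))
        records.append(("null wordID",))  # marks the end of this document
    return records


# Usage: index one tiny document with a title and a short body.
lexicon = {}
forward_index = {}
tokens = [
    ("Hit", "title", 3),
    ("lists", "title", 3),
    ("A", "body", 1),
    ("hit", "body", 1),
    ("list", "body", 1),
]
build_forward_index(1, tokens, lexicon, forward_index)
records = flatten(forward_index)
```

Here the word "hit" ends up with two hits under one wordID: a fancy hit from the title and a plain hit from the body. The trailing `("null wordID",)` record plays the role of the terminator row in the layout above, so a reader scanning the flattened records knows where one document's word list ends and the next docID begins.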