What does the repository store?
In Google, the repository contains the full HTML of every web page fetched by the crawler. Google compresses each page with zlib (RFC 1950), trading some compression ratio for speed. Each web page is stored as a packet with the following format.
| docid | encode | url-length | page-length | url | page |
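
To make the packet layout concrete, here is a minimal Python sketch of writing and reading one packet. The field widths (8-byte docid, 1-byte encode, 2-byte URL length, 4-byte page length), the big-endian byte order, and the choice to store the compressed length of the page are all assumptions for illustration, since the format above does not specify them. The function names `pack_packet` and `unpack_packet` are likewise hypothetical.

```python
import struct
import zlib

# Assumed fixed-width header (not specified by the source):
# docid uint64, encode uint8, URL length uint16, page length uint32,
# big-endian with no padding, 15 bytes in total.
HEADER = struct.Struct(">QBHI")

def pack_packet(docid: int, encode: int, url: str, html: str) -> bytes:
    """Serialize one page as: header | url | zlib-compressed page."""
    url_bytes = url.encode("utf-8")
    page_bytes = zlib.compress(html.encode("utf-8"))  # zlib format, RFC 1950
    header = HEADER.pack(docid, encode, len(url_bytes), len(page_bytes))
    return header + url_bytes + page_bytes

def unpack_packet(buf: bytes, offset: int = 0):
    """Read one packet at `offset`; return its fields and the next offset."""
    docid, encode, url_len, page_len = HEADER.unpack_from(buf, offset)
    url_start = offset + HEADER.size
    page_start = url_start + url_len
    end = page_start + page_len
    url = buf[url_start:page_start].decode("utf-8")
    html = zlib.decompress(buf[page_start:end]).decode("utf-8")
    return (docid, encode, url, html), end
```

Because the two length fields come before the variable-length data, a reader can find the end of each packet without any separate index, so packets can simply be stored one after another.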
The repository is the original source of information for the search engine: the other data can be rebuilt from it.
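
As an illustration of rebuilding data from the repository, the sketch below scans a byte string of consecutive packets and reconstructs a docid-to-URL map. It is only a sketch under stated assumptions: the header layout (uint64 docid, uint8 encode, uint16 URL length, uint32 compressed page length, big-endian) is hypothetical, and `rebuild_url_index` and `make_packet` are illustrative names, not part of Google's system.

```python
import struct
import zlib

# Hypothetical header layout (not from the source): docid uint64,
# encode uint8, URL length uint16, compressed page length uint32.
HEADER = struct.Struct(">QBHI")

def rebuild_url_index(repo: bytes) -> dict:
    """Scan packets front to back and map docid -> (url, packet offset)."""
    index = {}
    offset = 0
    while offset < len(repo):
        docid, _encode, url_len, page_len = HEADER.unpack_from(repo, offset)
        url_start = offset + HEADER.size
        url = repo[url_start:url_start + url_len].decode("utf-8")
        index[docid] = (url, offset)
        # Skip over the compressed page without decompressing it.
        offset = url_start + url_len + page_len
    return index

def make_packet(docid: int, url: str, html: str) -> bytes:
    """Build one sample packet so the scan has something to read."""
    u = url.encode("utf-8")
    p = zlib.compress(html.encode("utf-8"))
    return HEADER.pack(docid, 0, len(u), len(p)) + u + p

repo = (make_packet(7, "http://a.example/", "<html>a</html>")
        + make_packet(9, "http://b.example/", "<html>b</html>"))
index = rebuild_url_index(repo)
```

The scan reads only headers and URLs, so a derived structure like this can be regenerated in one sequential pass over the repository.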