How gzip encoding reduces bandwidth

June 21, 2006 at 6:14 am

Yesterday, Matt Cutts posted more details about the caching that Google’s crawlers are now doing to further clarify the whole AdSense push vs. AdSense pull issue. One of the things he mentioned was how webmasters can turn on “gzip encoding” to save even more bandwidth. Since not everyone reading this is a webmaster, I thought I’d explain what he meant in more detail.

HTTP Headers

As you know, HTTP is the protocol a web browser uses to communicate with a web server. The browser (a type of web client or user agent) always initiates the conversation with the web server by sending it a request for a URL. In other words, if you type http://www.memwg.com/blog/adsense into your browser to read this blog, the browser sends a request (technically, a “GET” request) to the server located at www.memwg.com for the content located at the path /blog/adsense.

However, a bunch of other information gets sent along with the request: the type of browser being used, the user’s preferred languages, the underlying operating system type, what kind of image formats are accepted, etc. (See Masquerading Your Browser for information on how to alter or hide some of this information.) This information is attached to the request as a set of headers, basically name-value pairs of data. You can use my free HTTP header viewer tool to see what headers your browser is sending right now.
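To make the “name-value pairs” idea concrete, here is a minimal Python sketch that splits a raw request into its request line and headers. The request text is made up for illustration (the ExampleBrowser user agent and the specific header values are assumptions, not what any real browser sends), and the parser ignores details that a real HTTP library would handle.

```python
# A hypothetical raw GET request, roughly what a browser might send.
RAW_REQUEST = (
    "GET /blog/adsense HTTP/1.1\r\n"
    "Host: www.memwg.com\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Accept-Language: en-us,en;q=0.5\r\n"
    "Accept: text/html,image/png\r\n"
    "\r\n"
)


def parse_request(raw):
    """Split a raw request into (method, path, headers dict)."""
    # The headers end at the first blank line.
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, path, headers


method, path, headers = parse_request(RAW_REQUEST)
print(method, path)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Each printed line after the first is one header: a name, a colon, and a value.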

Content Encoding

Normally, any data requested by the client is sent by the web server byte-for-byte down the pipe. If you request a web page that is 10,320 bytes long, the web server sends the entire 10,320 bytes to the client. In other words, the data is sent in its “raw” or “natural” form.

One of the headers that a client can send is the Accept-Encoding header, which tells the web server that the client can receive compressed data as an alternative. The header lists the encodings the client supports. If the server so chooses, it selects one of those encodings and compresses the data with the corresponding algorithm. Instead of sending a 10,320-byte document in the example above, it might end up sending a 4,567-byte document, a significant saving. (The amount of compression depends on the algorithm being used and the data being compressed. Typically, HTML pages become much smaller.)
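The server’s side of this exchange can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names choose_encoding and encode_body are hypothetical, the server’s preference order is arbitrary, and quality values are ignored except for an explicit q=0 refusal.

```python
import gzip
import zlib

# Encodings this hypothetical server can produce, most preferred first.
SUPPORTED = {
    "gzip": gzip.compress,
    "deflate": zlib.compress,  # HTTP "deflate" is the zlib format
}


def choose_encoding(accept_encoding):
    """Return the server's preferred encoding that the client accepts, or None."""
    offered = []
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        # Simplification: only honor an explicit refusal (q=0).
        if params.replace(" ", "") in ("q=0", "q=0.0"):
            continue
        offered.append(name.strip().lower())
    for name in SUPPORTED:
        if name in offered:
            return name
    return None


def encode_body(body, accept_encoding):
    """Return (payload, content_encoding); encoding is None if sent raw."""
    encoding = choose_encoding(accept_encoding)
    if encoding is None:
        return body, None
    return SUPPORTED[encoding](body), encoding


page = b"<p>Hello, gzip!</p>\n" * 500
payload, encoding = encode_body(page, "gzip, deflate")
print(encoding, len(page), "->", len(payload), "bytes")
```

If the client sends no Accept-Encoding header at all, encode_body falls back to sending the page in its raw form.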

When the server encodes data like this, it’s the client’s job to decode it on the other end back into its raw form. The server actually sends headers back to the client as part of the response, and one of those, the Content-Encoding header, indicates which algorithm it used for the encoding. The client can then decode the data by selecting the appropriate algorithm.
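On the client side, the decoding step amounts to looking at the Content-Encoding header and picking the matching decoder. The sketch below assumes the response headers are already available as a dictionary with lowercase names. decode_body is a hypothetical helper, not part of any library.

```python
import gzip
import zlib

# Decoders keyed by the value of the Content-Encoding header.
DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
}


def decode_body(payload, response_headers):
    """Return the raw body, undoing the server's Content-Encoding if any."""
    encoding = response_headers.get("content-encoding", "").strip().lower()
    if encoding in ("", "identity"):
        return payload  # the server sent the data as-is
    if encoding not in DECODERS:
        raise ValueError(f"unsupported Content-Encoding: {encoding}")
    return DECODERS[encoding](payload)


# Simulate a compressed response from a server.
original = b"<html><body>Hello</body></html>"
response_headers = {"content-encoding": "gzip"}
response_body = gzip.compress(original)

decoded = decode_body(response_body, response_headers)
print(decoded == original)
```

A response without a Content-Encoding header passes through untouched, which matches the “raw” case described earlier.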

GZIP Encoding

On Unix/Linux machines, the gzip application is used to compress and decompress data. But the term “gzip” or “GZIP” is also used as shorthand for the compression/decompression algorithm used by the gzip application (DEFLATE, wrapped in the gzip file format). So when you hear someone refer to “gzip encoding”, they’re talking about data that is encoded by the same algorithm used by the gzip application.

A web browser that understands gzip encoding sends an Accept-Encoding header that looks like this:

Accept-Encoding: gzip

The web server encodes the data using the gzip algorithm and sends back the appropriate Content-Encoding header:

Content-Encoding: gzip

The browser then uses the gzip decoding algorithm to return the data to its normal, uncompressed form.

Why GZIP Encoding Helps

The idea behind gzip encoding is to reduce the amount of data being transferred over the network. In the example above, the size of the document was reduced by over half. Not only does the data transmit more quickly, you also get charged less for its transmittal — in general, the less bandwidth you’re using, the less you pay.
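The arithmetic behind “over half” is easy to check. The document sizes come from the example above. The monthly request count is a made-up figure, used only to show how the per-request saving adds up.

```python
original_size = 10_320    # bytes, from the example above
compressed_size = 4_567   # bytes, from the example above

saved = original_size - compressed_size
percent_saved = 100 * saved / original_size
print(f"{saved} bytes saved per request ({percent_saved:.1f}% smaller)")

# Hypothetical traffic figure, purely for illustration.
monthly_requests = 1_000_000
saved_gb = saved * monthly_requests / 1e9
print(f"about {saved_gb:.2f} GB saved per month")
```

With these numbers the page shrinks by 5,753 bytes, or roughly 55.7 percent.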

There are downsides to gzip encoding, though. Any data compression takes time and processing cycles, so a heavily used web server may find itself slowed down even more if gzip encoding is enabled. Not all data types compress well, either: images in already-compressed formats such as JPEG, GIF, or PNG gain little and can even end up slightly bigger, so the server shouldn’t automatically compress everything, even if the client says it accepts compressed data. Finally, some older clients have bugs in their decoding algorithms.
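You can see the difference between compressible and incompressible data with a quick experiment. In this sketch, repetitive markup stands in for an HTML page and random bytes stand in for an already-compressed image. That substitution is an assumption made for convenience, since real image files vary, but compressed data looks statistically similar to random data.

```python
import gzip
import os

# Repetitive markup, standing in for a typical HTML page.
html_like = b"<tr><td class='cell'>some repeated markup</td></tr>\n" * 200

# Random bytes, standing in for data that is already compressed.
random_like = os.urandom(len(html_like))

for label, data in (("HTML-like text", html_like), ("random bytes", random_like)):
    compressed = gzip.compress(data)
    print(f"{label}: {len(data)} -> {len(compressed)} bytes")
```

The markup shrinks dramatically, while the random data comes out slightly larger than it went in because gzip adds its own header and bookkeeping. That is why a server is usually configured to compress only certain content types.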

Note that gzip encoding is not limited to web browsers; it can be used by web crawlers as well. Browsers and crawlers look the same to a web server. They just send different headers. Matt indicated that Google has now enabled gzip encoding in all of its crawlers. So if you’re finding that crawlers are visiting your site excessively and using up your precious bandwidth, make sure gzip encoding is enabled in your web server. It could make a big difference.

Originally from An AdSense Blog: Make Easy Money with Google on April 24, 2006, 10:31am


Entry filed under: Misc, Tools.
