Posts filed under ‘Tools’

How gzip encoding reduces bandwidth

Yesterday, Matt Cutts posted more details about the caching that Google’s crawlers are now doing to further clarify the whole AdSense push vs. AdSense pull issue. One of the things he mentioned was how webmasters can turn on “gzip encoding” to save even more bandwidth. Since not everyone reading this is a webmaster, I thought I’d explain what he meant in more detail.

HTTP Headers

As you know, the HTTP protocol is what a web browser uses to communicate with a web server. The browser (a type of web client or user agent) always initiates the conversation with the web server by sending it a request for a URL. In other words, if you type http://www.memwg.com/blog/adsense into your browser to read this blog, the browser sends a request (technically, a “GET” request) to the server located at www.memwg.com for the content located at the path /blog/adsense.

However, a bunch of other information gets sent along with the request: the type of browser being used, the user’s preferred languages, the underlying operating system type, what kind of image formats are accepted, etc. (See Masquerading Your Browser for information on how to alter or hide some of this information.) This information is attached to the request as a set of headers, basically name-value pairs of data. You can use my free HTTP header viewer tool to see what headers your browser is sending right now.
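To make the idea of headers concrete, here is a minimal sketch of what such a request looks like as text on the wire. The build_get_request helper and the header values (including the "ExampleBrowser/1.0" user agent) are made up for illustration; a real browser sends more headers than this.

```python
def build_get_request(host, path, headers):
    """Return the text of an HTTP/1.1 GET request with the given headers."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    # Each header is a simple "Name: value" pair on its own line.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


request_text = build_get_request(
    "www.memwg.com",
    "/blog/adsense",
    {
        "User-Agent": "ExampleBrowser/1.0",  # hypothetical browser name
        "Accept-Language": "en-us,en;q=0.5",
        "Accept": "text/html,image/png,image/jpeg",
    },
)
print(request_text)
```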

Content Encoding

Normally, any data requested by the client is sent by the web server byte-for-byte down the pipe. If you request a web page that is 10,320 bytes long, the web server sends the entire 10,320 bytes to the client. In other words, the data is sent in its “raw” or “natural” form.

One of the headers that a client can send is the Accept-Encoding header, which tells the web server that the client can receive compressed data as an alternative. If the server so chooses, it selects one of the encodings that the client supports (the client sends a list of supported encodings) and compresses the data with the selected encoding algorithm. Instead of sending a 10,320 byte document in the example above, it might end up sending a 4,567 byte long document — a significant savings. (The amount of compression depends on the algorithm being used and the data being compressed. Typically, HTML pages become much smaller.)

When the server encodes data like this, it’s the client’s job to decode it on the other end back into its raw form. The server actually sends headers back to the client as part of the response, and one of those, the Content-Encoding header, indicates which algorithm it used for the encoding. The client can then decode the data by selecting the appropriate algorithm.
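The server's side of this negotiation can be sketched roughly as follows. The choose_encoding function and its supported list are assumptions made for illustration, not how any particular web server does it, and the parsing is simplified: it honors "q=0" (meaning "not acceptable") but ignores the "*" wildcard and does not rank encodings by preference.

```python
def choose_encoding(accept_encoding, supported=("gzip", "deflate")):
    """Pick the first server-supported encoding the client accepts, or None."""
    accepted = {}
    for item in accept_encoding.split(","):
        item = item.strip()
        if not item:
            continue
        # An entry looks like "gzip" or "gzip;q=0.8".
        name, _, params = item.partition(";")
        params = params.strip()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        accepted[name.strip().lower()] = quality
    for encoding in supported:
        if accepted.get(encoding, 0.0) > 0:
            return encoding
    return None  # send the data in its raw form


print(choose_encoding("gzip, deflate"))
print(choose_encoding("identity"))
```

When the function returns a name, the server would compress the body with that algorithm and put the same name in the Content-Encoding response header; when it returns None, the body goes out uncompressed.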

GZIP Encoding

On Unix/Linux machines, the gzip application is used to compress and decompress data. But the term “gzip” or “GZIP” is also used as shorthand for the compression/decompression algorithm used by the gzip application. So when you hear someone refer to “gzip encoding”, they’re talking about data that is encoded by the same algorithm used by the gzip application.

A web browser that understands gzip encoding sends an Accept-Encoding header that looks like this:

Accept-Encoding: gzip

The web server encodes the data using the gzip algorithm and sends back the appropriate Content-Encoding header:

Content-Encoding: gzip

The browser then uses the gzip decoding algorithm to return the data to its normal, uncompressed form.
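The whole round trip can be sketched with Python's standard gzip module. The page content here is made up and deliberately repetitive, and the response_headers dictionary is only a stand-in for real HTTP response headers.

```python
import gzip

# Server side: a made-up, highly repetitive HTML page stands in for real content.
page = ("<html><body>" + "<p>Hello, world!</p>" * 200 + "</body></html>").encode("utf-8")
compressed = gzip.compress(page)
response_headers = {
    "Content-Encoding": "gzip",
    "Content-Length": str(len(compressed)),
}

# Client side: check the Content-Encoding header, then decode accordingly.
if response_headers.get("Content-Encoding") == "gzip":
    body = gzip.decompress(compressed)
else:
    body = compressed

print(len(page), "bytes raw,", len(compressed), "bytes on the wire")
```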

Why GZIP Encoding Helps

The idea behind gzip encoding is to reduce the amount of data being transferred over the network. In the example above, the size of the document was reduced by over half. Not only does the data transmit more quickly, you also get charged less for its transmittal — in general, the less bandwidth you’re using, the less you pay.

There are downsides to gzip encoding, though. Any data compression takes time and processing cycles, so a heavily-used web server may find itself slowed down even more if gzip encoding is enabled. And not all data types compress well. Images in formats such as JPEG, GIF, and PNG are already compressed and often end up slightly bigger when compressed again, so the server shouldn’t automatically compress everything, even if the client requests it. And some older clients have bugs in their decoding algorithms.
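A quick way to see the difference between data types is to compare compression ratios. In this sketch, repeated HTML-like text stands in for a web page and pseudo-random bytes stand in for an already-compressed image; both inputs are assumptions chosen for illustration, not real files.

```python
import gzip
import random


def compression_ratio(data):
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(gzip.compress(data)) / len(data)


# Repetitive markup compresses very well.
html_like = b"<p>The quick brown fox jumps over the lazy dog.</p>\n" * 200

# Random bytes have no patterns to exploit, much like already-compressed data.
rng = random.Random(0)
image_like = bytes(rng.getrandbits(8) for _ in range(len(html_like)))

print("text ratio: ", round(compression_ratio(html_like), 3))
print("image ratio:", round(compression_ratio(image_like), 3))
```

A ratio above 1.0 for the second input means gzip made it bigger, which is why a server is better off skipping compression for such content types.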

Note that gzip encoding is not limited to web browsers; it can be used by web crawlers as well. Browsers and crawlers look the same to a web server; they just send different headers. Matt indicated that Google has now enabled gzip encoding in all of its crawlers. So if you’re finding that your site is being crawled excessively and that this is using up your precious bandwidth, make sure gzip encoding is enabled in your web server. It could make a big difference.

Originally from An AdSense Blog: Make Easy Money with Google on April 24, 2006, 10:31am

June 21, 2006 at 6:14 am

Evaluating PHP Applications

Following on from here, perhaps the two most common questions I’ve seen people ask when it comes to evaluating PHP applications are:

  • Does it look good?
  • Is it easy to install?

Now, not everyone is a programmer or a system administrator. “Normal human beings” rank these highly because they relate directly to the two most pressing problems they’re facing: they want a site which is visually attractive and, with limited technical expertise, installation can be a significant hurdle to overcome.

But when it comes to security or maintenance, those requirements rank pretty low down. So here are some different things to think about, following on from this talk (PDF) on page 19, which I’d argue rank much higher when evaluating a project you plan to use (further suggestions appreciated).

Note that in an ideal world you’d have the time and expertise on hand to do a full code review, but in reality that’s not going to happen, so what I’m suggesting here is meant as a reasonable compromise to help you build up a “ballpark” feeling for an application without making a huge effort.

What’s the security record like? The obvious place to find out is via Google with some searches like “appName exploit”, “appName vulnerability”, “appName security”. A place to get a better impression is searching the Bugtraq mailing lists.

Of course, you have to bear in mind that the quality of information may vary; simply finding a random online opinion that “appName rox / sux” is not enough. Also, newer or less popular applications won’t have attracted enough attention to form valid opinions this way. And you have to bear in mind that pretty much every application that’s been around and has real users will have problems at one time or another, but comparing this to this, it’s easy to spot the difference.

As a side note there, I’d recommend registering on this mailing list—pretty much all security issues with well known (and less well known) PHP Open Source code bases get announced here.

What’s the code like? Although a complete code review is not realistic, with a little effort and knowhow, you can get a good idea of how the code smells.

Number 1 tool here is phpxref, which makes it very easy to identify use (or lack) of PHP functions: run the source code through it, then check the results. For example, you might look for use of eval (and friends). In general there’s zero valid reason to use eval, so if you find it, query the developers on exactly why they used it. You might also find the absence of some functions to be an indicator: if the app uses MySQL at the backend, do you find any of mysql_escape_string, mysql_real_escape_string or addslashes being used to escape parameters to SQL queries? Are htmlspecialchars or htmlentities being used to escape output? Is there any use of the PCRE or POSIX extended regular expression functions for stuff like validation?
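If you don’t have phpxref to hand, a crude first pass at the same questions can be sketched in a few lines of Python. This is not how phpxref works; it is a simple regular-expression count, so it will also match calls that appear inside comments or strings. The count_calls helper and the sample PHP snippet are made up for illustration.

```python
import re

# Function names taken from the paragraph above.
FUNCTIONS_OF_INTEREST = (
    "eval",
    "mysql_escape_string",
    "mysql_real_escape_string",
    "addslashes",
    "htmlspecialchars",
    "htmlentities",
)


def count_calls(php_source, names=FUNCTIONS_OF_INTEREST):
    """Count apparent calls to each named PHP function in a source string."""
    counts = {}
    for name in names:
        # The lookbehind skips longer identifiers, $variables and method calls.
        pattern = r"(?<![\w$>:])" + re.escape(name) + r"\s*\("
        # PHP function names are case-insensitive.
        counts[name] = len(re.findall(pattern, php_source, flags=re.IGNORECASE))
    return counts


sample = """<?php
$id = mysql_real_escape_string($_GET["id"]);
$result = mysql_query("SELECT * FROM posts WHERE id = '$id'");
echo htmlspecialchars($row["title"]);
eval($_POST["code"]);
?>"""

counts = count_calls(sample)
for name, count in counts.items():
    print(name, count)
```

To check a whole application, you could read each .php file under the project directory and add up the counts. A zero next to every escaping function, or anything above zero next to eval, is a prompt to go and ask the developers, not proof of a problem.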

Otherwise, what does the code look like to you? This is highly subjective and depends on your experience but does it look “sane”?

How is the code being managed? Another area to investigate is how the project is actually run. How many people are involved and are they active? Do they have sensible release / upgrade policies, with clear version numbering and good documentation on how to upgrade? Are they using version control? What are their communication channels?

Chris Kunz made a wry remark while giving this talk. He helps run a shared hosting company and pointed out most of their users were extremely happy when they could install an application in the first place—once installed there was no way they were going to risk breaking it with an upgrade.

As a user of an application, you have to be aware that it is really your responsibility to keep pace with new releases, especially when they contain bug or security fixes. As an example of a project that does a good job here, check out Serendipity’s upgrade docs. The question you need to ask yourself is “can I do this?”. You’re also going to need to make the effort to stay informed—subscribe to the relevant mailing list / RSS feed etc., so you hear about new releases.

Does it scale? More on the maintenance front, what’s the application like after you’ve been using it for a while and you’ve collected a volume of data and a crowd of active users? Can that forum cope with a large number of posts and concurrent users? How does that wiki handle a large number of documents? Is using the packaged RSS feed like volunteering for a DOS attack? How easy is it to backup / restore the data? Is a shared host account with nothing but FTP access adequate to maintain this application? Does the admin interface allow you to cope with 20,000 registered users?

Some of those kinds of questions can be answered by talking to other users. Others can be determined by seeing what the developers are doing; for example, are they benchmarking / profiling their code?

Who’s using it? That MediaWiki is the code behind Wikipedia is obviously a very good indicator. Meanwhile Zend use fudforum. I’m not suggesting blindly following here, BTW; the reasons for their selection may not match your situation (you could always ask), but this does serve as a useful indicator.

You should also be careful about “following the herd”. Just because “everyone” uses it doesn’t always mean it’s the smartest choice. There may also be a specific benefit to not using the same thing as everyone else: big installed bases make attractive targets.

Who’s got an opinion? There are a lot of people “out there” with knowledge of PHP, so getting opinions isn’t a problem. At the same time, it’s worth considering where an opinion is coming from and bearing in mind it’s just an opinion. Sometimes even the most experienced disagree. So this path can be as misleading as it is useful but shouldn’t be ignored.

Anyway—that’s off the top of my head. Anything else?

This article provided by sitepoint.com.

Originally by HarryF from SitePoint Blogs on March 24, 2006, 6:46am

June 21, 2006 at 6:13 am

