For as long as I can remember, I’ve been using Gzip compression on all the websites we build; it’s sort of a no-brainer. You build a website, you use Gzip, you be happy.
But a few days back, I decided to learn more about Gzip than just the fact that it is used for HTTP compression, so I browsed my way to it.
The G in Gzip stands for GNU, which is a collaborative development project. Gzip is simply a file format that uses the DEFLATE algorithm (LZ77 + Huffman coding) to perform compression. Fun fact: DEFLATE is the same algorithm used in file formats like PNG.
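To see the format in action, here is a minimal sketch using Python's standard gzip module. The sample text is made up for illustration; the magic bytes and the method byte come from the Gzip file format itself.

```python
import gzip

# Made-up sample text; repetition gives DEFLATE something to work with.
text = b"You build a website, you use Gzip, you be happy. " * 20

compressed = gzip.compress(text)
restored = gzip.decompress(compressed)

# Every gzip member starts with the magic bytes 0x1f 0x8b,
# followed by the compression method byte (8 means DEFLATE).
magic = compressed[:2]
method = compressed[2]

print(len(text), "->", len(compressed), "bytes")
print(magic.hex(), method)
```

The round trip gives back the original bytes, and the third byte of the output confirms that DEFLATE is doing the actual work inside the Gzip wrapper.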
Gzip normally compresses only a single file. Compressed archives are created by assembling individual files into a TAR (Tape Archive) archive and then compressing that archive with Gzip; the result is a .tar.gz or .tgz file, also known as a tarball.
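The two-step idea (bundle with TAR, then compress with Gzip) can be sketched with Python's standard tarfile module. The file names and contents below are hypothetical, and everything happens in memory so nothing is written to disk.

```python
import io
import tarfile

# Hypothetical file names and contents, for illustration only.
files = {
    "index.html": b"<h1>Hello</h1>\n",
    "style.css": b"h1 { color: teal; }\n",
}

buffer = io.BytesIO()
# Mode "w:gz" writes a TAR archive and gzips it in one pass.
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))

tarball = buffer.getvalue()

# Reading it back: gunzip, then walk the TAR members.
with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as archive:
    names = archive.getnames()
    restored = {n: archive.extractfile(n).read() for n in names}

print(names)
```

The resulting bytes are an ordinary Gzip stream from the outside; the TAR layer is what lets several files travel inside it.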
In the simplest terms, HTTP compression is about serving the data sitting on the server to the browser (client) in compressed form, on demand. The client then decompresses that data before showing it to the user.
HTTP compression can work in either of two ways – Lower Level or Higher Level.
Lower Level – The Transfer-Encoding header field indicates that the message/data being received is in compressed form. It applies hop by hop, to a single connection.
Higher Level – The Content-Encoding header field indicates that the message/data being received is in compressed form. It applies end to end, to the content itself.
Some browsers do not advertise support for Transfer-Encoding compression, to avoid triggering bugs in servers, and hence the Content-Encoding approach is the preferred method.
Gzip is one of the three standard formats used for HTTP compression:
Gzip – the format of the GNU Zip program
Compress – the format of the UNIX compress program (LZW)
Zlib – DEFLATE data inside the Zlib wrapper, sent as the "deflate" coding
Zlib was at one point considered better than Gzip because Gzip adds more overhead in the form of headers and trailers: at least 18 bytes around the DEFLATE data, against 6 for Zlib. It is still not widely used, because Microsoft IE does not implement the Zlib standard correctly.
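The framing overhead of each wrapper can be measured directly. This is a small sketch using Python's zlib module, where the wbits argument picks the container: -15 for a bare DEFLATE stream, 15 for the Zlib wrapper and 31 for the Gzip wrapper. The sample CSS and the helper name deflate are my own, for illustration.

```python
import zlib

# Made-up sample payload.
data = b"body { margin: 0; padding: 0; }\n" * 50


def deflate(payload: bytes, wbits: int) -> bytes:
    """Compress with zlib's DEFLATE; wbits selects the container."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return compressor.compress(payload) + compressor.flush()


raw = deflate(data, -15)          # bare DEFLATE stream, no wrapper
zlib_wrapped = deflate(data, 15)  # 2-byte header + 4-byte Adler-32 trailer
gzip_wrapped = deflate(data, 31)  # 10-byte header + CRC-32 and size trailer

zlib_overhead = len(zlib_wrapped) - len(raw)
gzip_overhead = len(gzip_wrapped) - len(raw)
print("zlib overhead:", zlib_overhead, "gzip overhead:", gzip_overhead)
```

The compressed payload is the same in all three cases; only the bytes wrapped around it differ, which is why the gap matters for tiny responses and disappears into the noise for large ones.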
Gzip is useful for compressing text-based files such as XHTML, CSS and JS, but it is of no use if you try to compress an already compressed file or an image file like PNG: such files already use a compression technique, and Gzip would only add extra data to the file.
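A quick way to convince yourself is to compare repetitive markup with data that has no patterns left to exploit. In this sketch the random bytes are only a stand-in for an already compressed file (my assumption: to DEFLATE, real PNG or ZIP payloads look similarly patternless), and the markup is made up.

```python
import gzip
import os

# Made-up, highly repetitive markup.
markup = b"<li class='item'>Gzip loves repetitive text</li>\n" * 200

# Random bytes stand in for an already compressed file (assumption:
# real PNG or ZIP payloads look similarly patternless to DEFLATE).
already_compressed = os.urandom(10_000)

markup_gz = gzip.compress(markup)
binary_gz = gzip.compress(already_compressed)

print("markup:", len(markup), "->", len(markup_gz))
print("binary:", len(already_compressed), "->", len(binary_gz))
```

The markup shrinks dramatically, while the patternless input comes out slightly larger than it went in, because the Gzip header and trailer still have to be added.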
Apache and Gzip
Apache, the most widely used server, supports Gzip compression via the mod_deflate and mod_gzip modules.
mod_deflate: It usually comes bundled with Apache. It is faster at compression and decompression and uses fewer resources. It is also better documented and easier to configure.
mod_gzip: It is an additional module for Apache. It is slower at compression and decompression and uses slightly more resources.
mod_deflate is the most commonly adopted way of implementing Gzip on Apache, but it sucked on versions prior to v2 because it produced lower compression ratios back then. From Apache v2 onwards, the compression level for mod_deflate can be configured (the DeflateCompressionLevel directive).
NGINX has built-in support for Gzip.
Server – Client – Compression
Browser: Hey yo server, check out the Accept-Encoding field in my request headers; I’d like the data in zipped format.
Accept-Encoding: gzip, deflate
Server: Wassup Browser, sure thing. Check out the Content-Encoding field in the response headers that I sent with the data; the data is zipped. Peace.
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
If at any point the Content-Encoding field returns “identity”, it means the data is in its original, uncompressed form.
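The whole conversation can be sketched in a few lines of Python. This is not how any particular server implements it; the function names accepts_gzip, respond and read_body are hypothetical, and the Accept-Encoding parsing is deliberately simplified (it ignores the * wildcard, for example).

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with a non-zero q."""
    for item in accept_encoding.split(","):
        token, _, params = item.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def respond(request_headers: dict, body: bytes):
    """Hypothetical server step: compress only when the client asked for it."""
    headers = {"Content-Type": "text/html; charset=UTF-8"}
    if accepts_gzip(request_headers.get("Accept-Encoding", "")):
        headers["Content-Encoding"] = "gzip"
        body = gzip.compress(body)
    return headers, body


def read_body(headers: dict, payload: bytes) -> bytes:
    """Hypothetical client step: undo the coding named in Content-Encoding."""
    if headers.get("Content-Encoding", "identity") == "gzip":
        return gzip.decompress(payload)
    return payload


page = b"<html><body>Peace.</body></html>" * 10
headers, payload = respond({"Accept-Encoding": "gzip, deflate"}, page)
print(headers, len(page), "->", len(payload))
```

A client that sends no Accept-Encoding header gets the page back untouched and without a Content-Encoding field, which is the uncompressed case described above.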
After this, I read about the common code used to compress different sorts of files with both mod_deflate and mod_gzip. You can find the code in the reference mentioned below.