comparison of compression methods for database dumps

I like to have automated daily database backups set up on my servers - for those hosting relatively small sites, the database dumps can usually be compressed down small enough to be sent as an e-mail attachment by the cron script which does the database dump. This is a really simple and cost-effective way of having offsite backups which provide daily snapshots of sites which are easy to access if anything goes wrong.

Initially I just used gzip to compress the sql files down, but as dump files inevitably grew, I decided to look into some alternatives, and in this post I'll share the results of my research. As I'm often sending these backups as e-mail attachments, the size of the compressed file is the main thing I care about, but this isn't the only factor in a comparison. Some of the methods of compression use more resources than others in terms of CPU and memory. I'm not too concerned about this as my dumps are only being done by cron once a day, but I've included a measurement of resource use; the time taken in seconds (as measured by the time command).

The compression methods I tried out included gzip and bzip2 (which should be available on most *nix servers), rzip (written by Andrew Tridgell of samba and rsync fame) and finally 7zip. In all cases, fast meant using the -1 switch, and best meant using -9 (n.b. rzip also has a -0 setting, but this didn't yield useful results).

For my tests, I used 3 different SQL files (uncompressed they are 16mb, 48mb and 83mb approx) - they are mostly made up of dumps of drupal sites, although there are a few other databases in there (each one is a mysqldump of all databases on a mysql 5 server). Here are the results:

Compression comparison - sizes in mb
  16mb 48mb 83mb
gzip (fast) 2.19 8.27 19.94
gzip (best) 1.62 6.65 17.36
bzip2 (fast) 1.72 6.36 16.93
bzip2 (best) 1.17 5.14 13.07
rzip (fast) 0.95 5.01 12.00
rzip (best) 0.89 4.65 10.39
7zip 0.82 4.34 9.09
Compression comparison - time in seconds


16mb 48mb 83mb
gzip (fast) 0.309 1.020 2.072
gzip (best) 1.356 4.873 7.343
bzip2 (fast) 5.141 14.918 25.858
bzip2 (best) 7.732 22.026 44.834
rzip (fast) 2.400 5.931 11.582
rzip (best) 2.476 34.277 52.385
7zip 4.284 15.097 25.325

As I'm mainly interested in small compressed files, and don't care about performance or resource usage so much, my conclusion is that 7zip is the best fit for my needs. It's the best of the bunch in terms of producing small files, but still outperforms many of the others in terms of how fast it compresses the SQL files.

7zip also has other advantages; the cross platform port p7zip works great on most linux distros (and Mac OSX, I believe), and the original 7zip application is a windows app - so I can have my backup scripts send copies of database dumps to website owners and be confident they can access the dumps if the need ever arises. (p)7zip can also be pipe-lined (that is, it can take input from standard in and send its output to standard out), so it's pretty flexible in how it can be used in scripts.

A few reflections on the other methods / apps I tested; if performance is more important to you than it is to me, then rzip (at less than best compression ratio) may be worth a look - on its fast setting it did pretty well at making smaller files than gzip or bzip2 could and was not much slower than gzip (and faster than bzip2). Where 7zip does well with cross-platform availability and pipe-lining, however, rzip's not so good.

I should point out that my conclusions here are based on my own criteria (I want small compressed files, and don't care so much what it takes to get them). If your priorities are different, my conclusion may not work for you. In particular my use of wall time to measure the performance of these compression methods is very woolly. For example, I made no measurement of memory consumption, and this could be a significant differentiator if it's a factor that matters to you.

P.S. this post uses a new drupal module I put together which allows you to easily generate google charts from HTML tables. I plan to document this (and possibly release the module) soon. It's based on work by Martin Hawksey which was based on work by Chris Heilmann.