file systems with on-the-fly compression

Looking into filesystems for Linux that support on-the-fly, transparent compression.  Reducing the space used for a given set of data/files is naturally of interest - it directly means less raw storage needs to be purchased.  Perhaps less obviously, it *could* also improve I/O performance by trading slow disk I/O for fast CPU operations.  My intuition (fwiw) is that read operations stand to gain the most, on the expectation that decompression is faster than compression for common compression schemes.  So write-once, read-many data sets could benefit from this the most.  (That's just an initial guess - the test results to come below should provide quantitative evidence one way or the other, though others have also published performance tests - have to look into those as well.)

Candidates for testing are ZFS and BTRFS.  Each has at least two compression libraries available (though not the same two - TBC), typically with one giving better compression at the cost of additional CPU usage.  Both filesystems have lengthy feature lists beyond compression.  For instance, both offer deduplication at some level, which can further reduce the on-disk size of data sets and could presumably increase cache hit rates, benefiting read performance.  Both projects also serve as volume managers (ZFS being the more full-featured of the two) and have RAID capabilities (replacing mdadm, for instance).  As is always the case in filesystem testing, there are a large number of parameters in play.
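For instance, deduplication in ZFS is just another dataset property; a minimal sketch (using the test0 pool created below - and note that ZFS dedup is famously RAM-hungry, so this is illustration only):
[root@sl7beta ~]# zfs set dedup=on test0
[root@sl7beta ~]# zpool list -o name,dedupratio test0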

ZFS is a port from OpenSolaris, while BTRFS appears to have started life in Linux land.

ZFS is at a glance straightforward to use with Scientific Linux 6 and 7.  On SL 7, start with 'yum install zfs-release' to add the zfs yum repositories, then 'yum install zfs'.  (The epel repo is required for zfs-release - if nothing else, it is a source of dkms.)
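Spelled out, the full sequence on a fresh SL7 box looks like this (assuming the zfs-release package is reachable once epel is configured):
[root@sl7beta ~]# yum install epel-release
[root@sl7beta ~]# yum install zfs-release
[root@sl7beta ~]# yum install zfs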

BTRFS is apparently even easier - I made a minimal SL7 installation with no customization (except adding developer tools) and btrfs-progs was already installed.

Getting a taste:
[root@sl7beta ~]# mkfs.ext4 -m 2 /dev/sdb
[root@sl7beta ~]# mount /dev/sdb /ext4
[root@sl7beta ~]# mkfs.btrfs -O ^extref /dev/sdc
[root@sl7beta ~]# mount /dev/sdc /btrfs
[root@sl7beta ~]# zpool create -m /zfs test0 sdd

Note that the zpool create command automatically creates and mounts a filesystem at /zfs.
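The mountpoint is just a dataset property, so it can be inspected or changed after the fact:
[root@sl7beta ~]# zfs get mountpoint test0
[root@sl7beta ~]# zfs set mountpoint=/zfs test0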

[root@sl7beta ~]# df -h /zfs /ext4 /btrfs
Filesystem      Size  Used Avail Use% Mounted on
test0           899G     0  899G   0% /zfs
/dev/sdb        917G   77M  899G   1% /ext4
/dev/sdc        932G  512K  930G   1% /btrfs

Of course, at this point, everything is default - no attempt to enable compression, encryption, volume management, redundancy or any fine tuning (except the minor ext4 tweak to reduce the reserved space).  BTRFS is already winning it seems - it has over 3% more space available than ZFS.  Whether that's meaningful remains to be seen...
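(For reference, the reserved-space tweak made above with mkfs.ext4 -m 2 can also be applied to an existing ext4 filesystem:)
[root@sl7beta ~]# tune2fs -m 2 /dev/sdb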

Compression in ZFS and BTRFS

Note that a given filesystem may have different files compressed with different algorithms (or even uncompressed files mixed with compressed files).  Compression is done per extent in BTRFS.  (ZFS does not use extents; it applies compression per record/block instead.)  Compression is not applied retroactively - a filesystem can be created with no compression, used for a while, and then have compression enabled.  Existing files will not be recompressed, but they remain usable.  Furthermore, in some cases the filesystem may "decide" not to compress files even if compression is enabled (at least in BTRFS - it attempts to compress the first portion of a file, and if it does not get smaller, compression for that file is disabled).
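One caveat to "not retroactive": in BTRFS, existing files can be recompressed on demand by defragmenting them with a compression flag (a sketch, assuming a btrfs-progs recent enough to support -c):
[root@sl7beta ~]# btrfs filesystem defragment -r -czlib /btrfs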

Compression libraries for BTRFS are described here:  https://btrfs.wiki.kernel.org/index.php/Compression .  The compression options are Zlib level 3 (the default) and LZO.
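Per-file and per-directory control is also possible, which is one way the mixed-algorithm situation described above can arise (somefile and somedir here are hypothetical paths, and the property subcommand needs a reasonably recent btrfs-progs):
[root@sl7beta ~]# chattr +c /btrfs/somefile
[root@sl7beta ~]# btrfs property set /btrfs/somedir compression lzo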


enabling lz4 compression on zfs:
zfs set compression=lz4 pool0/zfs0
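(pool0/zfs0 is just an example dataset name.)  The setting and the achieved savings can be checked with:
[root@sl7beta ~]# zfs get compression pool0/zfs0
[root@sl7beta ~]# zfs get compressratio pool0/zfs0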

btrfs mount without compression:

/dev/sdc on /btrfs type btrfs (rw,relatime,seclabel,space_cache)

BTRFS mount with default compression (mount -o compress /dev/sdc /btrfs):

/dev/sdc on /btrfs type btrfs (rw,relatime,seclabel,compress=zlib,space_cache)
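To use LZO instead of the default zlib, the algorithm can be named in the mount option:
[root@sl7beta ~]# mount -o compress=lzo /dev/sdc /btrfs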


Additional reading:

A series of blog entries starting here is very informative about ZFS (albeit perhaps a little old, and Debian-centric):

https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux/


There is a very relevant and fairly recent performance study (alas, without compression) of ZFS, BTRFS, XFS, and EXT4, done as an undergraduate thesis project in Spring 2015:  https://www.diva-portal.org/smash/get/diva2:822493/FULLTEXT01.pdf .  In addition to simple filesystems, the tests include the ZFS and BTRFS RAID features compared to mdadm used with XFS and EXT4.  Grossly simplifying the results: ZFS performance in this paper is generally worse than the other filesystems, with a few exceptions, and BTRFS beats ZFS in almost all cases.


Another set of performance tests of various file systems (from early 2015) is linked below; be sure to visit pages 2-11.  It is not IOzone-based, but instead uses several near-real-world usage scenarios, particularly storing virtual machine images, but also PostgreSQL and file-server-like loads.  I have not yet read it carefully, but I think it does not include any compression.  BTRFS does very poorly in these tests.

http://www.ilsistemista.net/index.php/virtualization/47-zfs-btrfs-xfs-ext4-and-lvm-with-kvm-a-storage-performance-comparison.html


Test results are being kept here:
 
https://drupal.star.bnl.gov/STAR/blog/wbetts/file-systems-fly-compression-part-2

Aside on metric monitoring: 

Since we will want monitoring of some system performance metrics, we will want Ganglia or something similar.  Assuming it doesn't take terribly long, I will try to get a Grafana/InfluxDB system working.  An open question at this point is which collector(s) to use (see for instance http://graphite.readthedocs.org/en/latest/tools.html , though that is a list of collectors that work with Graphite rather than InfluxDB - probably a lot of overlap...)

Grafana and InfluxDB package installations:
yum install https://grafanarel.s3.amazonaws.com/builds/grafana-2.1.3-1.x86_64.rpm
yum install http://influxdb.s3.amazonaws.com/influxdb-0.9.4.2-1.x86_64.rpm
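After installing, both are ordinary systemd services on SL7 (assuming the service names grafana-server and influxdb as shipped in those RPMs):
[root@sl7beta ~]# systemctl enable grafana-server influxdb
[root@sl7beta ~]# systemctl start grafana-server influxdb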

Grafana listens on TCP port 3000 by default and speaks HTTP, while InfluxDB listens on 8083, 8086 and 8088.  (Note: none of these is open in the local system firewall, at least initially; a sketch for opening them follows.)
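A minimal firewalld sketch to open the web-facing ports to other hosts (assuming the default SL7 firewalld setup):
[root@sl7beta ~]# firewall-cmd --permanent --add-port=3000/tcp
[root@sl7beta ~]# firewall-cmd --permanent --add-port=8086/tcp
[root@sl7beta ~]# firewall-cmd --reload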

Getting Grafana to work was not as quick as I'd hoped.  There are many how-tos and walkthroughs available, but many involve older components (especially influxdb-0.8.x, which is configured differently from influxdb-0.9.x) or simply didn't work.  It could be worth coming back to this after some time, but I am going to put it aside for the time being.