IOzone tests

Aug 14 update - I am running tests on new hardware that is much (?) better than the hardware used in the tests below.  I will probably put the results in a different location (search for R610 in Drupal?)

 

First, a note -- There is some duplication in testing, because in the second test round I reran all the original tests (which did not include small operations on large files) using full coverage.  For what it's worth, I have not seen any surprises in this region.

The attached Excel files contain test results for the test filesystems, using IOzone's default test suite with files up to 4GB.  All test results are plotted in the attached files, but no attempt has been made to make the vertical scales identical, so be sure to check the vertical scales before making any comparisons!  I may add a plotting routine that finds the maximum within all results of each test type (eg, maximum of all Writer tests, maximum of all Reader tests, etc.) and plots all the individual results of that type with the same maximum (ie, all Writer graphs would have the same vertical scale and coloring), so the graphs can be compared directly without having to look at the vertical scaling, but this is a bit more work.
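As a rough idea of what such a plotting routine would need, here is a minimal Python sketch that finds one common maximum per test type across all result sets.  The data layout (a dictionary of per-filesystem tables) and the numbers are made up for illustration; they are not taken from the attached files, and reading the Excel files is left out entirely.

```python
from collections import defaultdict

def common_maxima(results):
    """Return {test_name: largest value seen in any result set}.

    `results` maps a label (e.g. a filesystem) to {test_name: table},
    where each table is a list of rows of throughput values.
    This layout is an assumption made for this sketch.
    """
    maxima = defaultdict(float)
    for per_test in results.values():
        for test_name, table in per_test.items():
            peak = max(max(row) for row in table)
            maxima[test_name] = max(maxima[test_name], peak)
    return dict(maxima)

# Hypothetical throughput tables, NOT real measurements.
results = {
    "eastwood-hda": {
        "Writer": [[100.0, 250.0], [90.0, 40.0]],
        "Reader": [[300.0, 280.0], [60.0, 45.0]],
    },
    "eastwood-hdc": {
        "Writer": [[180.0, 400.0], [95.0, 48.0]],
        "Reader": [[310.0, 290.0], [58.0, 44.0]],
    },
}

limits = common_maxima(results)
for test_name, top in sorted(limits.items()):
    # Every graph of this test type would use 0..top as its vertical range.
    print(f"{test_name}: plot all graphs with vertical range 0 to {top}")
```

Each Writer graph would then be drawn with the Writer limit, each Reader graph with the Reader limit, and so on, whatever plotting tool is used.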

Typical commands are:

  • /opt/iozone/bin/iozone -Ra -g 4G -b eastwood-hda-iozone-4G.xls
  • /opt/iozone/bin/iozone -Raz -g 4G -b eastwood-hda-iozone-full-4G.xls
  • /opt/iozone/bin/iozone -Raz -g 4G -e -b eastwood-hda-with_flush-iozone-full-4G.xls

 

The meanings of the file name components:

"eastwood" or "newman" are the hostnames

"hdX" or "raidX" are the device names.  hda, hdg, hdh, raid0 and raid5 have ext3 filesystems, while hdc, hde and hdf have ext2.

"full" in the file name indicates test coverage throughout the test range.  If "full" isn't in the file name, then the region of small operations on large file sizes was not tested.

"with_flush" indicates -e was used.

"4G" means the maximum file size tested.

"1GBRAM" means the system's RAM was reduced from 2GB to 1GB during the test.  (NB -- these tests are underway as I write this.)

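The naming scheme above can be decoded mechanically.  The following Python sketch is only an illustration: it assumes the components are always separated by "-" and can be recognized wherever they appear in the name (I have not written down where "1GBRAM" goes, so the function does not depend on its position).  The function name and the dictionary keys are made up for this example.

```python
import re

# Filesystem per device, as listed in the text above.
EXT3_DEVICES = {"hda", "hdg", "hdh", "raid0", "raid5"}
EXT2_DEVICES = {"hdc", "hde", "hdf"}

def describe_result_file(filename):
    """Decode a result file name such as 'eastwood-hda-iozone-full-4G.xls'."""
    stem = filename.rsplit(".", 1)[0]      # drop the .xls extension
    tokens = stem.split("-")               # assumption: "-" separates components
    device = tokens[1]
    if device in EXT3_DEVICES:
        filesystem = "ext3"
    elif device in EXT2_DEVICES:
        filesystem = "ext2"
    else:
        filesystem = None                  # device not listed in the text
    # A size token looks like "4G"; "1GBRAM" deliberately does not match.
    sizes = [t for t in tokens if re.fullmatch(r"\d+[KMG]", t)]
    return {
        "host": tokens[0],
        "device": device,
        "filesystem": filesystem,
        "full_coverage": "full" in tokens,
        "with_flush": "with_flush" in tokens,   # -e was used
        "reduced_ram": "1GBRAM" in tokens,
        "max_file_size": sizes[0] if sizes else None,
    }

info = describe_result_file("eastwood-hda-with_flush-iozone-full-4G.xls")
for key, value in info.items():
    print(f"{key}: {value}")
```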
 

I have tried to make performance comparisons one by one between various filesystems and I'll describe some findings of the individual comparisons, followed by some summary thoughts.  <Need to update this section more carefully>

Comparing ext3 with ext2:

eg, eastwood's hda (ext3/system disk) vs eastwood's hdc (ext2) or

eastwood's hde (ext2) vs eastwood's hdg (ext3) or

eastwood's hdf (ext2) vs eastwood's hdh (ext3):

 

Read performance is essentially indistinguishable, with a few anomalous variations here and there.

Writing to ext2 is almost universally faster than writing to ext3, which is to be expected because of the overhead of keeping the journal on ext3.  Somewhat surprisingly to me, when writing small files (in which most or all of the work is done in memory and flushed to disk later), ext2 writes could be 1.5-2.5 times faster than ext3 writes.  As the file size gets larger (exceeding the system's available RAM for caching/buffering), ext2's writing advantage diminishes to only about 15-25% for random writes and further down to about 10% for linear writes.  Writing a bunch of small chunks to a large file is less efficient, especially when a journal is involved, than writing the same data in fewer large chunks, so when a lot of write commands are issued, ext2 gains more over ext3.

 

Comparing Master to Slave on a single IDE channel:

eg. eastwood's hde (a "master" with ext2) vs eastwood's hdf (a "slave" with ext2) or

eastwood's hdg ("master"/ext3) vs eastwood's hdh ("slave"/ext3):

hde and hdf are essentially indistinguishable in all tests.  This is as expected -- the "master" and "slave" designations are not really meaningful terms anymore (and in fact, the terms are no longer used in recent ATA specifications).  hdg has a slight (~5%) edge over hdh in most disk-bound operations -- I'm going to dismiss this as insignificant for the time being.

 

Comparing eastwood to newman:

 eg. eastwood's hda vs newman's hda:

 Since the machines are identical (or very, very nearly so), little or no variation is expected, but it doesn't seem to have worked out that way... For the large file sizes (disk-bound operations), write performances are indistinguishable (if anything, eastwood has a slight edge), but for some reason reads on newman were consistently faster than on eastwood, by 20-25% (maxing out at ~55MB/sec compared to ~45MB/sec).  I don't have any explanation for this.  In fact, on eastwood, writing was faster than reading in comparable tests!  This is certainly a surprising result...
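As a quick sanity check on that percentage, the two approximate peak read rates quoted above can be plugged into a small Python calculation.  The function name is made up for this example, and the inputs are the rounded figures from the text, not exact measurements.

```python
def percent_faster(fast_mb_s, slow_mb_s):
    """Percentage by which the faster rate exceeds the slower one."""
    return 100.0 * (fast_mb_s - slow_mb_s) / slow_mb_s

newman_read = 55.0    # ~55MB/sec, rounded figure from the text
eastwood_read = 45.0  # ~45MB/sec, rounded figure from the text

advantage = percent_faster(newman_read, eastwood_read)
print(f"newman reads are about {advantage:.0f}% faster than eastwood reads")
```

The result falls inside the 20-25% range seen across the disk-bound read tests.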

Comparing Controller to Controller on eastwood:

eg., eastwood's hda (Intel 82801CA controller onboard) vs. eastwood's hdg (Promise Technology PDC20268 PCI card) or

eastwood's hdc (Intel 82801CA controller onboard) vs eastwood's hde (Promise Technology PDC20268 PCI card) :

(caveats about this comparison -- hda is a system disk (/), so may have some slight contention with system operations during the test, and the disks on the two different controllers are not exactly the same models -- the major model numbers are all the same, but the minor revision numbers are different.  I couldn't find any documentation about the differences between the minor versions.)

The disks on the Promise Controller are consistently faster than those on the Intel controller.  Typical performance comparison is 40-50MB/sec on the Intel controller vs 60-70 MB/sec on the Promise controller.  To investigate if this is a controller difference, or a difference in the minor disk versions, I could swap some disks around and see if the performance stays with the Controller, or follows the disk around.  If you read this and would like to see this tried, let me know, otherwise I won't give it a high priority. :-)

 

Comparing single disk to RAID0 (with two disks):

eg., eastwood's hdh vs newman's raid0 (Why cross machines instead of comparing newman-hda to newman-raid0?  Because as we saw on eastwood (above), the drives on the Intel controller are consistently slower than the drives on the Promise controller.  The RAID0 array on newman consists of two drives on the Promise controller, so it seems better to compare it to eastwood drives on the Promise controller, rather than a drive on the Intel controller on newman.)

The RAID0 array is 50-100% faster in almost all cases, pretty much as one would expect when commands can be issued to both disks simultaneously.  The overhead of software RAID in this case appears negligible.  The advantage is smallest when the RAIDed disks are not necessarily accessible simultaneously, because subsequent accesses may be on a single drive.  An extreme example of this appears to be in the stride-read results using small accesses, where the single disk is actually faster.  (Stride reading is reading a chunk of size X, seeking ahead Y bytes, reading X bytes, seeking Y bytes again and repeating.)  For certain values of X and Y (and depending on the RAID stripe size), reads may all occur on the same disk, negating the RAID0 advantage (or perhaps even giving the single disk the edge, as may be the case for a couple of these test results, though the outperformance of the single disk in these two cases is beyond any explanation I can come up with).  The IOzone documentation does not explain the relationship between the "chunk size" variable and X and Y.
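To illustrate how a stride read can end up on a single disk, here is a small Python sketch that maps each read to the RAID0 member(s) it touches.  It assumes a plain striped layout in which stripe chunk i lives on disk i modulo the number of disks, and it uses a 64KB stripe chunk purely as an example; the actual chunk size of newman's array and the X and Y values IOzone used are not known here, so the numbers below are hypothetical.

```python
def disks_touched(offset, length, chunk_size, n_disks):
    """Disks holding any byte of [offset, offset + length) on a striped array."""
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # Assumed layout: stripe chunk i is stored on disk i % n_disks.
    return sorted({c % n_disks for c in range(first_chunk, last_chunk + 1)})

def stride_read_disks(read_size, gap, chunk_size, n_disks, n_reads):
    """Disks hit by each read when reading X bytes, skipping Y bytes, repeating."""
    period = read_size + gap  # distance between the starts of two reads
    return [
        disks_touched(i * period, read_size, chunk_size, n_disks)
        for i in range(n_reads)
    ]

KB = 1024
CHUNK = 64 * KB  # hypothetical stripe chunk size

# X = 4KB, Y = 124KB: each read starts 128KB after the previous one,
# which is exactly two chunks, so every read lands on the same disk.
same_disk = stride_read_disks(4 * KB, 124 * KB, CHUNK, 2, 6)

# X = 4KB, Y = 60KB: each read starts 64KB after the previous one,
# which is one chunk, so the reads alternate between the two disks.
alternating = stride_read_disks(4 * KB, 60 * KB, CHUNK, 2, 6)

print("X=4KB, Y=124KB:", same_disk)
print("X=4KB, Y=60KB: ", alternating)
```

In the first case the second disk never sees a request, so the array cannot do better than a single disk; this does not by itself explain why the single disk would come out ahead.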

 

Comparing RAID5 (with three disks) to the rest:

The RAID5 array on newman includes disks on both controllers (specifically hdc on the Intel controller and hde and hdg on the Promise controller) and is also a mixture of minor disk versions, so there's no other filesystem to compare it to that is "fair".  The hdc drive (or Intel controller) might be "crippling" it, or at least acting as a bottleneck.   Compared to eastwood's hdg, the overall performance is relatively close, with the RAID5 array generally having an edge on reading, but falling behind in writing.  I'm unwilling to try to draw any conclusions from these RAID5 test results, and I doubt there is any configuration of disks possible with this particular hardware to form a "good" RAID5 array.  Ideally we should have three (or more) SCSI or SATA drives on a fast PCI (-X, -E, whatever) bus, which is the sort of thing we'd have with any recent server hardware. 

 

Can we do anything else with IOzone and this hardware?

We can look a bit at the effect(s) of parallel/multi-threaded applications on performance.  I have run some tests with multiple threads accessing the disk, which is likely to be a frequent situation with STAR offline database servers.  Some analysis will follow shortly...