STAR in-house backup considerations
Background
I am setting out to explore possible backup systems for use in the STAR online domain and possibly also in the offline servers and desktop PCs. This is made more feasible of late by our growing familiarity and comfort with Ceph/CephFS and its potential for a relatively large and safe storage pool at relatively low cost.
Historically we have relied somewhat haphazardly on ITD backups and rsync-based scripts. Both have limitations and frustrations. Though rsync scripts are likely to be in use for some time (coming in handy for the quickest recovery in some scenarios), in the simplest uses they lack long term retention - a user mistake that deletes or alters important content will be propagated to the copy the next time rsync runs.
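To make the retention problem concrete, here is a minimal Python sketch of the usual workaround: instead of mirroring into one directory, each run writes a new dated snapshot directory and hard-links any file that is unchanged since the previous snapshot. This is the same general idea as `rsync --link-dest` and rsnapshot, but the code is my own illustration, not what either tool does internally. The function name `snapshot` and its parameters are hypothetical, and the sketch ignores symlinks, ownership and directory metadata.

```python
import filecmp
import os
import shutil


def snapshot(source, dest_root, name, previous=None):
    """Copy `source` into dest_root/name, hard-linking files unchanged since `previous`."""
    target = os.path.join(dest_root, name)
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for filename in filenames:
            src = os.path.join(dirpath, filename)
            dst = os.path.normpath(os.path.join(target, rel, filename))
            old = None
            if previous is not None:
                old = os.path.normpath(os.path.join(dest_root, previous, rel, filename))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share storage with the older snapshot
            else:
                shutil.copy2(src, dst)  # new or modified: store a fresh copy
    return target
```

A file deleted from the source is simply absent from the next snapshot, while the older snapshots still hold it. The cost is one directory tree per snapshot plus the space for changed files. Note that files shared between snapshots are the same inode, so snapshots should be treated as read-only.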
Meanwhile, ITD services have introduced frustrating configuration limitations and communication problems (both at the personnel and network levels). For instance, we would like to take snapshots at the end of each run with a 6 (or more) month retention time for the online systems, but getting ITD to do this has been troublesome. Backing up multi-TB filesystems over the campus network for traditional full backups will take many hours, and may negatively impact the performance of the systems and network paths involved. Our control and insight into what is happening with the ITD backups has generally been quite poor over the past couple of years. Though it will introduce headaches of its own for us to deal with, perhaps we will be better served with a solution that we manage directly ourselves.
Our needs are classic backup needs: reliability, low performance impact on the systems being backed up, configurable backup sets, restorability to arbitrary machines, and moderate backup retention periods (very rarely more than 1 year), all without requiring a lot of administrative effort. Deduplication would be a very nice feature to reduce backend storage needs and network transfers. Capabilities for bare metal restoration would also be a plus. While the system is primarily intended for Linux machines, it would be useful to include Windows machines within the same framework.
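As a rough illustration of the kind of retention policy described above (end-of-run snapshots kept for six months or more, routine backups expiring sooner), here is a small Python sketch of the pruning decision. The `Snapshot` type, the `expired` function, and the 30-day figure for routine backups are all assumptions made for the example; only the six-month end-of-run retention comes from our actual wish list.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Snapshot(NamedTuple):
    name: str
    taken: datetime
    end_of_run: bool = False  # True for the special end-of-run snapshots


def expired(snapshots, now, routine_days=30, end_of_run_days=183):
    """Return the snapshots whose retention period has elapsed at time `now`."""
    doomed = []
    for snap in snapshots:
        # End-of-run snapshots get the long retention; everything else the short one.
        keep_days = end_of_run_days if snap.end_of_run else routine_days
        if now - snap.taken > timedelta(days=keep_days):
            doomed.append(snap)
    return doomed
```

Any of the candidate systems would express this through its own configuration rather than through code like this. The point is only that the policy is a per-class age threshold, which is simple to state and should be simple to configure.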
Sampling of non-commercial candidates (several have commercial/enterprise versions or support services as well):
System | Windows client? | GUI/Web interface? | Deduplication? | Typical scheduling and retention policies supported? | DB required? | Encrypted backups supported? | Miscellaneous |
---|---|---|---|---|---|---|---|
BackupPC | Backs up Windows over SMB, with limitations | Web | Yes, though the buzzword is not used; it is called a "clever pooling scheme" instead, which sounds like file-level and cross-client | Yes | No | No | No client software per se - relies on common utilities such as ssh, rsync, SMB |
Amanda | Yes | No (though the paid Enterprise version has a web UI) | No? | Close, but a little unusual... | No | Yes (client-side or server-side) | |
Attic | No | No | Yes, seemingly at a block level, but not clear if it works across repositories/clients, or just within each client | Not directly, but can be effectively done with command parameters in periodically executed scripts | No | Yes | |
rsnapshot | No, but can work with an rsync server such as cwRsync | No | Yes; file level (per client) using hardlinks | Yes, though scheduling is done in cron | No | No | No active development - last release was August 2008 |
Bacula | Small cost for binary distributions of recent versions | Web and GUI (also Webmin module) | Yes; file level, potentially cross-client. Base backups must be explicitly made and declared for subsequent backup sets, so not ideal or automatic like some fancier systems. | Yes | Yes | Yes, client-side, with master key support | |
Bareos | Yes | Web and GUI (also the Webmin module for Bacula looks like it has been updated to work with Bareos) | Yes; file level, potentially cross-client. Base backups must be explicitly made and declared for subsequent backup sets, so not ideal or automatic like some fancier systems. | Yes | Yes | Yes, client-side, with master key support | |
(This is an incomplete list of course - there are many others to be found, but those above seem to have the most users and/or interesting features that make them attractive.)
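For readers unfamiliar with file-level, cross-client deduplication, the following Python sketch shows the basic idea of a content-addressed pool: each distinct file body is stored once under a name derived from its hash, and every backup tree just hard-links to it. This is in the spirit of the pooling that the BackupPC documentation describes, but it is my own simplification and not any tool's actual implementation. The function name `store`, the flat pool layout, and the choice of SHA-256 are assumptions for the example.

```python
import hashlib
import os
import shutil


def store(pool_dir, backup_dir, rel_path, src_path):
    """Place src_path at backup_dir/rel_path, sharing storage with identical pooled files."""
    digest = hashlib.sha256()
    with open(src_path, "rb") as handle:
        # Hash in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    os.makedirs(pool_dir, exist_ok=True)
    pooled = os.path.join(pool_dir, digest.hexdigest())
    if not os.path.exists(pooled):
        shutil.copyfile(src_path, pooled)  # first time this content has been seen
    dest = os.path.join(backup_dir, rel_path)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    os.link(pooled, dest)                  # later copies are just another hard link
    return pooled
```

Because the pool is keyed on content rather than on path or host, the same file backed up from two different clients occupies the space of one copy. Hard links require the pool and the backup trees to live on the same filesystem.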
Some other characteristics of possible interest: compression, encryption, data format used, user-level restores, bare metal restore possibilities. Jerome noted that encryption might be of use when the backups live on a filesystem (e.g. CephFS) that is shared across hosts, since that could expose the contents of backups to many users. In the case of CephFS, we are currently [early May 2015] uncertain what the capabilities are for granular access controls or for the use of multiple distinct CephFSes within a single Ceph Object Storage installation.
The first stab: Bacula
Bacula has long been a well-regarded FOSS backup system of low to moderate difficulty. Many sysadmins of small to medium size installations have used it over the years as a low-cost, feature-rich solution, and it seemed like a natural place to start. I began this exploration with Bacula 5.0.0 simply because it was readily available for RHEL/CentOS/SL 6, and I thought it would work with Windows clients as well. To cut to the chase - version 5 is rather old, but it seemed to work fine for Linux backups, with a good Webmin module for administration. However, the Windows client did not work terribly well, though that might have been because I was mixing versions between the Windows client and the Linux master. (The biggest issue was that the Windows backups would never finish - the backups would execute and save the data, but would never change to a completed state, preventing subsequent backups from running.) All in all, between the old Bacula version, the Windows backup behaviour and the discovery of the Bareos fork as an alternative, I gave up on Bacula.
From the considerations above, Bareos and BackupPC are of most interest for test deployments. Though I have yet to use either, I have some initial impressions from the documentation. It appears BackupPC's biggest advantage over Bareos is in the ease of using deduplication. BackupPC's web interface also appears to be nicer than Bareos's bat or Webmin UIs. Bareos's client, however, is likely easier to deploy than setting up the native services (ssh, rsync, SMB) that BackupPC relies on and maintaining them on the clients over time.