STAR database inrastructure improvements proposal

 

STAR database improvements proposal

D. Arhipkin

 

1. Monitoring strategy

Proposed database monitoring strategy suggests simultaneous host (hardware), OS and database monitoring to be able to prevent db problems early. Database service health and response time depends strongly on underlying OS health and hardware, therefore, solution, covering all aforementioned aspects needs to be implemented. While there are many tools available on a market today, I propose to use Nagios host and service monitoring tool.

Nagios is a powerful monitoring tool, designed to inform system administrators of the problems before end-users do. The monitoring daemon runs intermittent checks on hosts and services you specify using external "plugins" which return status information to Nagios. When problems are encountered, the daemon can send notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via web browser.

Nagios is already in use at RCF. Combined the Nagios server ability to work in a slave mode, this will allow STAR to integrate into BNL ITD infrastructure smoothly.

Some of the Nagios features include:

  • Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.)

  • Monitoring of host resources (processor load, disk and memory usage, running processes, log files, etc.)

  • Monitoring of environmental factors such as temperature

  • Simple plugin design that allows users to easily develop their own host and service checks

  • Ability to define network host hierarchy, allowing detection of and distinction between hosts that are down and those that are unreachable

  • Contact notifications when service or host problems occur and get resolved (via email, pager, or other user-defined method)

  • Optional escalation of host and service notifications to different contact groups

  • Ability to define event handlers to be run during service or host events for proactive problem resolution

  • Support for implementing redundant and distributed monitoring servers

  • External command interface that allows on-the-fly modifications to be made to the monitoring and notification behavior through the use of event handlers, the web interface, and third-party applications

  • Retention of host and service status across program restarts

  • Scheduled downtime for suppressing host and service notifications during periods of planned outages

  • Ability to acknowledge problems via the web interface

  • Web interface for viewing current network status, notification and problem history, log file, etc.

  • Simple authorization scheme that allows you restrict what users can see and do from the web interface

 

2. Backup strategy

There is an obvious need for unified, flexible and robust database backup system for STAR databases array. Databases are a part of growing STAR software infrastructure, and new backup system should be easy to manage and scalable enough to perform well under such circumstances​

Zmanda Recovery Manager (MySQL ZRM, Community Edition) is suggested to be used, as it would be fully automated, reliable, uniform database backup and recovery method across all nodes. It also has an ability to restore from backup by tools included with standard MySQL package (for convenience). ZRM CE is a freely downloadable version of ZRM for MySQL, covered by GPL license.

ZRM allows to:

  • Schedule full and incremental logical or raw backups of your MySQL database

  • Centralized backup management

  • Perform backup that is the best match for your storage engine and your MySQL configuration

  • Get e-mail notification about status of your backups

  • Monitor and obtain reports about your backups (including RSS feeds)

  • Verify your backup images

  • Compress and encrypt your backup images

  • Implement Site or Application specific backup policies

  • Recover database easily to any point in time or to any particular database event

  • Custom plugins to tailor MySQL backups to your environment

ZRM CE is dedicated to use with MySQL only.

 

3. Standards compliance

  1. OS compliance. Scientific Linux distributions comply to the Filesystem Hierarchy Standard (FHS), which consists of a set of requirements and guidelines for file and directory placement under UNIX-like operating systems. The guidelines are intended to support interoperability of applications, system administration tools, development tools, and scripts as well as greater uniformity of documentation for these systems. All MySQL databases used in STAR should be configured according to underlying OS standards like FHS to ensure effective OS and database administration during the db lifetime.

  2. MySQL configuration recommendations. STAR MySQL servers should be configured in compliance to both MySQL for linux recommendations and MySQL server requirements. All configuration files should be complete (no parameters should be required from outer sources), and contain supplementary information about server primary purpose and dependent services (like database replication slaves).

 

References

http://www.nagios.org/

http://www.zmanda.com/backup-mysql.html

http://proton.pathname.com/fhs/