Replication

Replication enables data from one MySQL database server (called the master) to be replicated to one or more MySQL database servers (slaves). Replication is asynchronous - your replication slaves do not need to be connected permanently to receive updates from the master, which means that updates can occur over long-distance connections and even temporary solutions such as a dial-up service. Depending on the configuration, you can replicate all databases, selected databases, or even selected tables within a database.

The target uses for replication in MySQL include:

  • Scale-out solutions - spreading the load among multiple slaves to improve performance. In this environment, all writes and updates must take place on the master server. Reads, however, may take place on one or more slaves. This model can improve the performance of writes (since the master is dedicated to updates), while dramatically increasing read speed across an increasing number of slaves.

  • Data security - because data is replicated to the slave, and the slave can pause the replication process, it is possible to run backup services on the slave without corrupting the corresponding master data.

  • Analytics - live data can be created on the master, while the analysis of the information can take place on the slave without affecting the performance of the master.

  • Long-distance data distribution - if a branch office would like to work with a copy of your main data, you can use replication to create a local copy of the data for their use without requiring permanent access to the master.

 

Fixing Slave Replication In A STAR Offline Database Mirror

Mike's trick for fixing an offline database slave when it won't sync with the master is to be carried out as follows:

cd /var/run/mysqld

ls -lh | grep "mysqld-relay-bin"

Find the relay file with the largest number at the end (this is the latest one)

Make sure the first line of the text file "/db01/offline/data/relay-log.info" contains that file

ie:

ls -lh | grep "mysqld-relay-bin"

mysqld-relay-bin.000001

mysqld-relay-bin.000002

so this means that /db01/offline/data/relay-log.info should contain:

"/var/run/mysqld/mysqld-relay-bin.000002"

as the first line

Then login to mysql via "mysql -S /tmp/mysql.3316.sock -p -u root"

and issue the command "slave start;"

then check the status with "show slave status \G"

Slave_IO_Running should show "Yes"

Slave_SQL_Running should show "Yes"

You may need to repeat this procedure if you get an error the first time.

Master/Slave syncronization issues,tools,solutions etc...

This page is dedicated to variety of master/slave syncronization issues, tools and solutions used at STAR. It is made for internal usage, tutorial for offsite facilities would be created later.

Maatkit (formerly MySQL Toolkit) is a bunch of Perl scripts, designed to make MySQL daily management tasks easier. STAR makes use of 'mk-table-checksum' to get lock-free reliable checksumming of FileCatalog and Offline database tables, and 'mk-table-sync' to fix inconsistent tables on slaves.

I. FileCatalog replication checksums

Master server: robinson.star.bnl.gov:3336
mk-table-checksum uses "repl.checksum" (database.table) to store checksum results. If FC slave was added after Nov.2008, then such table should be created manually. If not done exactly as described, checksums will not work properly, and slave will stop replicating with error message stating that "table repl.checksum does not exist".  

CREATE DATABASE repl;
CREATE TABLE checksum (
     db         char(64)     NOT NULL,
     tbl        char(64)     NOT NULL,
     chunk      int          NOT NULL,
     boundaries char(64)     NOT NULL,
     this_crc   char(40)     NOT NULL,
     this_cnt   int          NOT NULL,
     master_crc char(40)         NULL,
     master_cnt int              NULL,
     ts         timestamp    NOT NULL,
     PRIMARY KEY (db, tbl, chunk)
  );


HOW-TO calculate checksums for all FileCatalog tables on Master, results and queries will propagate to slaves :

shell> nohup ./mk-table-checksum --emptyrepltbl --replicate=repl.checksum --algorithm=BIT_XOR \
h=robinson.star.bnl.gov,P=3336,u=root,p=XYZ -d FileCatalog_BNL >> checkrepl.log.txt 2>&1 

At this moment (Nov.2008), it takes about 20 min to checksum all FC tables.
HOW-TO show if there are any differences between master (robinson) and slaves (duvall, brando), when checksumming process is finished :

shell> mysql -u root --host duvall.star.bnl.gov --port 3336 -pXYZ -e "USE repl; SELECT db, 
 tbl, chunk, this_cnt-master_cnt AS cnt_diff, this_crc <> master_crc 
OR ISNULL(master_crc) <> ISNULL(this_crc) AS crc_diff FROM checksum 
WHERE master_cnt <> this_cnt OR master_crc <> this_crc OR 
ISNULL(master_crc) <> ISNULL(this_crc);" > duvall.tablediff.txt


shell> mysql -u root --host brando.star.bnl.gov --port 3336 -pXYZ -e "USE repl; SELECT db, 
tbl, chunk, this_cnt-master_cnt AS cnt_diff, this_crc <> master_crc 
OR ISNULL(master_crc) <> ISNULL(this_crc) AS crc_diff FROM checksum 
WHERE master_cnt <> this_cnt OR master_crc <> this_crc OR 
ISNULL(master_crc) <> ISNULL(this_crc);" > brando.tablediff.txt


HOW-TO show the differences (by row) between master and slave for selected table. In the example below, changes for table "TriggerWords" is requested for duvall.star.bnl.gov :

shell> ./mk-table-sync --print --synctomaster --replicate repl.checksum 
h=duvall.star.bnl.gov,P=3336,u=root,p=XYZ,D=FileCatalog_BNL,t=TriggerWords

To make an actual syncronization, one should replace "--print" option with "--execute" (please make sure you have a valid db backup first!). It is often good to check how many rows need resync and how much time it would take, because it might stop replication process for this time.

 

 

 

Replication monitoring

Following tools are available for replication check:

1. Basic monitoring script, which allows to see what is going on with replication at this very moment. It does not send any notifications and cannot be thought as complete solution to replication monitoring. Mature tools (like Nagios) should be used in near future.

2. Server-side replication check made by M.DePhillips. This script checks which slaves are connected to master (robinson) on daily basic. List of connected slaves is compared to the known slave list (stored at dbAdmin database at heston.star.bnl.gov).

Server list and various parameters

Here will be a server list (only those servers participating in various replication scenarios)

1. FileCatalog master db server : fc1.star.bnl.gov:3336, ID: 880853336
slaves:

  • fc2.star.bnl.gov:3336, ID: 880863336
  • fc3.star.bnl.gov:3336, ID: 880873336 [ backup server, not used in round-robin ]
  • fc4.star.bnl.gov:3336, ID: 880883336

2. Offline master db server: robinson.star.bnl.gov:3306, ID: 880673316
slaves:

  • db01.star.bnl.gov:3316, ID:  880483316
  • db02.star.bnl.gov:3316, ID: 880493316  - FAST rcf network
  • db03.star.bnl.gov:3316, ID: 880503316 [ 23 Jul 09, new ID: 881083316 ] - FAST rcf network
  • db04.star.bnl.gov:3316, ID: 880513316 [ 14 Jul 09, new ID: 881043316 ] - FAST rcf network
  • db05.star.bnl.gov:3316, ID: 880523316 [ 16 Jul 09, new ID: 881053316 ] - FAST rcf network
  • db06.star.bnl.gov:3316, ID 881023316  [old ID: 880533316 ]
  • db07.star.bnl.gov:3316, ID: 880543316
  • db08.star.bnl.gov:3316, ID: 880613316
  • db10.star.bnl.gov:3316, ID: 880643316 [ 20 Jul 09, new ID 881063316 ]
  • db11.star.bnl.gov:3316, 881033316 [ old ID: 880663316 ]
  • db12.star.bnl.gov:3316, ID: 881013316, as of Apr 28th 2010, ID: 880253316
  • db13.star.bnl.gov:3316 (former bogart), ID: 880753316 [ 22 Jul 09, new ID 881073316 ]
  • omega.star.bnl.gov:3316, ID: 880843316 [3 Dec 09, backup server]
  • db14.star.bnl.gov:3316, ID: 541043316
  • db15.star.bnl.gov:3316, ID: 541053316
  • db16.star.bnl.gov:3316, ID: 541063316
  • db17.star.bnl.gov:3316, ID: 541073316
  • db18.star.bnl.gov:3316, ID: 541083316
  • onldb.starp.bnl.gov:3601, ID: 600043601 | resides in ONLINE domain
  • onldb2.starp.bnl.gov:3601, ID: 600893601 | resides in ONLINE domain

External slaves :
Seo-Young Noh <rsyoung@kisti.re.kr>? KISTI (first server) : 990023316
Xianglei Zhu <zhux@tsinghua.edu.cn> KISTI (second server) : 991013316
Richard Witt <witt@usna.edu> : hvergelmir.mslab.usna.edu, 992073316 | USNA
Kunsu OH <kunsuoh@gmail.com> : 992083316 | Pusan National University
Peter Chen <pchen@sun.science.wayne.edu> : rhic23.physics.wayne.edu, 991063316| Wayne State University
Matt Walker <mwalker@mit.edu> : glunami.mit.edu, 992103316 | MIT
Jeff Porter <rjporter@lbl.gov>, pstardb1.nersc.gov, 1285536651 | PDSF
Douglas Olson <dlolson@lbl.gov> :  992113316 | LBNL

Server ID calculation :
SERVER_ID is created from the following scheme: [subnet][id][port], so 880613316 means 88 subnet, 061 id, 3316 port...
subnet '99' is reserved for external facilities, which will have something like 99[id][port] as server_id.

 

 

Setting Up A Star Offline Database Mirror

Set up an environment consistent with the STAR db environments.

  1. Create a user named 'mysql' and a group 'mysql' on the system.
  2. Create a directory for the mysql installation and data.  The standard directory for STAR offline is /db01/offline
  3. In /db01/offline place the tarball of mysql version 5.0.51a (or greater) downloaded from here
  4. Untar/zip it.  Use "--owner mysql" in the tar command, or recursively chown the untarred directory so that mysql is the  owner (eg. 'chown mysql:mysql  mysql-5.0.51a-linux-i686-glibc23').  Delete the tarball if desired.
  5. Still in /db01/offline, create a soft link named 'mysql' that points to the mysql directory created by the untar (ln -s mysql-5.0.51a-linux-i686-glibc23 mysql').  /db01/offline should now have two items:  mysql (a link), and mysql-standard-5.0.51a-linux-i686-glibc23.
  6. Create a directory /var/run/mysqld and set the owner to mysql, with 770 permissions.  (This may be avoideable with a simple change to the my.cnf file (to be verified).  If this change is made to the "official" my.cnf, then this step should be removed.)

Next, acquire a recent snapshot of the offline tables and a my.cnf file

There will be two files that need to be transfered to your server; a tarball of the offline database files (plus a handful of other files, some of which are discussed below) and a my.cnf file.  

If you have access to RCF, a recent tar of the offline databases can be found here: /star/data07/db/offline_08xxxxx.tar.gz.  Un-tar/zip the tarball (with "--owner mysql") in /db01/offline on your server to get a 'data' directory.  (/db01/offline/data). 

The my.cnf file can be found in /star/data07/db/my.cnf, and should be put in /etc/my.cnf.  Before your server is started, either the master information should be commented out, or the server-id should be adjusted to avoid clashing with an existing production slave.  Contact Dmitry Arkhipkin to determine an appropriate server-id.

Start the server

At this point you should be able to start the mysql server process.  It should be as simple as the following command:

cd /db01/offline/mysql && ./bin/mysqld_safe --user=mysql &

 

The data directory

The databases

You will now find a complete snapshot of STARs offline data. We use MYSQLs MYISAM storage engine which stores its tables in three files under a directory that is the database name. For example, for database Calibrations_ftpc you will see a directory of the same name. If you descend into that directory you will see three files for each table name. The files are MYD which contain the data, MYI which contain the indexes and FRM which describes the table structure.

Other files

  • master.info - This files contains information about the slaves master (aptly named). This includes who the master is, how to connect, and what was the last bit of data passed from the master to the slave. This is stored as an integer in this file.
  • You will also find log files, bin files and their associated index files.