Offline databases are:

robinson.star.bnl.gov:3306 - master database

All hosts below are replication slaves:

production group nodes:
  db16.star.bnl.gov:3316
  db17.star.bnl.gov:3316

user node pool:
  db11.star.bnl.gov:3316
  db12.star.bnl.gov:3316
  db15.star.bnl.gov:3316
  db18.star.bnl.gov:3316

dbx.star.bnl.gov round-robin (RR) pool (visible to users, external to BNL):
  db06.star.bnl.gov:3316
  db07.star.bnl.gov:3316
  db08.star.bnl.gov:3316
  db10.star.bnl.gov:3316

expert user pool:
  db13.star.bnl.gov:3316
  db14.star.bnl.gov:3316

online-to-offline buffer node:
  db03.star.bnl.gov:3316
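For reference, any of the slave nodes can be probed with the standard mysql client, as in the sketch below (the account name is a placeholder, and SHOW SLAVE STATUS requires the REPLICATION CLIENT privilege):

#!/bin/bash
# probe replication health on one of the user-pool slaves;
# "staruser" is a hypothetical account name
mysql -h db12.star.bnl.gov -P 3316 -u staruser -p \
      -e 'SHOW SLAVE STATUS\G' \
  | grep -E 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'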
The snapshot factory consists of three scripts (bash + php) located at omega.star.bnl.gov. The scripts are attached to cron, so snapshots are produced four times per day, at 6:00, 12:00, 18:00 and 24:00. The snapshots are exposed to the outside world by a special script located at the STAR web server (orion), which allows single-stream downloads of the snapshot archive file (lock-file protection) to avoid overloading the web server.
1. omega.star.bnl.gov:
- the primary script is /db01/offline_snapshot_copy/scripts/run, which is started four times per day by the /etc/cron.d/db-snapshot.sh cron rule;
- the resulting file is copied to /var/www/html/factory/snapshot-2010.tgz, to be served to orion via http;
- every snapshot run, successful or not, sends an email to arkhipkin@bnl.gov, so the database administrator is always aware of how things are going;
- the run script is protected against multiple running instances (lock file + bash signal traps), preventing a build-up of failed script instances at omega;
2. orion.star.bnl.gov:
- the snapshot download script is located at /var/www/html/dsfactory/tmp/get_snapshot.sh and is started every hour by the /etc/cron.d/db-snapshot-factory cron rule;
- get_snapshot.sh is also protected against multiple running instances, and it will not download the same file twice (if nothing has changed since the last download), thus conserving network traffic and keeping the node load low;
- the resulting file - the actual database snapshot - can be accessed at http://drupal.star.bnl.gov/dsfactory/, which has internal protection against multiple downloads: only one download stream is allowed, and other clients receive an HTTP 503 error accompanied by an explanatory "busy" message (see the sketch below);
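The server-side guard itself is not reproduced in this document; a minimal bash CGI sketch of the same lock-file idea (paths and file names are illustrative, not the actual orion script) could look like this:

#!/bin/bash
# sketch: allow a single download stream, answer HTTP 503 to everyone else
LOCK=/tmp/dsfactory-download.lock
ARCHIVE=/var/www/html/factory/snapshot-2010.tgz
if ( set -o noclobber; echo "$$" > "$LOCK" ) 2> /dev/null; then
    trap '/bin/rm -f "$LOCK"' EXIT
    echo "Content-Type: application/x-gzip"
    echo
    cat "$ARCHIVE"
else
    # lock held by another request: report "busy" via 503
    echo "Status: 503 Service Unavailable"
    echo "Content-Type: text/plain"
    echo
    echo "busy: another download is in progress, please try again later"
fi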
The following bash script can be used to safely download a database snapshot:
#!/bin/bash
#
# Database snapshot download script (omega to orion)
# If you need help with this script, please contact
# "Dmitry Arkhipkin" <arkhipkin@bnl.gov>
#
set -u
set +e  # do not abort on errors: wget failures are handled explicitly below

# defaults
lockfile=/tmp/db-snapshot-factory.lock
EMAILMESSAGE=/tmp/emailmessage.txt
DOWNLOAD_DIR=http://www.star.bnl.gov/dsfactory
DOWNLOAD_FILE=snapshot-2010.tgz
LOCAL_DIR=/tmp/dsfactory
LOCAL_TMP_DIR=/tmp/dsfactory/tmp
LOCAL_NAME=db-snapshot-2010.tgz
SUBJECT="gridified database snapshot 2010 download completed"
EMAIL=arkhipkin@bnl.gov

if ( set -o noclobber; echo "$$" > "$lockfile" ) 2> /dev/null; then
    trap '/bin/rm -f "$lockfile"; exit $?' INT TERM EXIT

    # protected area starts here:
    cd "$LOCAL_TMP_DIR"
    echo "This is an automated email message, please do not reply to it!" > "$EMAILMESSAGE"
    START=$(date +%s)
    echo "*** database snapshot update started at $(date)" >> "$EMAILMESSAGE"

    # Attempt to download the file. --progress=dot:mega is used to prevent
    # WGET_OUTPUT from getting too long.
    WGET_OUTPUT=$(wget -O "$DOWNLOAD_FILE" --timestamping --progress=dot:mega \
        "$DOWNLOAD_DIR/" 2>&1)

    # Make sure the download went okay.
    if [ $? -ne 0 ]; then
        # wget had problems: send "FAILED" notification email
        echo "*** wget output: $WGET_OUTPUT" >> "$EMAILMESSAGE"
        SUBJECT="gridified database snapshot 2010 download FAILED"
        /bin/mail -s "$SUBJECT" "$EMAIL" < "$EMAILMESSAGE"
        echo "$0: $WGET_OUTPUT Exiting." 1>&2
        exit 1
    fi

    # Copy the file to the new name if necessary.
    if echo "$WGET_OUTPUT" | /bin/fgrep 'saved' &> /dev/null; then
        /bin/cp -f "$LOCAL_TMP_DIR/$DOWNLOAD_FILE" "$LOCAL_DIR/$LOCAL_NAME"

        # calculate elapsed time
        END=$(date +%s)
        DIFF=$(( END - START ))
        HRS=$(( DIFF / 3600 ))
        MNT=$(( DIFF % 3600 / 60 ))
        SEC=$(( DIFF % 3600 % 60 ))
        echo "*** database snapshot download took $HRS:$MNT:$SEC HH:MM:SS to complete (TOTAL: $DIFF seconds)" >> "$EMAILMESSAGE"

        # send "SUCCESSFUL" notification email
        echo "*** database snapshot $LOCAL_NAME successfully updated." >> "$EMAILMESSAGE"
        /bin/mail -s "$SUBJECT" "$EMAIL" < "$EMAILMESSAGE"
    else
        echo "!!! database snapshot $LOCAL_NAME does not need to be updated - same file." >> "$EMAILMESSAGE"
    fi

    # end of protected area
    /bin/rm -f "$lockfile"
    trap - INT TERM EXIT
else
    echo "Failed to acquire lockfile: $lockfile."
    echo "Held by $(cat "$lockfile")"
fi
# end of script
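To attach get_snapshot.sh to cron as described above, a rule like the following would do (illustrative; the actual /etc/cron.d/db-snapshot-factory may differ):

# /etc/cron.d/db-snapshot-factory - run the snapshot downloader every hour
0 * * * * root /var/www/html/dsfactory/tmp/get_snapshot.sh > /dev/null 2>&1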
I. Intro
Three nodes were tested: my own laptop with a consumer-grade SSD drive, reserve1.starp.bnl.gov (an old db node), and db04.star.bnl.gov (a new node).
II. hdparm results:
an extremely basic test of sequential reads - baseline and sanity check [passed]
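The numbers below are the standard hdparm timing output, presumably produced with something like:

hdparm -tT /dev/md1    # -T: cached reads, -t: buffered disk reads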
1. Dmitry's laptop (SSD):
/dev/disk/by-uuid/88ba7100-0622-41e9-ac89-fd8772524d65:
Timing cached reads: 1960 MB in 2.00 seconds = 980.43 MB/sec
Timing buffered disk reads: 532 MB in 3.01 seconds = 177.01 MB/sec
2. reserve1.starp.bnl.gov (old node):
/dev/md1:
Timing cached reads: 2636 MB in 2.00 seconds = 1317.41 MB/sec
Timing buffered disk reads: 252 MB in 3.02 seconds = 83.34 MB/sec
3. db04.star.bnl.gov (new node):
/dev/disk/by-uuid/53f45841-0907-41a0-bfa3-52cd269fdfb6:
Timing cached reads: 5934 MB in 2.00 seconds = 2969.24 MB/sec
Timing buffered disk reads: 884 MB in 3.00 seconds = 294.19 MB/sec
III. MySQL bench test results: a test suite performing a "global" health check, without any particular focus; single-threaded.
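The connect/create/insert/select lines below match the output format of the MySQL benchmark suite (sql-bench); a typical invocation, with placeholder credentials, is:

cd /usr/share/sql-bench
perl run-all-tests --server=mysql --host=localhost --user=bench --password=secret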
1. laptop:
connect: Total time: 113 wallclock secs (44.75 usr 16.29 sys + 0.00 cusr 0.00 csys = 61.04 CPU)
create: Total time: 221 wallclock secs ( 7.22 usr 1.96 sys + 0.00 cusr 0.00 csys = 9.18 CPU)
insert: Total time: 894 wallclock secs (365.51 usr 64.07 sys + 0.00 cusr 0.00 csys = 429.58 CPU)
select: Total time: 240 wallclock secs (33.26 usr 3.86 sys + 0.00 cusr 0.00 csys = 37.12 CPU)
2. reserve1.starp.bnl.gov:
connect: Total time: 107 wallclock secs (44.85 usr 14.88 sys + 0.00 cusr 0.00 csys = 59.73 CPU)
create: Total time: 750 wallclock secs ( 6.37 usr 1.52 sys + 0.00 cusr 0.00 csys = 7.89 CPU)
insert: Total time: 1658 wallclock secs (378.28 usr 47.53 sys + 0.00 cusr 0.00 csys = 425.81 CPU)
select: Total time: 438 wallclock secs (48.92 usr 4.50 sys + 0.00 cusr 0.00 csys = 53.42 CPU)
3. db04.star.bnl.gov:
connect: Total time: 54 wallclock secs (18.63 usr 11.44 sys + 0.00 cusr 0.00 csys = 30.07 CPU)
create: Total time: 1375 wallclock secs ( 1.94 usr 0.91 sys + 0.00 cusr 0.00 csys = 2.85 CPU) [unexpectedly slow - needs investigation]
insert: Total time: 470 wallclock secs (172.41 usr 31.71 sys + 0.00 cusr 0.00 csys = 204.12 CPU)
select: Total time: 185 wallclock secs (16.66 usr 2.00 sys + 0.00 cusr 0.00 csys = 18.66 CPU)
IV. SysBench IO test results: random reads, small block size, many parallel threads
[plots not reproduced here; an additional panel showed DB05.star.bnl.gov with the "noatime" mount option]
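For reference, a random-read file IO test of this kind can be run with sysbench (0.4.x syntax; file size, block size and thread count below are illustrative):

sysbench --test=fileio --file-total-size=16G prepare
sysbench --test=fileio --file-total-size=16G --file-test-mode=rndrd \
         --file-block-size=4K --num-threads=64 --max-time=300 --max-requests=0 run
sysbench --test=fileio --file-total-size=16G cleanup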
V. SysBench SQL test results: synthetic read-only
Interestingly, the new system beats the old one by a factor of ~7 at up to 128 clients, and is only marginally better beyond 128 parallel threads.
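A read-only OLTP run of this kind would be along these lines (again sysbench 0.4.x syntax; host, credentials, table size and thread count are illustrative):

sysbench --test=oltp --mysql-host=db04.star.bnl.gov --mysql-port=3316 \
         --mysql-user=bench --mysql-password=secret --oltp-table-size=1000000 prepare
sysbench --test=oltp --mysql-host=db04.star.bnl.gov --mysql-port=3316 \
         --mysql-user=bench --mysql-password=secret \
         --oltp-read-only=on --num-threads=128 --max-requests=100000 run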
VI. STAR production chain testing (nightly test BFC):
Test setup:
- auau200, pp200, pp500 x 10 jobs each, run against db05 (new node) and db10 (old node, equivalent to reserve1.starp);
- jobs were run in single-user mode: one job at a time;
- every server was tested in two modes: query cache ON and query cache OFF (cache reset between tests; see the sketch after this list);
- database maker statistics were collected for analysis;
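Toggling and resetting the query cache between runs can be done with statements like these (a sketch; the host, account and cache size are illustrative, and RESET QUERY CACHE requires the RELOAD privilege):

# enable the query cache (256M here) and clear it before a qc_on run
mysql -h db05.star.bnl.gov -P 3316 -u root -p \
      -e "SET GLOBAL query_cache_size = 268435456; RESET QUERY CACHE;"
# disable the query cache for a qc_off run
mysql -h db05.star.bnl.gov -P 3316 -u root -p \
      -e "SET GLOBAL query_cache_size = 0;"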
RESULTS (old -> new; Ast is presumably real ("astronomical") time per event and Cpu is CPU time per event, with the db maker's share of the chain total in parentheses):
auau200, old -> new :
qc_off: Ast = 2.93 -> 2.79 (0.1% -> 0.1%), Cpu = 0.90 -> 1.15 (0% -> 0%)
qc_on : Ast = 2.45 -> 2.26 (0.1% -> 0.1%), Cpu = 0.88 -> 0.86 (0% -> 0%)
pp200, old -> new :
qc_off: Ast = 2.72 -> 6.49 (5.59% -> 10.04%), Cpu = 0.56 -> 0.86 (1.36% -> 1.83%)
qc_on : Ast = 2.24 -> 3.20 (4.90% -> 6.49%), Cpu = 0.60 -> 0.55 (1.46% -> 1.32%)
pp500, old -> new:
qc_off: Ast = 2.65 -> 2.41 (15.9% -> 15.4%), Cpu = 0.55 -> 0.79 (5.05% -> 6.2%)
qc_on : Ast = 2.09 -> 1.94 (13.8% -> 14.8%), Cpu = 0.53 -> 0.53 (4.87% -> 4.88%)
Summary for this test:
- as expected, databases with the query cache enabled processed requests faster than those with it disabled;
- the new node (with query cache on) performs at a level very similar to the old node; the difference is not noticeable/significant within the test margins;
Additional studies: db11 (very new node):