Database development follow-up

2009-06-03 12:00
2009-06-03 13:00
Etc/GMT-5
Wednesday, 3 June 2009
1-189, internal, at 16:00 (GMT), duration : 01:00

Invited: Dmitry, Wayne, Gene

 

Attendees: Jerome (chair), Dmitry, Wayne, Leve, Gene

 

The meeting started with miscellaneous topics, such as Leve pointing out the existence of a RAM disk priced at $1000 for 32 GB.

Dmitry presented slides (attached). General observations included:

  • LB – missing a RR on low load (this should really be implemented)
  • FC – most benefit comes from increasing the key buffer size
  • Can different jobs alter the way we measure the cache speed / enhancement?
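
One way to keep the cache measurement comparable between jobs is to compute the key cache miss ratio over a fixed window from two snapshots of the MySQL status counters Key_reads and Key_read_requests, rather than from the lifetime totals. The sketch below assumes the counters have already been read (for example via SHOW GLOBAL STATUS); the function names and the snapshot values are hypothetical and only illustrate the arithmetic.

```python
def key_cache_miss_ratio(key_reads, key_read_requests):
    """Fraction of index-block requests that had to be read from disk."""
    if key_read_requests == 0:
        return 0.0  # no index activity in the window
    return key_reads / key_read_requests


def interval_miss_ratio(before, after):
    """Miss ratio over one measurement window, from two counter snapshots."""
    reads = after["Key_reads"] - before["Key_reads"]
    requests = after["Key_read_requests"] - before["Key_read_requests"]
    return key_cache_miss_ratio(reads, requests)


# Hypothetical snapshots taken before and after one test job.
before = {"Key_reads": 5000, "Key_read_requests": 100000}
after = {"Key_reads": 5600, "Key_read_requests": 160000}

print(interval_miss_ratio(before, after))
```

Measuring per window isolates the effect of a single job or configuration; the lifetime ratio would also include whatever ran earlier.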

Specific discussion:

While discussing the FileCatalog, it was also pointed out that the MySQL query optimizer sometimes skips the key read (because a sequential read of the table is believed to be faster than random reads) and hence some SELECTs may not use the index at all.
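
The trade-off behind that optimizer behaviour can be illustrated with a toy cost comparison. This is not MySQL's actual cost model: the function name and the per-row cost constants are assumptions chosen only to show why an index can lose once a query matches a large fraction of the table.

```python
def choose_access_path(total_rows, matching_rows,
                       seq_cost_per_row=1.0, random_cost_per_row=10.0):
    """Pick the cheaper of a full sequential scan and index-driven random reads.

    The cost constants are hypothetical; they only encode the assumption
    that a random read is several times more expensive than a sequential one.
    """
    full_scan_cost = total_rows * seq_cost_per_row
    index_cost = matching_rows * random_cost_per_row
    return "index" if index_cost < full_scan_cost else "full scan"


# A very selective query favours the index ...
print(choose_access_path(total_rows=1_000_000, matching_rows=100))
# ... while a query matching 30% of the table favours the sequential scan.
print(choose_access_path(total_rows=1_000_000, matching_rows=300_000))
```

On a disk-limited system the gap between sequential and random reads is large, which is consistent with some FC queries bypassing the index.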

Clearly, the FC is limited by disk speed, and one approach would be to attempt to solve this with beefier nodes with optimal IO.

Also discussed were the UPDATE race conditions observed in the FC. Dmitry stated that there may be a fix for the UPDATE lock in a recent minor revision of MySQL. Upgrading the master is tricky, as it shares some services with the offline DB, but the idea was to perhaps add a slave and update it, if versions can be mixed. Whether they can be mixed was not certain / guaranteed.

Jerome asked whether there would be any benefit in turning on the slow query logs. In principle, we know what the slow queries are, and Dmitry also noted that "slow" is defined by a threshold (with a low default value on the order of 10 seconds); since most queries are minutes long, all queries may end up being logged. We will nevertheless revisit this matter once we have improved the IO speed (which seems to be the main problem at this time).
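
The threshold argument can be made concrete with a small sketch. It assumes the usual slow-log rule that a query is logged when its duration exceeds the threshold (MySQL's long_query_time setting); the durations below are hypothetical and only stand in for "minutes-long" queries.

```python
def slow_queries(durations_s, threshold_s):
    """Return the query durations that a slow log with this threshold would record."""
    return [d for d in durations_s if d > threshold_s]


# Hypothetical query durations in seconds, all in the minutes range.
durations = [90, 150, 240, 600]

# With a threshold of about 10 s, every query is logged, so the log
# does not single anything out.
print(len(slow_queries(durations, threshold_s=10)))

# Only a threshold tuned to the actual workload would be selective.
print(slow_queries(durations, threshold_s=200))
```

This is why the slow query log is expected to become informative only after the IO bottleneck is reduced or the threshold is raised to match the workload.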

Wayne spec'ed a node: so far close to $7k [atime disabled as well].

 

There was much more in the slides.

Recommended for the path forward:

  • Get a performance test suite really settled (for comparison as we go; it seems we are there, but we need to compare between configurations and hardware)
  • We saw that CPU is not important ... whenever spec-ing a node, we should be conservative with the CPU speed (twice as fast is not better, so let's not push)
  • What is the overall network performance? If we speed up the IO and the response of the DB, would we saturate the network? Dmitry presented some results on that and there is food for thought.
  • We need to assess the API overheads soon enough too ... For now, the IO (cache and hardware reshape) seems to be the big gain, but we should not ignore the performance gain on the API side and should consider this soon as well.
  • Project to be done by July
    • Concentrate first on db06, db11
    • We make this a two-phase project: first, we look at those two nodes and test in place. We test until the end of the run
    • The second phase would be after the run: we proceed with a rolling upgrade (or rolling phase-in) as nodes become available
    • We try to get it all done before T=(end-of-the-run+2 months), which is a typical rule of thumb for calibration and start of production. This would give a generous timeline, with all DBs replaced by the RCF nodes by September.
       
  • Another thing to do in parallel: get one (or more) beefy 8 GB memory node within 2 months as well
    • The general thought was to try to get beefy nodes for the FileCatalog, but it was not at all clear what to do there. Wayne asked after the meeting: if so, what is the plan for the Web-Server and a database as redundant, swappable nodes?
    • Answer: it would be overkill with the FileCatalog, but as it stands, and considering Dmitry's results, getting 8 GB memory nodes + good IO is a way forward for the FC with a low number of node purchases required (the FC needs 1 master and 2 slaves at the moment)
      • Pricing is the key
      • General goal would be (if we go that route) to have the new web server not before September (but also not much after)
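
The network question raised in the list above (would a faster DB saturate the network?) lends itself to a back-of-the-envelope estimate. The sketch below is only that: the function name, the query rates, the result size and the 1 Gbit/s link are all hypothetical placeholders, not numbers from Dmitry's slides.

```python
def link_utilization(queries_per_s, bytes_per_result, link_bits_per_s):
    """Fraction of the link capacity used by query results alone."""
    return queries_per_s * bytes_per_result * 8 / link_bits_per_s


GIGABIT = 1_000_000_000  # assumed link speed in bits per second

# Hypothetical current load: 200 queries/s returning 50 kB each.
print(link_utilization(200, 50_000, GIGABIT))

# If improved IO allowed ten times the query rate, the same link
# would be close to saturation.
print(link_utilization(2000, 50_000, GIGABIT))
```

Plugging in measured query rates and result sizes from the test suite would show how much headroom the network leaves once the IO is improved.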

 

Dmitry pointed out that for the DB slave there may not be enough space; he is thinking of a TB of storage.

  • Jerome stated that we will NOT purchase additional storage for backup at this stage, as this overlaps (perhaps) with Matthew's project of providing an online file-server-like capability. We do not want to disperse our efforts in all directions.
  • Suggested instead: leverage Legato AND push snapshots into HPSS, and revisit this later