Software and Computing

Software and Computing phone meeting

2009-07-01 12:00
2009-07-01 13:00
Etc/GMT-5
Wednesday, 1 July 2009
1-189, EVO, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter
12:00 | Calibration overview for Year9 and other issues (00:20), 1 file | Gene Van Buren (BNL)
12:20 | Minuit vertex improvement, current status (00:15), 0 files | Matthew Cervantes (TAMU)
12:35 | BeamLine determination using 3D method (00:10), 1 file | Rosi Reed (UC-Davis)
12:45 | p+p 500 GeV production request, status & motivations (00:15), 1 file | Jan Balewski (MIT)

STAR software installation on SL5.2 - my notes

1. enable EPEL repo in /etc/yum.repos.d/epel.repo
2. yum install cvs openafs openafs-client 

Review of calibration issues, tasks and plans

2009-06-26 15:00
2009-06-26 18:00
Etc/GMT-5
Friday, 26 June 2009
1-189, internal, at 19:00 (GMT), duration : 03:00

Chair: Jerome

Purpose:

  • Review calibration issues for Run 9
  • Review of the calibration status and overview for Year8, Year7 (h+/h-), and plan forward
  • Plan and activities for the next 6 months to a year [general]
  • Analysis meeting strategy / presentation and expectations

Internal only.

Meeting will be interrupted from 15:30 -> 16:30 by a STAR/BNL group meeting.

Reconstruction and simulation issues

2009-06-26 14:00
2009-06-26 15:00
Etc/GMT-5
Friday, 26 June 2009
1-189, internal, at 18:00 (GMT), duration : 01:00

Chair: Jerome
Invited: Yuri and Victor

Purpose:

  • Review activities and progress in both areas
  • Timeline would be next 6 months to a year.

  • Review and discuss overlaps between the two areas

Meeting is internal.

CSW4DB, status and path forward

2009-06-25 14:00
2009-06-25 16:00
Etc/GMT-5
Thursday, 25 June 2009
1-189, HighSpeed conference bridge, at 18:00 (GMT), duration : 02:00

Meeting was in two parts. The first hour with Mark Green from Tech-X and the second internal.

User and computer service, progress

2009-06-24 16:00
2009-06-24 17:00
Etc/GMT-5
Wednesday, 24 June 2009
1-189, internal, at 20:00 (GMT), duration : 01:00

Chair: Jerome

 

Software and Computing phone meeting

2009-06-24 12:00
2009-06-24 13:00
Etc/GMT-5
Wednesday, 24 June 2009
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter
12:00 | SPIN PWG request for pp 500GeV (00:15), 2 files | Joseph Seele (MIT)
12:15 | Update on the p and n side cluster in SSD (00:20), 1 file | Jonathan Bouchet (KSU)
12:35 | iSCSI transfer, progress update (00:15), 1 file | Matthew Ahrenstein (BNL)
12:50 | AOB (00:10), 0 files | All (All)

UCM discussions

2009-06-18 14:00
2009-06-18 15:00
Etc/GMT-5
Thursday, 18 June 2009
1-189, internal, at 18:00 (GMT), duration : 01:00

User and computer service, task review

2009-06-18 16:00
2009-06-18 17:00
Etc/GMT-5
Thursday, 18 June 2009
1-189, internal, at 20:00 (GMT), duration : 01:00

 

Discussion on organization and tasks for the user and computer support team.

Review of plan until September/October.

 

 

Software and Computing phone meeting

2009-06-17 12:00
2009-06-17 13:00
Etc/GMT-5
Wednesday, 17 June 2009
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter
12:00 | SUMS updates and new release (00:30), 1 file | Levente Hajdu (BNL)
12:15 | Database development progress (00:20), 1 file | Dmitry Arkhipkin (BNL)
12:35 | Overview of recent production and library release (00:15), 0 files | Lidia Didenko
12:50 | AOB (00:10), 0 files | All (All)

CPU and bulk storage purchase 2009

The assumed CPU profile will be:

Database development follow-up

2009-06-03 12:00
2009-06-03 13:00
Etc/GMT-5
Wednesday, 3 June 2009
1-189, internal, at 16:00 (GMT), duration : 01:00

Invited: Dmitry, Wayne, Gene

 

Attendees: Jerome (chair), Dmitry, Wayne, Leve, Gene

 

The meeting started with miscellaneous topics, such as Leve bringing up the existence of RAM disks priced at $1000 for 32 GB.

Dmitry presented slides (attached). General observations included:

  • LB – missing a RR on low load (this should really be implemented)
  • FC – most benefit would come from increasing the key buffer size
  • Can different jobs alter the way we measure the cache speed / enhancement?
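One way to quantify the key buffer observation, and to check whether concurrent jobs distort the measurement, is to compute the key cache miss rate over a fixed interval rather than from cumulative totals. The sketch below is only an illustration: it assumes the FileCatalog tables use the MyISAM key cache, so that the MySQL status counters `Key_reads` and `Key_read_requests` apply, and the snapshot values are made up, not taken from Dmitry's slides.

```python
def key_cache_miss_rate(start: dict, end: dict) -> float:
    """Fraction of key block requests that went to disk between two snapshots.

    Each snapshot is a dict holding the MySQL status counters
    Key_reads and Key_read_requests (e.g. read via SHOW GLOBAL STATUS).
    """
    requests = end["Key_read_requests"] - start["Key_read_requests"]
    disk_reads = end["Key_reads"] - start["Key_reads"]
    if requests <= 0:
        return 0.0  # no index activity in the interval
    return disk_reads / requests


# Hypothetical snapshots, for illustration only (not measured values).
snapshot_start = {"Key_reads": 100, "Key_read_requests": 1000}
snapshot_end = {"Key_reads": 150, "Key_read_requests": 2000}
print(f"miss rate: {key_cache_miss_rate(snapshot_start, snapshot_end):.1%}")
```

Taking the difference between two snapshots limits the measurement to the test window, which makes it easier to compare runs taken under different job mixes.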

Specific discussion:

Discussing the FileCatalog, it was also pointed out that the MySQL query optimizer sometimes skips the key read (because a sequential read of the table is believed to be faster than random reads) and hence some SELECTs may not use the index at all.
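The trade-off behind this optimizer behavior can be illustrated with a toy cost comparison. This is a deliberately simplified sketch and not MySQL's actual cost model: the function name and both per-row cost constants are assumptions chosen only to show why an index can lose to a full table scan when many rows match.

```python
def cheaper_access_path(total_rows: int, matching_rows: int,
                        seq_cost_per_row: float = 1.0,
                        random_cost_per_row: float = 20.0) -> str:
    """Pick the cheaper plan under a toy model (costs are arbitrary units).

    A full scan reads every row sequentially; an index lookup reads only
    the matching rows, but each one costs a (more expensive) random read.
    """
    full_scan_cost = total_rows * seq_cost_per_row
    index_cost = matching_rows * random_cost_per_row
    return "index" if index_cost < full_scan_cost else "full scan"


# Selective query: few rows match, so the index wins.
print(cheaper_access_path(total_rows=1_000_000, matching_rows=1_000))
# Broad query: a fifth of the table matches, so the sequential scan wins.
print(cheaper_access_path(total_rows=1_000_000, matching_rows=200_000))
```

On disk-limited hardware the gap between sequential and random reads is large, which is consistent with the FC being reported as disk speed limited.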

Clearly, the FC is disk speed limited, and one option would be to attempt to solve this with beefier nodes with optimal IO.

UPDATE race conditions observed in the FC were also discussed. Dmitry stated that a recent minor revision of MySQL may contain a fix for the UPDATE lock. Upgrading the Master is tricky as it shares some services with the offline DB, but the idea was to perhaps add a slave running the updated version, if versions can be mixed. Whether they can be mixed was not certain / guaranteed.

Jerome asked whether there was any benefit in turning on the slow query logs. In principle, we know what the slow queries are, and Dmitry also noted that "slow" is defined by a threshold (with a low default value of a few tens of seconds); since most queries are minutes long, all queries may be returned. We will, however, revisit this matter once we have improved the IO speed (which seems to be the main problem at this time).
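The threshold argument can be made concrete with a small sketch. In MySQL the threshold is the `long_query_time` variable; everything else here (the function name and the list of query durations) is hypothetical and only illustrates why a low threshold flags nearly every query when typical queries run for minutes.

```python
def slow_queries(durations_s: list[float], threshold_s: float) -> list[float]:
    """Return the durations that a slow query log would record."""
    return [d for d in durations_s if d > threshold_s]


# Hypothetical query durations in seconds (not measured values).
durations = [45.0, 120.0, 300.0, 8.0, 600.0]

low = slow_queries(durations, threshold_s=10.0)
high = slow_queries(durations, threshold_s=240.0)
print(f"threshold 10 s:  {len(low)} of {len(durations)} queries logged")
print(f"threshold 240 s: {len(high)} of {len(durations)} queries logged")
```

With a threshold well below the typical query time, the log stops discriminating; raising it, or waiting until the IO improvement shortens the typical query, would make the log informative again.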

Wayne did spec a node – so far close to $7k – [atime disabled as well]

 

There were lots more from the slides.

Recommended for the path forward:

  • Get a performance test suite really settled (comparison as we go - it seems we are there, but we need to compare between configurations and hardware)
  • We saw that CPU is not important ... whenever spec-ing a node, we should be conservative with the CPU speed (twice as fast is not better, so let's not push)
  • What is the overall network performance? If we speed up the IO and response of the DB, would we saturate the network? Dmitry presented some results on that and there is food for thought.
  • We need to assess the API overheads soon enough too ... For now, the IO (cache and hardware reshape) seems to be a big gain, but we should not ignore the performance gain on the API side and should also consider this soon.
  • Project to be done by July
    • Concentrate first on db06, db11
    • We make this a two-phase project - first, look at those two nodes and test in place. We test until the end of the run
    • The second phase would be after the run – we proceed with a rolling upgrade (or rolling phase-in) as nodes become available
    • We try to get it all done before T=(end-of-the-run+2 months), which is a typical rule of thumb for calibration and start of production. This would give a generous timeline, with all DB nodes replaced by the RCF nodes by September.
       
  • Another thing to do in parallel: get one (or more) beefy 8 GB memory node within 2 months as well
    • General thought was to try to get beefy nodes for the FileCatalog, but it was not entirely clear what to do there. Wayne asked after the meeting: if so, what is the plan for the Web server and a database as redundant swappable machines?
    • Answer is: it would be overkill with the FileCatalog, but as it stands, and considering Dmitry's results, getting 8 GB memory nodes + good IO is a way forward for the FC with a low number of node purchases required (the FC needs 1 master and 2 slaves at the moment)
      • Pricing is the key
      • General goal would be (if we go that route) to have the new web server not before September (but also not much after)
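The rule of thumb T=(end-of-the-run+2 months) used in the plan above is simple date arithmetic. The sketch below shows one way to compute it; the end-of-run date is a placeholder assumed purely for illustration (the notes do not state it), and `add_months` is a hypothetical helper, not part of any STAR tool.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Placeholder end-of-run date, assumed for illustration only.
end_of_run = date(2009, 7, 1)
deadline = add_months(end_of_run, 2)
print(f"end of run: {end_of_run}, target T: {deadline}")
```

Under that assumed end-of-run date, the target falls in early September, in line with the September horizon quoted in the plan.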

 

Dmitry pointed out that there may not be enough space for the DB slave – thinking of a TB of storage.

  • Jerome stated that we will NOT purchase additional storage for backup at this stage, as this overlaps (perhaps) with Matthew's project of providing an online file-server-like capability. We do not want to disperse in all directions.
  • Suggested leveraging Legato AND pushing snapshots into HPSS, and revisiting this later

UCM discussions

2009-06-11 14:00
2009-06-11 15:00
Etc/GMT-5
Thursday, 11 June 2009
2-187, at 18:00 (GMT), duration : 01:00

09W23

Dell PowerEdge 1750 setup for offline DB slaves

Notes on configuring the disks in Dell PowerEdge 1750s as offline database slaves and online Linux pool nodes.  (Generally applicable to most uses of software RAID in Linux, but most of t

Software and Computing phone meeting

2009-06-10 12:00
2009-06-10 13:00
Etc/GMT-5
Wednesday, 10 June 2009
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter
12:00 | SVT simulator tuning and embeding (00:15), 1 file | Stephen Baumgart (Yale)
12:15 | Geometry differential, an update (00:20), 0 files | Victor Perevoztchikov (BNL)
12:35 | NPE analysis & embedding opened discussions (00:15), 0 files | TBC (TBC)
12:40 | Dalitz decays in starsim (00:10), 1 file | Thomas Ullrich (BNL)
12:50 | AOB (00:10), 0 files | All (All)

Software and Computing phone meeting

2009-05-27 12:00
2009-05-27 13:00
Etc/GMT-5
Wednesday, 27 May 2009
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter
12:00 | Geometry differential, a follow-up (00:20), 4 files | Victor Perevoztchikov (BNL)
12:20 | AOB (00:20), 0 files | All (All)

Post procurement 1 space topology

Following the Disk space for FY09, here is the new space topology and space allocation. 

Software and Computing phone meeting

2009-05-13 12:00
2009-05-13 13:30
Etc/GMT-5
Wednesday, 13 May 2009
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:30
Time | Talk | Presenter
12:00 | DCA resolution in Run7 (Silicon) (00:20), 1 file | Jonathan Bouchet (KSU)
12:20 | TPC alignement in Year9 (00:15), 1 file | Na Li (IOPP)
12:35 | TPC Alignement in Year7 (00:15), 0 files | Hao Qiu (IMPCAS)
12:50 | AOB (00:10), 0 files | All (All)

Preparing TPC Anode HV data for the DB

There are 2 kinds of information about the Anode HVs which are important:

Offline DB performance study

The offline database is READ-heavy (99% reads / 1% writes due to replication); it should therefore benefit from optimization of the various buffers, elimination of key-less joins, and disk (RAM) IO improvement