Software and Computing
Software and Computing phone meeting
1-189, EVO, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | Calibration overview for Year9 and other issues ( 00:20 ) 1 file | Gene Van Buren (BNL) |
12:20 | Minuit vertex improvement, current status ( 00:15 ) 0 files | Matthew Cervantes (TAMU) |
12:35 | BeamLine determination using 3D method ( 00:10 ) 1 file | Rosi Reed (UC-Davis) |
12:45 | p+p 500 GeV production request, status & motivations ( 00:15 ) 1 file | Jan Balewski (MIT) |
STAR software installation on SL5.2 - my notes
1. enable EPEL repo in /etc/yum.repos.d/epel.repo
2. yum install cvs openafs openafs-client
Review of calibration issues, tasks and plans
1-189, internal, at 19:00 (GMT), duration : 03:00
Chair: Jerome
Purpose:
- Review calibration issues for Run 9
- Review of the calibration status and overview for Year8, Year7 (h+/h-) and plan forward
- Plan and activities for the next 6 months to a year [general]
- Analysis meeting strategy / presentation and expectations
Internal only.
Meeting will be interrupted from 15:30 -> 16:30 by a STAR/BNL group meeting.
Reconstruction and simulation issues
1-189, internal, at 18:00 (GMT), duration : 01:00
Chair: Jerome
Invited: Yuri and Victor
Purpose:
- Review activities and progress in both areas
- Timeline would be the next 6 months to a year.
- Review and discuss overlaps between the two areas
Meeting is internal.
CSW4DB, status and path forward
1-189, HighSpeed conference bridge, at 18:00 (GMT), duration : 02:00
Meeting was in two parts. The first hour with Mark Green from Tech-X and the second internal.
User and computer service, progress
1-189, internal, at 20:00 (GMT), duration : 01:00
Chair: Jerome
Software and Computing phone meeting
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | SPIN PWG request for pp 500GeV ( 00:15 ) 2 files | Joseph Seele (MIT) |
12:15 | Update on the p and n side cluster in SSD ( 00:20 ) 1 file | Jonathan Bouchet (KSU) |
12:35 | iSCSI transfer, progress update ( 00:15 ) 1 file | Matthew Ahrenstein (BNL) |
12:50 | AOB ( 00:10 ) 0 files | All (All) |
UCM discussions
1-189, internal, at 18:00 (GMT), duration : 01:00
User and computer service, task review
1-189, internal, at 20:00 (GMT), duration : 01:00
Discussion on organization and tasks for the user and computer support team.
Review of plan until September/October.
Software and Computing phone meeting
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | SUMS updates and new release ( 00:30 ) 1 file | Levente Hajdu (BNL) |
12:15 | Database development progress ( 00:20 ) 1 file | Dmitry Arkhipkin (BNL) |
12:35 | Overview of recent production and library release ( 00:15 ) 0 files | Lidia Didenko |
12:50 | AOB ( 00:10 ) 0 files | All (All) |
Database development follow-up
1-189, internal, at 16:00 (GMT), duration : 01:00
Invited: Dmitry, Wayne, Gene
Attendees: Jerome (chair), Dmitry, Wayne, Leve, Gene
The meeting started with miscellaneous topics, such as Leve bringing up the existence of a RAM disk at $1000 for 32 GB.
Dmitry presented slides (attached). General observations included:
- LB – missing a RR on low load (this should really be implemented)
- FC – most benefit comes from increasing the key buffer size
- Can different jobs alter the way we measure the cache speed / enhancement?
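The load-balancer observation above can be illustrated with a minimal sketch, reading "LB" as the load balancer and "RR" as round-robin (an interpretation, not confirmed by the minutes). Everything in it is hypothetical: the `pick_server` function, the `LOW_LOAD` threshold and the node names are made up for illustration and are not taken from the actual STAR load balancer.

```python
from itertools import count

LOW_LOAD = 0.2  # hypothetical threshold below which a node counts as idle

_rr_counter = count()  # ever-increasing counter that drives the rotation


def pick_server(loads):
    """Pick a server name from a {name: load} mapping.

    If every node is below LOW_LOAD, rotate through the nodes (round-robin)
    so that a single node does not receive all requests; otherwise pick
    the least-loaded node.
    """
    names = sorted(loads)
    if all(loads[n] < LOW_LOAD for n in names):
        return names[next(_rr_counter) % len(names)]
    return min(names, key=lambda n: loads[n])
```

Without the round-robin branch, a pure least-loaded policy would keep sending requests to the same node whenever all loads are near zero.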
Specific discussion:
Discussing the FileCatalog, it was also pointed out that the MySQL query optimizer sometimes skips the key read (because a sequential read of the table is believed to be faster than random reads) and hence some SELECTs may not use the index at all.
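The trade-off described above can be sketched with a toy cost comparison. This is not MySQL's actual cost model: the function name `prefers_full_scan` and the unit costs are assumptions chosen only to show why a scan can win when many rows match.

```python
def prefers_full_scan(total_rows, matching_rows, seq_cost=1.0, random_cost=10.0):
    """Toy cost comparison, not MySQL's actual cost model.

    A full scan reads every row sequentially; an index lookup pays one
    random read per matching row. Costs are in arbitrary units.
    """
    scan_cost = total_rows * seq_cost
    index_cost = matching_rows * random_cost
    return scan_cost < index_cost
```

In MySQL itself, `EXPLAIN SELECT ...` reports in its `key` column which index, if any, was chosen for a given query.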
Clearly, the FC is disk-speed limited, and one option would be to attempt to solve this with beefier nodes with optimal IO.
Also discussed were UPDATE race conditions observed in the FC. Dmitry stated that there may be a fix for the UPDATE lock in a recent minor revision of MySQL. It is tricky to upgrade the Master, as it shares some services with the offline DB, but the idea was to perhaps add a slave and update it, if versions can be mixed. That a mix would work was not certain / guaranteed.
Jerome asked if there was any benefit in turning on the slow query logs. In principle, we know what the slow queries are, and Dmitry also noted that "slow" is defined by a threshold (with a low default value on the order of 10 seconds); since most queries are minutes long, all queries may be returned. We will nonetheless revisit this matter once we have improved the IO speed (which seems to be the main problem at this time).
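The threshold argument above can be made concrete with a small sketch. The query durations are invented for illustration; the only MySQL fact relied upon is that the slow query log records queries running longer than `long_query_time`, whose default is 10 seconds.

```python
def slow_queries(durations_s, long_query_time_s=10.0):
    """Return the durations a slow-query log with this threshold would record."""
    return [d for d in durations_s if d > long_query_time_s]


# Hypothetical query durations in seconds, all minutes long.
durations = [90.0, 240.0, 600.0]

logged_default = slow_queries(durations)           # threshold of 10 s
logged_raised = slow_queries(durations, 300.0)     # threshold of 5 minutes
```

With the default threshold every query in this sample is logged, so the log carries no extra information; only a threshold raised to match the typical query length would single out the outliers.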
Wayne did spec a node, so far close to $7k [atime disabled as well].
There was a lot more in the slides.
Recommended for the path forward:
- Get a performance test suite really settled (comparison as we go; it seems we are there, but we need to compare between configurations and hardware)
- We saw that CPU is not important ... whenever spec-ing a node, we should be conservative with the CPU speed (twice as fast is not better, so let's not push)
- What is the network overall performance? If we speed up IO and response of the DB, would we saturate the network? Dmitry presented some results on that and there is food for thought.
- We need to assess the API overheads soon enough too ... For now, the IO (cache and hardware reshape) seems to be the big gain, but we should not ignore the performance gain on the API side and should consider it soon.
- Project to be done by July
- Concentrate first on db06, db11
- We make this a two-phase project. First, we look at those two nodes and test in place; we test until the end of the run
- The second phase would be after the run – we proceed with a rolling upgrade (or rolling phase-in) as nodes are available
- We try to get it all done before T=(end-of-the-run+2 months) which is a typical rule-of-thumb for calibration and start of production. This would give a generous timeline of all db replaced by the RCF nodes by September.
- Another thing to do in parallel: get one (or more) beefy 8 GB memory node within 2 months as well
- General thought was to try to get beefy nodes for the FileCatalog, but it was not all clear what to do there. Wayne asked after the meeting: if so, what is the plan for the Web server and a database as redundant, swappable nodes?
- Answer is: it would be overkill with the FileCatalog, but as it stands, and considering Dmitry's results, getting 8 GB memory nodes + good IO is a way forward for the FC with a low node purchase requirement (the FC needs 1 master and 2 slaves at the moment)
- Pricing is the key
- General goal would be (if we go that route) to have the new web server not before September (but also not much after)
Dmitry pointed out that for the DB slave there may not be enough space; thinking of a TB of storage.
- Jerome stated that we will NOT purchase additional storage for backup at this stage as this overlaps (perhaps) with Matthew's project of providing an online file-server-like capability. We do not want to disperse in all directions.
- Suggested to leverage Legato AND push snapshots into HPSS, and revisit this later
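The rule of thumb quoted in the path-forward list, T = (end-of-the-run + 2 months), is plain calendar arithmetic. A minimal sketch follows; the end-of-run date used here is a hypothetical placeholder, not a date taken from the minutes, and `add_months` is an illustrative helper.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after `d`, clamping the
    day to the length of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Hypothetical end-of-run date, chosen only for illustration.
end_of_run = date(2009, 6, 30)
deadline = add_months(end_of_run, 2)
```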
UCM discussions
2-187, at 18:00 (GMT), duration : 01:00
09W23
Dell PowerEdge 1750 setup for offline DB slaves
Notes on configuring the disks in Dell PowerEdge 1750s as offline database slaves and online Linux pool nodes. (Generally applicable to most uses of software RAID in Linux, but most of t
Software and Computing phone meeting
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | SVT simulator tuning and embedding ( 00:15 ) 1 file | Stephen Baumgart (Yale) |
12:15 | Geometry differential, an update ( 00:20 ) 0 files | Victor Perevoztchikov (BNL) |
12:35 | NPE analysis & embedding opened discussions ( 00:15 ) 0 files | TBC (TBC) |
12:40 | Dalitz decays in starsim ( 00:10 ) 1 file | Thomas Ullrich (BNL) |
12:50 | AOB ( 00:10 ) 0 files | All (All) |
Software and Computing phone meeting
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | Geometry differential, a follow-up ( 00:20 ) 4 files | Victor Perevoztchikov (BNL) |
12:20 | AOB ( 00:20 ) 0 files | All (All) |
Post procurement 1 space topology
Following the Disk space for FY09, here is the new space topology and space allocation.
Software and Computing phone meeting
1-189, HighSpeed conference bridge, at 16:00 (GMT), duration : 01:30
Time | Talk | Presenter |
---|---|---|
12:00 | DCA resolution in Run7 (Silicon) ( 00:20 ) 1 file | Jonathan Bouchet (KSU) |
12:20 | TPC alignment in Year9 ( 00:15 ) 1 file | Na Li (IOPP) |
12:35 | TPC alignment in Year7 ( 00:15 ) 0 files | Hao Qiu (IMPCAS) |
12:50 | AOB ( 00:10 ) 0 files | All (All) |
Preparing TPC Anode HV data for the DB
There are 2 kinds of information about the Anode HVs which are important:
Offline DB performance study
The offline database is READ-heavy (99% reads / 1% writes due to replication); it should therefore benefit from optimization of the various buffers, elimination of key-less joins, and disk (RAM) IO improvement.
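One way to judge whether a buffer change helps a read-heavy MyISAM workload is the key cache miss rate, computed from the MySQL status counters `Key_reads` (index blocks read from disk) and `Key_read_requests` (index block requests). The sketch below only shows the arithmetic; the function name and the counter values used with it are made up for illustration and are not measurements from the study.

```python
def key_cache_miss_rate(key_reads, key_read_requests):
    """Fraction of index block requests that had to go to disk.

    Inputs correspond to the MySQL status counters Key_reads and
    Key_read_requests; returns 0.0 when no requests were made.
    """
    if key_read_requests == 0:
        return 0.0
    return key_reads / key_read_requests
```

For example, 50 disk reads out of 10000 requests gives a miss rate of 0.5%; comparing this figure before and after a change to the key buffer size is one simple check of its effect.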