Software and Computing
Software and Computing Phone Meeting
1-189, EVO, at 17:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | Run 11 initial calibration status ( 00:20 ) 2 files | Grant Webb (UKY) |
12:20 | Bug of the week ( 00:20 ) 0 files | All (All) |
12:40 | AOB ( 00:20 ) 0 files | All (All) |
Theory on cosmic rays at high pT
It occurs to me that the topic of RT ticket 2098 (and
Effects of StiNodePars.h change
Joe Seele found some evidence that high pT tracks were being
Configuring AutoQA for subsystems
This document is intended for subsystem experts who configure the AutoQA reference histogram analysis in Offline QA for their subsystem.
201102
Meeting held on 2011/02/23 to discuss the budget situation, answer employees' questions, and address concerns.
Slides provided are attached.
HOW-TO: access CDEV data (copy of CAD docs)
As of Feb 18th 2011, the previously existing content of this page has been removed.
Software and Computing Phone Meeting
1-189, EVO, at 17:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | Sti/VMC update ( 00:20 ) 1 file | Victor Perevoztchikov (BNL) |
12:20 | AOB ( 00:20 ) 0 files | All (All) |
Effects of varying TPC alignment
Towards understanding contributions to the h-/h+ issue seen in global tracks, Maxim & I discussed finding out how well we need to calibrate the alignment parameters.
Run 11 preparation/support meeting #12
1-189, EVO, at 19:00 (GMT), duration : 01:00
Software and Computing Phone Meeting
1-189, EVO, at 17:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | Embedding overview, path to QM 2011 ( 00:15 ) 1 file | Renee Fatemi (UKY) |
12:15 | Run 11 initial calibration update ( 00:20 ) 1 file | Grant Webb (UKY) |
12:35 | Voltage scan and cosmic ray study ( 00:10 ) 0 files | Yuri Fisyak (BNL) |
12:45 | VMC geometry update ( 00:15 ) 1 file | Jason Webb (BNL) |
Run 11 preparation/support meeting #11
2-187, EVO, at 19:00 (GMT), duration : 01:00
Minutes
Attendees: Dmitry A., Wayne B., Leve H., Jérôme L., Gene V.B., Jan B., Maxim N.
Online webserver:
- Wayne installed: SSH key client, osiris
- ITD backups nominally working
- Warning message noted to ITD
- Restore from backup (targeting machine with warning messages) by Wayne went fine
- Synchronization with secondary new webserver done manually, not yet automatic
- Tomcat restarted for Leve's scheduler statistics redeployment
- Jerome's service idea should be copied from old server so that intentional downtimes don't trip the watcher
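The minutes do not describe how the old server kept intentional downtimes from tripping the watcher. As a minimal sketch of one common approach, assuming a hypothetical flag file that an administrator creates before planned maintenance, a watcher pass could decide its action like this (the function name and the flag file name are illustrative, not taken from the actual script):

```python
import os
import tempfile


def watcher_action(service_is_up, flag_path):
    """Decide what one watcher pass should do.

    Returns "ok" when the service is up, "skip" when it is down but a
    maintenance flag file exists, and "restart" otherwise.
    """
    if service_is_up:
        return "ok"
    if os.path.exists(flag_path):
        # Planned downtime: do not restart and do not count a trip.
        return "skip"
    return "restart"


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical flag file name, for illustration only.
    flag = os.path.join(tmp, "tomcat.maintenance")
    down_unplanned = watcher_action(False, flag)  # no flag present
    open(flag, "w").close()                       # announce maintenance
    down_planned = watcher_action(False, flag)    # flag present
```

Removing the flag file after the maintenance window returns the watcher to its normal behavior.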
ShiftLog:
- Detector checklist being posted to the ShiftLog regularly now. A suggestion from Jan to make it easier was to make a shortcut to the checklist document from the ShiftLeader's desktop (post-meeting Leve learned that each new checklist is given a new document name; finding the document isn't that difficult anyhow)
Online user support:
- Enabling migration off sc.starp and onto linux pool
- Dmitry notes that Corba libs are present on linux pool
- Jan says Brian Page is in the midst of trying to move two components:
- cdev reader (needs Corba libs, will try compiling)
- DB writer (Gene will provide Brian with a working example code)
- Two online disks (noted by Wayne in previous meetings): replaced
- Some additional network issues this week
- Initially fixed with a switch replacement
- ...but additional failures occurred
- Wayne will try to fix during access later today (post-meeting update: this went successfully; topic closed)
- starutilities replacement: usage extension currently with "soft" limit
- Matt could not install C-AD software in XP mode on new box
- next possibility is dedicated XP machine (needs a software extension from ITD)
- Other tasks (noted in previous meetings) in queue
Databases:
- Shift sign-up un-subscriptions now exclude trainees [Dmitry]
- Feedback to slow controls to implement monitoring of their automated services (are they running? is data flowing? for now, only Dmitry's dbPlots and an environment monitor)
- TPC anode voltage migration was not running, but Dmitry will turn it on (we are aware of its shortcomings)
- Dmitry noted RHIC clock seems to be variable even during fills (Gene will follow up)
- Some discussion of when DB snapshots run; Jan seems OK with packaging at midnight for 4am usage for cloud jobs
- Dmitry notes that there's an open discussion about recorded prescales being wrong. If this is corrected later in the DB, it shouldn't be a problem for offline users as long as they access from the DB and not from the DAQ files. Needs to be sorted out by trigger group first.
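The "is data flowing?" question raised for the automated slow controls services can be illustrated with a simple staleness check. This is only a sketch under assumptions: the service names, the timestamps, and the 15-minute threshold are hypothetical, and a real monitor would read the newest record time of each service from the database.

```python
from datetime import datetime, timedelta


def stale_services(last_seen, now, max_age):
    """Return the sorted names of services whose newest record is older than max_age."""
    return sorted(name for name, stamp in last_seen.items() if now - stamp > max_age)


# Arbitrary reference time and hypothetical service names, for illustration only.
now = datetime(2011, 2, 10, 12, 0, 0)
last_seen = {
    "tpc_anode_voltage": now - timedelta(hours=3),
    "environment": now - timedelta(minutes=2),
}
stale = stale_services(last_seen, now, timedelta(minutes=15))
```

Here only the service whose last record is three hours old is flagged; the one updated two minutes ago is treated as flowing.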
FastOffline:
- Laser files were being skipped; fixed by Jerome
- Automatic drift velocity calibrations started (and working)
QA:
- OnlinePlots had 3 problems spurred by update of starnew to SL11a library
- Macros used (Load.C) needed modified loading order of libraries (fixed)
- Numerous histograms getting skipped, and others in unusual order in output PDFs (RT 2085)
- Temporary solution: run in starpro (SL10k) patched for TOF/VPD updates
- Code for inserting PDFs into database should use 'REPLACE INTO' instead of 'INSERT INTO', but we only have a binary of the current version
- Dmitry had source code from an old version which Gene edited to work as desired now (update: codes committed to CVS)
- Offline QA histogram filling needed to update code for TPC padrow determination
- AutoCombining of offline histograms stopped working (Elizabeth investigating) (update: fixed for a changed function interface in QA daemon)
- No references yet, but shift crew (Lanny) looking at data
- Call for setting references will go out when physics readiness is achieved (many problems in current data)
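The difference between 'INSERT INTO' and 'REPLACE INTO' noted above can be sketched as follows. The example uses an in-memory SQLite database as a stand-in so that it is self-contained (SQLite and MySQL both accept `REPLACE INTO`); the table layout, column names, and run number are assumptions made for illustration, not the actual schema.

```python
import sqlite3

# Hypothetical table: one stored PDF per (run, name) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE qa_pdf (run INTEGER, name TEXT, pdf BLOB, PRIMARY KEY (run, name))"
)


def store_pdf(conn, run, name, data):
    """Store a PDF, overwriting any earlier row with the same (run, name) key."""
    conn.execute(
        "REPLACE INTO qa_pdf (run, name, pdf) VALUES (?, ?, ?)",
        (run, name, data),
    )


store_pdf(conn, 12040001, "OnlinePlots", b"first version")
store_pdf(conn, 12040001, "OnlinePlots", b"regenerated version")
rows = conn.execute("SELECT pdf FROM qa_pdf WHERE run = 12040001").fetchall()

# A plain INSERT with the same key is rejected instead of overwriting.
try:
    conn.execute(
        "INSERT INTO qa_pdf (run, name, pdf) VALUES (?, ?, ?)",
        (12040001, "OnlinePlots", b"third version"),
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True
```

With `REPLACE INTO`, regenerating the plots for a run leaves a single, up-to-date row, whereas `INSERT INTO` fails on the duplicate key.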
Run 11 preparation meeting #10
1-189, EVO, at 19:00 (GMT), duration : 01:00
Minutes
Attendees: Dmitry A., Wayne B., Leve H., Jérôme L., Gene V.B.
Online webserver:
- Went live on Monday
- No user complaints reported
- Jerome: identified and fixed several problems
- Jerome: started tomcat watcher script (he noted that it already caught a number of trips)
- Dmitry: though crontab had been copied, cron file entries had been missed
- Wayne: to-do list:
- Install SSH key client
- Install osiris
- Tomcat log file rotation
- ITD backups (noted as priority)
- Synchronization with secondary new webserver
ShiftLog:
- 'thumbnailer' problem, Jerome is helping
- MTD subsystem added by Leve
- Used the opportunity to move the array of subsystems to a single place
Online user support:
- sc.starp crashes (RT 2077)
- Was repeatedly shutting down
- Wayne: Box replaced (drive swapped), working fine...
- ...but suspicion is UPS, battery replaced
- Leave as is for now
- Dmitry suggested providing Corba libs on online linux pool nodes to allow users an alternative to sc (Gene spoke with Jan B. about this post-meeting: Brian P. will try migrating his code)
- Two online disks (noted by Wayne in previous meetings): still need replacement
- QtRoot Event Display (RT 2075): Jerome notes it is low priority for now
- HPSS access from online via data carousel: Jerome indicates work in progress for authentication (perhaps try replicating what was done with stargrid04)
- starutilities replacement: extension through Jan. 30th.
- trying to run C-AD software in XP mode on new box (Matt working with ITD today)
- next possibility is dedicated XP machine
- otherwise suggest software upgrade to C-AD (would require additional extension for procurement and delivery duration)
- rts01 replacement machine (RT 2070): machine has arrived
- Request for help restoring old FTPC event builders (RT 2066):
- Wayne attempted to resurrect old builders but didn't succeed
- Jeff has alternatives to move on
- Conference room PC display configuration improved by Wayne, but may still be not optimal
- FMS/Spin request for replacement machines (RT 2073 and 2074):
- Laptop provided, and had to learn from experience to prevent power saver settings from shutting it down
- Other machine not critical
- Renewed request today for a folder on ShiftLeader's terminal for making files web-available (RT 2083)
Databases:
- Migration macros started on onl01 by Dmitry
- Machine died (from exhausted memory): cause is unknown, but perhaps one of the users
- Table further investigation until it proves to be systematic
- New DB node allocation for FastOffline(+nightly tests) awaits results of Lidia's tests
- Dmitry noted that detector ID lists needed to be (and were) verified among Lidia, Tonko, and himself
- Jerome noted that offline should rely on DB lookup by subsystem and not depend on a local list (he will follow up with Lidia)
- Dmitry implemented flexible, graphical selection of timestamps for his online DB plots upon feedback from Alexei L.
- Jerome tested on several OS-browser combinations
FastOffline:
- Jerome notes that the chain needs continued attention (e.g. geometry, CA tracker), and Gene will take this
QA:
- Some minor modifications made to OnlinePlots for VPD by Kefeng and Gene
Software and Computing Phone Meeting
1-189, EVO, at 17:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | FMS simulation production, status ( 00:15 ) 0 files | Jason Webb (BNL) |
12:15 | Production status & step forward ( 00:20 ) 0 files | Jerome Lauret (BNL) |
12:35 | AOB ( 00:20 ) 0 files | All (All) |
Run 11 preparation meeting #9
1-189, EVO, at 19:00 (GMT), duration : 01:00
Minutes
Attendees: Dmitry A., Wayne B., Leve H., Jérôme L., Gene V.B.
ShiftLog [Leve]:
- Checked that links work both ways with RunLog
- Re-deployed to fix an issue with a function used to tell if we're in a shift-taking period
- Dmitry suggested modifying this next year so that it does not use dates
Online infrastructure [Wayne]:
- New webserver:
- Database access now working fine
- Watcher script needs to be turned on
- cron jobs (and associated files) need to be copied
- Announcement for a Monday swap:
- 2pm (Wayne, Jerome, Leve, Dmitry should be available to handle whatever may arise - Dmitry already knows one configuration he needs to change during the swap)
- NFS will be turned off on the old server, sync will be done, IPs swapped, and NFS on.
- Two online disks (noted last week) still need replacement
- Event Display
- Not currently functional (we lost our maintainer; Jerome is trying to prop it back up for now)
- Pieces in multiple locations for online display plus offline (images dumped for web display)
- Not critical (old online event display still functional), but valuable for PR
- Long-standing request (not critical) to be able to read from HPSS online (via data carousel)
- Jerome looking for any road blocks (e.g. firewall)
- Machine replacements:
- emcsc done
- starutilities and ftpcgas not yet done (scheduled for work with relevant parties early next week)
- All Windows nodes rebooted with security patches
- rts01 replacement request merged into a larger purchase request (RT 2070)
- Request for help restoring old FTPC event builders (RT 2066)
- Possible solution of other hardware if this is too much effort
- FMS/Spin request for replacement machines (RT 2073 and 2074). Not critical, but latter is important for remote access (current machine is inside WAH and not on network)
Databases [Dmitry]:
- After new nodes go into production pool, 2 old nodes will be targeted for FastOffline + nightly tests
- Assessment of new node speed/power still ongoing
- onldb1 and 2 were up for 1+ years, so rebooted on purpose to check state
- Collector daemons did not automatically start, but otherwise OK
- Dmitry suggested leaving it this way since reboot need is so rare and automatic can take a while to get caught up (has to look through and skip all runs from the year)
- Gene suggested making it automatic in case Dmitry's not around to restart daemons (though shift leaders can potentially do it themselves). We'll go with this.
- QA files (OnlinePlots + Jevp) stored for RunLog can now exceed 4 GB without needing manual rollover
- TPC Anode Voltages:
- For now, continue with methods of collection from last year
- Dmitry tested supposedly optimized "bulk reading" methods and found them to be no better than channel-by-channel (actually worse, as it appears to be the same + overhead: 13-15 seconds to collect all values versus less than 1 second); dead end.
- Possible new methods of collection under development (Tonko & the TPC group)
- Feedback to the TPC group that wider "dead band zones" could reduce traffic and help alleviate problems due to load
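To illustrate why wider "dead band zones" reduce traffic, here is a minimal sketch assuming the usual meaning of a dead band: a new value is reported only when it differs from the last reported value by more than the band width. The voltage readings and band widths below are invented for the example and are not measured TPC values.

```python
def apply_deadband(readings, deadband):
    """Return only the readings that differ from the last reported value
    by more than `deadband`; the first reading is always reported."""
    reported = []
    last = None
    for value in readings:
        if last is None or abs(value - last) > deadband:
            reported.append(value)
            last = value
    return reported


# Invented readings: small jitter around 1100 V, a step to ~1105 V, then back.
voltages = [1100.0, 1100.2, 1099.9, 1100.4, 1105.0, 1105.1, 1104.8, 1100.0]
narrow = apply_deadband(voltages, 0.05)  # every reading is reported
wide = apply_deadband(voltages, 1.0)     # only the real steps are reported
```

With the narrow band all eight readings are sent, while the wide band sends three: the initial value and the two genuine voltage steps.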
FastOffline [Jerome]:
- Same as last week: many files coming into the system, but all are tests, pulsers, etc., so not being processed
- Gene notes some possible tweaks to the chain may be needed for TPC calibration status (awaiting TPC feedback)
- Dmitry needs to verify with Lidia some exported info from onldb1
QA [Gene]:
- Offline QA documentation updated (Lanny & Gene)
- Jerome notes the need to monitor and document any issues with QA Shift
- Pre-combining of FastOffline files still needs to be tested and configured to be the default
Software and Computing Phone Meeting
1-189, EVO, at 17:00 (GMT), duration : 01:00
Time | Talk | Presenter |
---|---|---|
12:00 | Simulation production overview ( 00:20 ) 0 files | Jason Webb |
12:20 | TPC Alignment revisit ( 00:10 ) 0 files | Yuri Fisyak (BNL) |
12:30 | AOB ( 00:20 ) 0 files | All (All) |
TPC Voltage issue and discussion #3
1-189, at 18:00 (GMT), duration : 01:00
Meetings are being set to discuss and converge as far as the TPC voltage issue is concerned.
This is the third of this kind.