Software and Computing

TPC Voltage issue and discussion #2

2011-01-18 10:00
2011-01-18 11:00
America/New York
Tuesday, 18 January 2011
1-189, at 15:00 (GMT), duration : 01:00

This meeting is the second of its kind. Meetings are being set up to discuss, and converge on, the TPC voltage issue.

Several action items and discussions have been started since the first meeting (see especially this thread).

 

TPC Voltage issue and discussion #1

2011-01-14 16:30
2011-01-14 17:30
America/New York
Friday, 14 January 2011
1-189, at 21:30 (GMT), duration : 01:00

This meeting was set up to discuss how to converge on the TPC voltage issue. It is the first of a series of meetings with the various experts.

Run 11 preparation meeting #8

2011-01-14 14:00
2011-01-14 15:00
America/New York
Friday, 14 January 2011
1-189, EVO, at 19:00 (GMT), duration : 01:00

Minutes 

Attendees: Wayne B., Leve H., Jérôme L., Gene V.B.

 

ShiftLog [Leve]:

  • New server:
    • Ready:
      • App copied to new server (Wayne)
        • Successfully passed stability tests (pounding on it)
      • All WARs deployed
      • Scripts ready to flush cache and re-copy WARs
    • To do:
      • Will re-check links to and from RunLog
      • Watcher script not running yet
      • Verify Offline QA Shift Report connections

 

Online infrastructure [Wayne]:

  • New web server:
    • Links on homepage checked
      • Some issues with DB access and privileges from the current node name (dean2)
    • New jpgraph version installed in same location as the old one
    • Disk exports are not in place, so content isn't being updated currently
    • Will do another sync over the weekend or early next week, in addition to the final one when we swap servers
    • Ganglia to be installed
    • User accounts to be added
  • New web server backup:
    • Configuration kept the same as main new web server
    • ...but content isn't synced
  • Password rotation completed
  • mysql-devel installed on online linux nodes (RT 2068)
  • Two online disk failures (on onl09 (non-critical) RT 2065 and on onlldap (critical) RT 2067)
    • RAID still providing for the file systems in the interim
  • ITD network backups failing (RT 2069), but resolution from ITD may have just come through
  • Machine replacements:
    • emcsc ready (needs to be deployed by tomorrow, or the ITD block needs to be postponed)
    • emcspin ready (Monday)
    • starutilities awaiting C-AD software

 

FastOffline [Jerome]:

  • Many files coming into the system, but all are tests, pulsers, etc., so not being processed
  • Some missing detector setup, beam energy, etc. information: could mean misconfiguration, or potentially not properly propagated data
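As a rough illustration of the selection described above (test and pulser files are not processed; the 7 January minutes add that runs marked bad in RunLog do not come through FastOffline either), the rule can be written as a filter. The `DaqFile` fields, the run-type strings and `should_process` are hypothetical placeholders, not the actual FastOffline schema or code.

```python
from dataclasses import dataclass

# Hypothetical run-type labels that are skipped; the minutes only name tests and pulsers.
SKIPPED_RUN_TYPES = {"test", "pulser"}


@dataclass
class DaqFile:
    """Hypothetical description of an incoming file (not the real FastOffline record)."""
    path: str
    run_type: str      # hypothetical label, e.g. "physics", "test", "pulser"
    run_is_good: bool  # good/bad mark taken from RunLog


def should_process(f: DaqFile) -> bool:
    """Process only files from good runs whose type is not in the skip list."""
    return f.run_is_good and f.run_type not in SKIPPED_RUN_TYPES
```

With only test and pulser runs arriving, every file fails this check, which matches the observation that nothing is being processed yet.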

 

QA [Gene]:

  • OnlinePlots switched back to newer EVP server
    • Switching steps still undocumented
  • Pre-combining of FastOffline files still needs to be tested and configured as the default

 

Run 11 preparation meeting #6

2010-12-17 14:00
2010-12-17 15:00
America/New York
Friday, 17 December 2010
1-189, EVO, at 19:00 (GMT), duration : 01:00

 Minutes

Attendees: Kefeng, Wayne B., Leve H., Gene V.B., Dmitry A.

 

ShiftLog [Leve]:

  • Manual changes have been reviewed (by Jerome) and committed
  • Re-deployed

 

Databases [Dmitry]:

  • cdev enabled (by C-AD), so parameter propagation has begun
    • Some data is still empty (e.g. beam species) and may not be filled for several weeks (when beams actually start)
      • Might be possible to enter some default values for now
    • Numbers are meaningless, so not useful for FastOffline testing
  • Testing new hardware nodes now (2 are available)
    • Configuration of new nodes in discussion

 

Online Infrastructure [Wayne]:

  • STAR login environment needs to be able to handle SL5.5 (which was installed on some nodes)
    • Will follow up with Jerome
  • Online linux pool is losing roughly 1 node a day (rebooting)
    • Hit a few with network scans, but response was OK, so no clues
  • Spin request (for Pibero):
    • User has access to the nodes and is waiting for directory structures and mount points
      • Expect to complete within a few days
  • Condor being installed for rterm-like access to the online pool from the gateway
    • Installed on the gateway
    • Components for the pool nodes are still to be done
  • Slowness on evp
    • Only occurred yesterday (Dec. 16)
    • Nothing obvious, but suspicious of AFS, and coincided with mock data transfer challenge from counting house to HPSS
    • Paths forward discussed:
      • Remove outside AFS dependence using an online repository
      • Remove AFS dependence using local codes on evp
      • Write a new tool to better identify AFS issues (i.e. more proactive than 'fs checkservers')
    • Problem doesn't seem to persist, not a priority for now (no action)
  • Yury G. requested networking support for the east FPD/FMS rack
    • Just extends the starp network geographically
    • Needs to be a fiber connection for proper grounding
  • Webserver replacement
    • Two Red Hat 5 machines approved by ITD
    • Shared filespaces in progress
    • No services started yet
    • Request for an account to test tomcat (Leve)
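One path forward discussed for the evp slowness is a tool that flags AFS trouble more proactively than `fs checkservers`. A minimal sketch, assuming that a slow or hung directory listing on an AFS-backed path is a useful symptom, is to time such a listing in a worker thread with a timeout. The thresholds and the `probe` function are hypothetical; this is not an existing STAR tool.

```python
import concurrent.futures
import os
import time

# Hypothetical thresholds; real values would need tuning against normal evp behaviour.
SLOW_SECONDS = 2.0
TIMEOUT_SECONDS = 10.0


def _list_dir(path: str) -> int:
    """Exercise the file system the way a user would: list a directory."""
    return len(os.listdir(path))


def probe(path: str, slow: float = SLOW_SECONDS, timeout: float = TIMEOUT_SECONDS):
    """Time one directory listing; classify it as 'ok', 'slow', 'hung' or 'error'."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    future = pool.submit(_list_dir, path)
    try:
        future.result(timeout=timeout)
        status = "ok"
    except concurrent.futures.TimeoutError:
        status = "hung"
    except OSError:
        status = "error"  # e.g. missing path or permission problem
    finally:
        # Do not block on a worker that may be stuck in a hung call. Note that a stuck
        # thread cannot be killed and the interpreter may still wait for it at exit.
        pool.shutdown(wait=False)
    elapsed = time.monotonic() - start
    if status == "ok" and elapsed > slow:
        status = "slow"
    return status, elapsed
```

Run periodically (for example from cron) against a path the online codes depend on, a history of "slow"/"hung" results would help correlate episodes with events such as the mock data transfer challenge.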

 

QA [Gene]:

  • Demo of the Offline QA with reference histogram comparison
    • Missing some features for flexibly defining reference histogram sets (will work on over the holidays)
    • Minor suggestion made for improving the "waiting..." display

Run 11 preparation meeting #7

2011-01-07 14:00
2011-01-07 15:00
America/New York
Friday, 7 January 2011
1-189, EVO, at 19:00 (GMT), duration : 01:00

 Minutes

Attendees: Leve H., Dmitry A., Jerome L., Wayne B., Gene V.B.

 

Databases [Dmitry]:

  • All DB collectors running now
    • Only TPC voltages absent due to no data yet
  • Slow Controls Archiver not yet running to avoid collecting value-less data
    • With Run start imminent, this needs to get going
    • Dmitry will follow up with Yury G.
  • Potential to benefit from an unused EPICS feature to read multiple channels in one request (we have been reading one channel at a time across the board), which could cut down the overhead in obtaining data
    • Unknown why this wasn't previously used, so testing with caution to learn (perhaps the multiple channel data comes in a burst which could fail for some reason)
  • New shift sign-up release is imminent (possibly today)
    • Only minor features are getting final adjustments (the features have been previously presented)
  • RunLog working fine: new runs appearing, but all being marked bad
    • Jerome notes that bad runs won't come through FastOffline, which has been turned on already
  • New DB nodes
    • Tested and ready for use
    • New configuration (relevant for FastOffline use) not yet in place (Jerome and Dmitry will work this out)
  • Online DB plots working and logs show usage
    • Only unavailable quantities are those collected from the Slow Controls Archiver
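The multi-channel EPICS read discussed above could be adopted cautiously by batching channel names and falling back to the existing one-channel-per-request path whenever a batch fails. The sketch below assumes two reader callables supplied by the caller; `read_one`, `read_many`, `read_channels` and the batch size of 50 are hypothetical stand-ins for whatever Channel Access client the collectors use, not its real API.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence

# Hypothetical reader signatures; real code would wrap the EPICS client in use.
ReadOne = Callable[[str], Optional[float]]
ReadMany = Callable[[Sequence[str]], List[Optional[float]]]


def chunks(names: Sequence[str], size: int) -> Iterable[Sequence[str]]:
    """Yield consecutive slices of at most `size` channel names."""
    for i in range(0, len(names), size):
        yield names[i:i + size]


def read_channels(names: Iterable[str], read_one: ReadOne, read_many: ReadMany,
                  batch_size: int = 50) -> Dict[str, Optional[float]]:
    """Read channels in batches; if a batch request fails, re-read its channels one at a time."""
    values: Dict[str, Optional[float]] = {}
    for batch in chunks(list(names), batch_size):
        try:
            results = read_many(batch)
            if len(results) != len(batch):
                raise ValueError("batch reply size mismatch")
            values.update(zip(batch, results))
        except Exception:
            # Cautious fallback: the one-channel-per-request path used so far.
            for name in batch:
                values[name] = read_one(name)
    return values
```

Logging how often the fallback triggers would be one way to learn whether burst replies actually fail in practice.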

 

ShiftLog [Leve]:

  • New manual is printed out and in place at the counting house
  • ShiftLeader desktop computer has all the necessary icons, correctly linked and tested, including making a ShiftLog entry
  • New web server not critical (stable operation on old web server)

 

Online infrastructure [Wayne & Jerome]:

  • OS upgrade
    • One critical item: a replacement node for monitoring chilled water (the old machine runs Windows 2000) is awaiting ITD processing.
      • Follow-up with ITD on Monday if nothing transpires by then
    • Other machines in the queue ("bond", "beatrice", "l3disp")
  • Online web server replacement
    • Currently replicating old machine on new one: file copying and version checking
      • ShiftLog request to continue with same Tomcat version (a known quantity) instead of upgrading in hopes of improved stability
    • Spin resource request involves storage to be delivered on new server
  • Uncertainty in status of UPS services for computers at the experiment
    • Testing today before the Run starts
  • Network request for east FMS/FPD racks completed
  • Online linux pool has been stable for the past ~3 weeks (after a couple weeks of apparently random reboots)

 

FastOffline [Jerome]:

  • Ready, running, and waiting for data

 

QA [Gene]:

  • OnlinePlots still running on older evp machine (but stable)
    • Will look into moving it back over to the new node
    • Need Jeff L. to flip some switches (and document it)
  • Offline QA
    • Finished implementing tools to update just specific histograms in reference set
    • QA Shift set of histograms will be flushed next week and started anew
      • Histograms only go into the set if given a description, and only stay in the set if a reference is set
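The entry and retention rules for the QA Shift histogram set can be sketched as two small functions. `Histo`, `admit` and `prune` are hypothetical names, and when pruning would run is an assumption; this is not the Offline QA code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Histo:
    name: str
    description: str = ""
    reference: Optional[str] = None  # hypothetical identifier of the reference histogram


def admit(qa_set: Dict[str, Histo], h: Histo) -> bool:
    """Entry rule: only histograms with a non-blank description go into the set."""
    if h.description.strip():
        qa_set[h.name] = h
        return True
    return False


def prune(qa_set: Dict[str, Histo]) -> List[str]:
    """Retention rule: drop histograms that still have no reference; return the names removed."""
    dropped = [name for name, h in qa_set.items() if h.reference is None]
    for name in dropped:
        del qa_set[name]
    return dropped
```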

 

We are expecting some collider operations imminently, so data will likely flow through the entire system over the next week.

 

TPC field cage short dates

I present this in the context of trying to understand what may play a role in a timeline study of the h-/h+ issue, which is believed to have begun sometime between 2004 and 2008 (between Run 4 and

Software and Computing Phone Meeting

2011-01-05 12:00
2011-01-05 13:00
America/New York
Wednesday, 5 January 2011
1-189, EVO, at 17:00 (GMT), duration : 01:00
Time | Talk (duration) | Presenter
12:00 | FMS simulation, open request and readiness (00:20) | Pibero Djawotho (TAMU)
12:20 | Production status (00:15) | Lidia Didenko
12:35 | AOB (00:20) | All

h+/h- in Run 10 and beyond

2010-12-17 15:00
2010-12-17 16:00
America/New York
Friday, 17 December 2010
EVO, at 20:00 (GMT), duration : 01:00

Issue 2043

In discussion of RT ticket 2043 and applying a patch to the mapping at the MuDst level instead of re-producing the data...

Software and Computing phone meeting

2010-12-15 12:00
2010-12-15 13:00
America/New York
Wednesday, 15 December 2010
1-189, EVO, at 17:00 (GMT), duration : 01:00
Time | Talk (duration) | Presenter
12:00 | Multi-site Data transfer project - status (00:20) | Michal Zerola (NPI / ASCR)
12:20 | Year 2011 geometry status and production intents for Y11 (00:15) | Jason Webb (BNL)
12:35 | Production status and stats (00:15) | Jerome Lauret (BNL)
12:50 | AOB (00:10) | All

Global tracks with negative flags

Many global tracks (approximately half) are given a negative flag. As I had not really understood this before, I made some brief investigations...

 

Run 11 preparation meeting #5

2010-12-10 14:00
2010-12-10 15:00
America/New York
Friday, 10 December 2010
1-189, at 19:00 (GMT), duration : 01:00

Attendees: Leve H., Dmitry A., Jérôme L., Wayne B., Gene V.B., Jeff L.

 

Minutes:

Databases [Dmitry]:

  • Conditions collectors started (all except beam information related ones)
    • Everything running appears to be OK and status can be checked from web
    • Beam info needs cdev, provided by C-AD once their operations get going for the Run (also includes magnetic field setting; expect data around the time we turn on the STAR magnet)
      • Absence of this data means RunLog does not finalize runs and information is not migrated to Offline, but this is expected to be OK for now for anyone doing online-only tests (requested to post a note to rts-hn about this)
    • No migration issues expected, but TPC anode voltage recording practices need to be ironed out in the context of the Run 11 data (discuss with Maxim)
  • Online DBs flushed
    • As expected, a few failed links and missing web pages were noted (by users too) and fixed
  • Once cdev is running, will be ready for turn on / testing of FastOffline

 

ShiftLog [Leve]:

  • As expected, excluded from and not affected by DB flush
  • Manual updated for instructions on changing run status (may receive further review & changes from Jerome)

 

Online nodes [Wayne]:

  • AFS has been stable on the newer evp machine since last Friday
    • Jeff's been using the machine (and AFS) without incident
    • Will likely switch OnlinePlots back to this machine (from evp2, the older machine) next week
      • Documentation requested for switching between these machines
  • Replacement of dean at the stage of OS installation on new hardware
    • Timeline for replacement should be well before Run starts, as disk space on the new machine has been requested for online use
  • Trigger commissioning is a high priority part of the request for online node resources (Wayne will contact Pibero directly)
  • Nothing to report on OS upgrades

 

Online QA - jevp demo [Jeff]:

  • Code
    • Subsystems have an xxxBuilder class (e.g. tpcBuilder, eemcBuilder, etc.) which inherits from a jevp base class for plots
    • Plot objects can have multiple histograms added to them, as well as other graphical objects like lines and circles (as long as they are of some "component" type?)
  • Configuration
    • Editor allows hierarchical arrangement of plots into tabs and sub-tabs, under larger Sets (e.g. Shift, 2009, ESMD)
      • Features drag & drop for re-arrangement
    • Tabs can have collective properties for all plots underneath (such as setting all to have the same maximum)
    • Plots can have individual properties (such as log y scale)
  • Running
    • One server starts up several builders as separate processes
    • Data is sent to the builders from a file or an event pool
    • Presenter runs separately to display plots
      • Similar basic look & feel to current OnlinePlots presenter
      • Clicking on a plot brings up a reference side-by-side
        • Drag & drop to add new reference, or select a different reference from panel of choices
        • Suggested: tag references with run from which they're made, and timestamp (already includes a comment)
        • Deleting references is simple, but not so simple as to be done accidentally
        • Mild concerns to be wary of reference bloat and organization as things evolve at the experiment (including collision species/energies change)
    • PDFs generated and uploaded to RunLog DB for each Set
      • Include table of contents
      • Concerns of space this will consume on the onldb server and redundancy of plots
        • Run 10 PDFs used more than half the available disk space, and it is reasonable to expect a doubling (or more) of the required space for Run 11
        • Wayne & Dmitry will assess the current storage on onldb [post-meeting statement that space may be fine; follow up next week]
        • Requirements still need to be more accurately defined
          • Will depend on what gets axed/kept from review by subsystems & trigger board
          • Would be helped by real data, but that's a bit late
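The structure Jeff described (per-subsystem builder classes deriving from a common plot base class, with a server feeding events to several builders) can be illustrated with a small Python analogue. This is not jevp's C++ API: `PlotBuilder`, `TpcBuilder`, `EemcBuilder`, the event dictionary keys and `run_server` are hypothetical, Counters stand in for histograms, and the builders run in one process for brevity rather than as separate processes.

```python
from collections import Counter
from typing import Dict, Iterable, List


class PlotBuilder:
    """Hypothetical stand-in for the jevp base class: owns named plots and fills them per event."""
    name = "base"

    def __init__(self) -> None:
        self.plots: Dict[str, Counter] = {}

    def plot(self, plot_name: str) -> Counter:
        """Return the named plot, creating it on first use."""
        return self.plots.setdefault(plot_name, Counter())

    def process_event(self, event: dict) -> None:
        raise NotImplementedError


class TpcBuilder(PlotBuilder):
    name = "tpc"

    def process_event(self, event: dict) -> None:
        # Histogram stand-in: count events per number of TPC hits.
        self.plot("nHits")[event.get("tpc_hits", 0)] += 1


class EemcBuilder(PlotBuilder):
    name = "eemc"

    def process_event(self, event: dict) -> None:
        # Histogram stand-in: tower occupancy.
        for tower in event.get("eemc_towers", []):
            self.plot("towerOccupancy")[tower] += 1


def run_server(builders: List[PlotBuilder], events: Iterable[dict]) -> Dict[str, Dict[str, Counter]]:
    """Feed every event (from a file or an event pool) to every builder."""
    for event in events:
        for b in builders:
            b.process_event(event)
    return {b.name: b.plots for b in builders}
```

Adding a subsystem then means adding one subclass; the server and presenter stay unchanged, which is the appeal of the base-class design.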

Next week: demo of Offline QA for AutoQA with reference histograms [Gene]

MySQL trigger for oversubscription protection

How to enable a total oversubscription check for Shift Signup (MySQL trigger):
delimiter |

CREATE TRIGGER stop_oversubscription_handler BEFORE INSERT ON Shifts
FOR EACH ROW BEGIN

Software and Computing phone meeting

2010-12-08 12:00
2010-12-08 13:00
America/New York
Wednesday, 8 December 2010
1-189, EVO, at 17:00 (GMT), duration : 01:00
Time | Talk (duration) | Presenter
12:00 | Summary of the h+/h- meeting (00:20) | Grant Webb (UKY)
12:20 | Production status (00:15) | Lidia Didenko (BNL)
12:35 | AOB (00:20) | All

Issue 2040

 RT ticket 2040

 

Unbiased distributions of first hits on global tracks:

Review of Past Issues and Current Understanding

Talk time: 15:20, duration: 00:20

h+/h- in Run 10 and beyond

2010-12-03 15:00
2010-12-03 16:00
America/New York
Friday, 3 December 2010
EVO, at 20:00 (GMT), duration : 01:00
Time | Talk (duration)
15:00 | Description of Problem (00:20)
15:20 | Review of Past Issues and Current Understanding (00:20)
15:40 | Plan of Action Development (00:20)
16:00 | Task Assignments (00:20)
16:20 | AOB (00:20)

Run 11 preparation meeting #4

2010-12-03 14:00
2010-12-03 15:00
America/New York
Friday, 3 December 2010
1-189, at 19:00 (GMT), duration : 01:00

Minutes:

Attendees: Dmitry A., Leve H., Matt A., Wayne B., Jeff L., Jérôme L., G. Van Buren

Databases [Dmitry]:

  • Online backup now using the new NAS system
    • Daily backup of all three ports with a retention time of 7 days
    • Currently have 2+ TB of space, which is probably more than enough for even 14 days retention
    • Potential problem with permissions due to NFS mount and different user IDs on different systems, but not a problem presently
    • Email alerts of problems from the NAS go to Wayne & Dmitry
  • Flush of online DBs not yet done
    • Reasoning: STAR is still in testing, and this can add up to a significant amount of data
    • ...but we're not sure which test data people will want to keep associated to Run 11
    • Decision made to go forward with the flush and not continue waiting for testing to get further along
    • NB: ShiftLog (and some other) DBs and tables are skipped in the flush; ShiftLog is already recording for Run 11
  • Shift Signup GUI has been re-written
    • Demo shown
    • Some details of new features still need implementation (working with Jérôme)
    • All old features are in place; could replace old codes at any time (pending bug checks)
    • Deployment schedule not fixed by any deadlines
  • Isolated nodes for FastOffline in Run 11
    • Not in place, but Dmitry will write up a config file for this
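The daily-backup, 7-day-retention policy for the online backups amounts to a simple cutoff rule. The sketch below only illustrates that rule; how the NAS actually expires backups is not described in the minutes, and `split_backups` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple

RETENTION_DAYS = 7  # from the minutes; 14 days was judged likely to fit in the 2+ TB as well


def split_backups(backup_dates: Iterable[date], today: date,
                  retention_days: int = RETENTION_DAYS) -> Tuple[List[date], List[date]]:
    """Return (keep, expire): backups newer than the retention cutoff are kept."""
    dates = list(backup_dates)
    cutoff = today - timedelta(days=retention_days)
    keep = sorted(d for d in dates if d > cutoff)
    expire = sorted(d for d in dates if d <= cutoff)
    return keep, expire
```

With one backup per day, a 7-day window keeps exactly seven copies per port; moving to 14 days doubles the number of stored copies.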

 

ShiftLog [Leve]:

  • Nothing until new web server
    • No progress on new webserver

 

Online nodes [Wayne & Matt]:

  • OS upgrades:
    • FTPC and PMDsc done/replaced
    • To be done: Bond, EMCsc (coordinate with users), STARUtilities (coordinate with C-AD), EMC01, Beatrice, L3display (Jeff notes the need for QT4 on this node for display programs)
  • FUSE now working on all linux pool nodes
  • gcc standardized on all linux pool nodes
  • Recent rise in instability of the linux pool nodes: several have halted and/or needed to be rebooted in the past two weeks
    • Previous solution of disabling USB controller not helpful for this (that solution is still in effect)
    • No obvious environmental changes, but an environmental cause seems likely given the pattern
    • These nodes are ~5.5 years old (hard to believe they would show age problems within a couple weeks of each other)
    • Similar nodes are in use for offline DBs and not showing problems (located in BCF)
  • Newer EVP machine experiencing AFS issues
    • Access given to John McCarthy to help diagnose
    • OnlinePlots will be switched to use old EVP machine for the time being (Gene & Jeff will arrange)

 

Software and Computing phone meeting

2010-12-01 12:00
2010-12-01 13:00
America/New York
Wednesday, 1 December 2010
1-189, EVO, at 17:00 (GMT), duration : 01:00
TimeTalkPresenter
Time | Talk (duration) | Presenter
12:00 | Review of ticket 2036 (embedding timestamp) (00:20) | All
12:20 | Data production issues and projections (00:20) | Lidia Didenko (BNL)
12:40 | AOB (00:20) | All

Run 11 preparation meeting #3

2010-11-19 14:00
2010-11-19 15:00
America/New York
Friday, 19 November 2010
1-189, at 19:00 (GMT), duration : 01:00

Attendees: Wayne B., Leve H., Gene V.B., Matt A.

Leve:

  • ShiftLog exercised (successfully tried loading large 4+MB images)
  • Awaiting new online web server
  • No current action items regarding changing shift status after last week's meeting

Matt:

  • Windows 2000 upgrades to XP; involves replacing some computers:
    • ftpctemp, emcsc, pmdsc replacements exist, but need to be configured before the switch
    • Timescale: mid-December
  • OS upgrades on emc01, beatrice, l3display
    • l3display has GUI issues (independent of the old large display screen issue brought up a couple weeks ago by Wayne); hope that the OS update to SL53 will resolve the GUI problems
    • Timescale: next week (make sure systems are running again before the long weekend next week, perhaps time beatrice update with any plans by EMC people to be away)
    • It would be of interest to know whether the l3display can support the STAR environment (e.g. AFS, gcc, etc.)

Wayne:

  • Online linux pool gcc issue: an artifact of a known problem not corrected in Matt's installation script.
    • A few nodes already fixed, one node known to need fixing, and a few more need to be checked (low priority)
    • FUSE also stopped working on a few nodes during some of these re-installs (due to some package dependencies)
  • Ganglia metric to list number of users logged in now turned on for STAR gateways; will also add metric to the online linux pool machines
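One way to produce a "number of users logged in" metric is to count distinct login names in `who` output and publish the count with Ganglia's `gmetric` command. This is a sketch under assumptions: the metric name `users_logged_in` is hypothetical, and whether the STAR metric counts distinct users or login sessions is not stated in the minutes.

```python
from typing import List


def count_users(who_output: str) -> int:
    """Count distinct login names in `who` output (first column of each non-empty line)."""
    users = {line.split()[0] for line in who_output.splitlines() if line.strip()}
    return len(users)


def gmetric_command(n_users: int, name: str = "users_logged_in") -> List[str]:
    """Build a gmetric invocation publishing the count; the metric name is hypothetical."""
    return ["gmetric", f"--name={name}", f"--value={n_users}",
            "--type=uint16", "--units=users"]
```

A cron job on each node could capture `who` with `subprocess.run(["who"], capture_output=True, text=True)`, pass its stdout to `count_users`, and execute the resulting command list.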

Gene:

  • OnlinePlots running stably on newer EVP server
  • FMS group expressed interest in changing some OnlinePlots; they were directed to the codes
  • Contacts for components of online QA [node:19794 "posted"]

AOB:

  • Next meeting in two weeks (holiday next week)