Quality Assurance

Welcome to the Quality assurance and quality control pages.

Proposal and statements

.

Proposal for Run IV

Procedure proposal for production and QA in Year4 run

Jérôme LAURET & Lanny RAY, 2004

Summary: The qualitative increase in data volume for run 4 together with finite cpu capacity at RCF precludes the possibility for multiple reconstruction passes through the full raw data volume next year. This new computing situation together with recent experiences involving production runs which were not pre-certified prior to full scale production motivates a significant change in the data quality assurance (QA) effort in STAR. This note describes the motivation and proposed implementation plan.

Introduction

The projection for the next RHIC run (also called, Year4 run which will start by the end of 2003), indicates a factor of five increase in the number of collected events comparing to preceding runs. This will increase the required data production turn-around time by an order of magnitude, from months to one year per full-scale production run. The qualitative increase in the reconstruction demands combined with an increasingly aggressive physics analysis program will strain the available data processing resources and poses a severe challenge to STAR and the RHIC computing community for delivering STAR’s scientific results in a reasonable time scale. This situation will become more and more problematic as our Physics program evolves to include rare probes. This situation is not unexpected and was anticipated since before the inception of RCF. The STAR decadal plan (10 year projection of STAR activities and development) clearly describes the need for several upgrade phases, including a factor of 10 increase in data acquisition rate and analysis throughput by 2007.

Typically, 1.2 represents an ideal, minimal number of passes through the raw data in order to produce calibrated data summary tapes for physics analysis. However, it is noteworthy that in STAR we have typically processed the raw data an average of 3.5 times where, at each step, major improvements in the calibrations were made which enabled more accurate reconstruction, resulting in greater precision in the physics measurements. The Year 4 data sample in STAR will include the new ¾ barrel EMC data which makes it unlikely that sufficiently accurate calibrations and reconstruction can be achieved with only the ideal 1.2 number of passes as we foresee the need for additional calibration passes through the entire data in order to accumulate enough statistics to push the energy calibration to the high Pt limit.

While drastically diverging from the initial computing requirement plans ( 1), this mode of operation, in conjunction with the expanded production time table, calls for a strengthening of procedures for calibration, production and quality assurance.

The following table summarizes the expectations for ~ 70 Million events with a mix of central and minbias triggers. Numbers of files and data storage requirements are also included for guidance


Au+Au 200 (minbias)

35 M central

35 M minbias

Total

No DAQ100 (1 pass)

329 days

152 days

481 days

No DAQ100 (2 passes)

658 days

304 days

962 days

Assuming DAQ100 (1 pass)

246 days

115 days

361 days

Assuming DAQ100 (2 passes)

493 days

230 days

723 days

Total storage estimated (raw)

x

x

203 TB

Total storage estimated
(1 pass)

x

x

203 TB


Quality Assurance: Goals and proposed procedure for QA and productions

What is QA in STAR?

The goal of the QA activities in STAR is the validation of data and software, up to DST production. While QA testing can never be exhaustive, the intention is that data that pass the QA testing stage should be considered highly reliable for downstream physics analysis. In addition, QA testing should be performed soon after production of the data, so that errors and problems can be caught and fixed in a timely manner.

QA processes are run independently of the data taking and DST production. These processes contain the accumulated knowledge of the collaboration with respect to potential modes of failure of data taking and DST production, along with those physics distributions that are most sensitive to the health of the data and DST production software. The results probe the data in various ways:

  • At the most basic level, the questions asked are whether the data can be read and whether all the components expected in a given dataset are present. Failures at this level are often related to problems with computing hardware and software infrastructure.

  • At a more sophisticated level, distributions of physics-related quantities are examined, both as histograms and as scalar quantities extracted from the histograms and other distributions. These distributions are compared to those of previous runs that are known to be valid, and the stability of the results is monitored. If changes are observed, these must be understood in terms of changing running conditions or controlled changes in the software, otherwise an error flag should be raised (deviations are not always bad, of course, and can signal new physics: QA must be used with care in areas where there is a danger of biasing the physics results of STAR).

Varieties of QA in STAR

The focus of the QA activities until summer 2000 has been on Offline DST production for the DEV branch of the library. With the inception of data taking, the scope of QA has broadened considerably. There are in fact two different servers running autoQA processes:

  • Offline QA. This autoQA-generated web page accesses QA results for all the varieties of Offline DST production:

    • Real data production produced by the Fast Offline framework. This is used to catch gross errors in data taking, online trigger and calibration, allowing for correcting the situation before too much data is accumulated (this framework also provides on the fly calibration as the data is produced).

    • Nightly tests of real and Monte Carlo data (almost always using the DEV and NEW branches of the library). This is used principally for the validation of migration of library versions

    • Large scale production of real and Monte Carlo data (almost always using the PRO branch of the library). This is used to monitor the stability of DSTs for physics.

  • Online QA. This autoQA-generated web page accesses QA results for data in the Online event pool, both raw data and DST production that is run on the Online processors.

The QA dilemma

While a QA shift is usually organized during data taking, the later, official production runs were encouraged (but not mandated) to be regularly QA-ed. Typically, there has not been an organized QA effort for post-experiment DST production runs. The absence of organized quality assurance efforts following the experiment permitted several post-production problems to arise. These were eventually discovered at the (later) physics analysis stage, but the entire production run was wasted. Examples include the following:

  1. missing physics quantities in the DSTs (e.g. V0, Kinks, etc ...)

  2. missing detector information or collections of information due to pilot errors or code support

  3. improperly calibrated and unusable data

  4. ...

The net effect of such late discoveries is a drastic increase in the production cycle time, where entire production passes have to be repeated, which could have been prevented by a careful QA procedure.

Production cycles and QA procedure

To address this problem we propose the following production and QA procedure for each major production cycle.

  1. A data sample (e.g. from a selected trigger setup or detector configuration) of not more than 100k events (Au+Au) or 500k events (p+p) will be produced prior to the start of the production of the entire data sample.

  2. This data sample will remain available on disk for a period of two weeks or until all members of “a” QA team (as defined here) have approved the sample (whichever comes first).

  3. After the two week review period, the remainder of the sample is produced with no further delays, with or without the explicit approval of everyone in the QA team.

  4. Production schedules will be vigorously maintained. Missing quantities which are detected after the start of the production run do not necessarily warrant a repetition of the entire run.

  5. The above policy does not apply to special or unique data samples involving calibration or reconstruction studies nor would it apply to samples having no overlaps with other selections. Such unique data samples include, for example, those containing a special trigger, magnetic field setting, beam-line constraint (fill transition), etc., which no other samples have and which, by their nature, require multiple reconstruction passes and/or special attention.

In order to carry out timely and accurate Quality Assurance evaluations during the proposed two week period, we propose the formation of a permanent and QA team consisting of:

  1. One or two members per Physics Working group. This manpower will be under the responsibility of the PWG conveners. The aim of these individuals will be to rigorously check, via the autoQA system or analysis codes specific to the PWG, for the presence of the required physics quantities of interest to that PWG which are understood to be vital for the PWG’s Physics program and studies.

  2. One or more detector sub-system experts from each of the major detector sub-systems in STAR. The goal of these individuals will be to ensure the presence and sanity of the data specific to that detector sub-system.

  3. Within the understanding that the outcome of such procedure and QA team is a direct positive impact on the Physics capabilities of a PWG, we recommend that this QA service work be done without shift signups or shift credit as is presently being done for DAQ100 and ITTF testing.

Summary

Facing important challenges driven by the data amount and Physics needs, we proposed an organized procedure for QA and production relying on a cohesive feedback from the PWG and detector sub-system’s experts within time constraints guidelines. It is understood that the intent is clearly to bring the data readiness to the shortest possible turn around time while avoiding the need for later re-production causing waste of CPU cycles and human hours.


Summary list of STAR QA Provisions

Summary of the provisions of Quality Assurance and Quality Control for the STAR Experiment


Online QA (near real-time data from the event pool)
  • Plots of hardware/electronics performance
    • Histogram generation framework and browsing tools are provided
    • Shift crew assigned to analyze
    • Plots are archived and available via web
    • Data can be re-checked
    • Yearly re-assessment of plot contents during run preparation meetings and via pre-run email request by the QA coordinator
  • Visualization of data
    • Event Display (GUI running at the control room)
  • DB data validity checks

FastOffline QA (full reconstruction within hours of acquisition)
  • Histogram framework, browsing, reporting, and archiving tools are provided
    • QA shift crew assigned to analyze and report
    • Similar yearly re-assessment of plot contents as Online QA plots
  • Data and histograms on disk for ~2 weeks and then archived to HPSS
    • Available to anyone
    • Variety of macros provided for customized studies (some available from histogram browser, e.g. integrate over runs)
  • Archived reports always available
    • Report browser provided

Reconstruction Code QA
  • Standardized test suite of numerous reconstruction chains in DEV library performed nightly
    • Analyzed by S&C team
    • Browser provided
    • Results kept at migration to NEW library
  • Standardized histogram suite recorded at library tagging (2008+)
    • Analyzed by S&C team
    • Test suite grows with newly identified problem
    • Discussions of analysis and new issues at S&C meetings
  • Test productions before full productions (overlaps with Production QA below)
    • Provided for calibration and PWG experts to analyze (intended to be a requirement of the PWGs, see Production cycles and QA procedure under Proposal for Run IV)
    • Available to anyone for a scheduled 2 weeks prior to commencing production
    • Discussions of analysis and new issues at S&C meetings

Production QA
  • All aspects of FastOffline QA also provided for Production QA (same tools)
    • Data and histograms are archived together (i.e. iff data, then histograms)
    • Same yearly re-assessment of plot contents as FastOffline QA plots (same plots)
    • Formerly analyzed during runs by persons on QA shift crew (2000-2005)
    • No current assignment of shift crew to analyze (2006+)
  • Visualization of data
    • Event Display: GUI, CLI, and visualization engine provided
  • See "Test productions before full production" under Reconstruction Code QA above (overlaps with Production QA)
  • Resulting data from productions are on disk and archived
    • Available to anyone (i.e. PWGs should take interest in monitoring the results)

Embedding QA
  • Standardized test suite of plots of baseline gross features of data
    • Analyzed by Embedding team
  • Provision for PWG specific (custom) QA analysis (2008+)

 

Offline QA

Offline QA Shift Resources

STAR Offline QA Documentation (start here!)

Quick Links: Shift Requirements , Automated Browser Instructions , You do not have access to view this node , Online RunLog Browser

 

Automated Offline QA Browser

Quick Links: You do not have access to view this node

QA Shift Report Forms

Quick Links: Issue Browser / Editor , Report Archive

QA Technical, Reference, and Historical Information

 

Reconstruction Code QA

 As a minimal check on effects caused by any changes to reconstruction code, the following code and procedures are to be exercised:

  

  1. A suite of datasets has been selected which should serve as a reference basis for any changes. These datasets include:

    1. Real data from Run 7 AuAu at 200 GeV

    2. Simulated data using year 2007 geometry with AuAu at 200 GeV

    3. Real data from Run 8 pp at 200 GeV

    4. Simulated data using year 2008 geometry with pp at 200 GeV

     

  2. These datasets should be processed with BFC as follows to generate historgrams in a hist.root file:

    1. root4star -b -q -l 'bfc.C(100,"P2007b,ittf,pmdRaw,OSpaceZ2,OGridLeak3D","/star/rcf/test/daq/2007/113/8113044/st_physics_8113044_raw_1040042.daq")'

    2. root4star -b -q -l 'bfc.C(100, "trs,srs,ssd,fss,y2007,Idst,IAna,l0,tpcI,fcf,ftpc,Tree,logger,ITTF,Sti,SvtIt,SsdIt,genvtx,MakeEvent,IdTruth,geant,tags,bbcSim,tofsim,emcY2,EEfs,evout,GeantOut,big,fzin,MiniMcMk,-dstout,clearmem","/star/rcf/simu/rcf1296_02_100evts.fzd")'

    3. root4star -b -q -l 'bfc.C(1000,"pp2008a,ittf","/star/rcf/test/daq/2008/043/st_physics_9043046_raw_2030002.daq")'

    4. ?

     

  3. The RecoQA.C macro generates CINT files from the hist.root files

    1. root4star -b -q -l 'RecoQA.C("st_physics_8113044_raw_1040042.hist.root")'

    2. root4star -b -q -l 'RecoQA.C("rcf1296_02_100evts.hist.root")'

    3. root4star -b -q -l 'RecoQA.C("st_physics_9043046_raw_2030002.hist.root")'

    4. ?

     

  4. The CINT files are then useful for comparison to the previous reference, or storage as the new reference for a given code library. To view these plots, simply execute the CINT file with root:

    1. root -l st_physics_8113044_raw_1040042.hist_1.CC
      root -l st_physics_8113044_raw_1040042.hist_2.CC

    2. root -l rcf1296_02_100evts.hist_1.CC
      root -l rcf1296_02_100evts.hist_2.CC

    3. root -l st_physics_9043046_raw_2030002.hist_1.CC
      root -l st_physics_9043046_raw_2030002.hist_2.CC

    4. ?

     

  5. One can similarly execute the reference CINT files for visual comparison: 

    1. root -l $STAR/StRoot/qainfo/st_physics_8113044_raw_1040042.hist_1.CC
      root -l $STAR/StRoot/qainfo/st_physics_8113044_raw_1040042.hist_2.CC

    2. root -l $STAR/StRoot/qainfo/rcf1296_02_100evts.hist_1.CC
      root -l $STAR/StRoot/qainfo/rcf1296_02_100evts.hist_2.CC

    3. root -l $STAR/StRoot/qainfo/st_physics_9043046_raw_2030002.hist_1.CC
      root -l $STAR/StRoot/qainfo/st_physics_9043046_raw_2030002.hist_2.CC

    4. ?

     

  6. Steps 1-3 above should be followed immediately upon establishing a new code library. At that point, the CINT files should be placed in the appropriate CVS directory, checked in, and then checked out (migrated) into the newly established library: 

    cvs co StRoot/qainfo
    mv *.CC StRoot/qainfo
    cvs ci -m "Update for library SLXXX" StRoot/qainfo
    cvs tag SLXXX StRoot/info/*.CC
    cd $STAR
    cvs update StRoot/info
    

     

Missing information will be filled in soon. We may also consolidate some of these steps into a single script yet to come.

 

 

Run QA


Helpful links:

Run 19 (BES II) QA

Run 19 (BES 2) Quality Assurance

Run Periods

Detector Resources

BBC BTOF BEMC EPD
eTOF GMT iTPC/TPC HLT
MTD VPD ZDC  

Other Resources

QA Experts:
  • BBC - Akio Ogawa
  • BTOF - Zaochen Ye
  • BEMC - Raghav Kunnawalkam Elayavalli
  • EPD  - Rosi Reed
  • eTOF - Florian Seck
  • GMT - Dick Majka
  • iTPC- Irakli Chakaberia
  • HLT - Hongwei Ke
  • MTD  - Rongrong Ma
  • VPD  -  Daniel Brandenburg
  • ZDC - Miroslav Simko and Lukas Kramarik
  • Offline-QA - Lanny Ray  + this week's Offline-QA shift taker
  • LFSUPC conveners: David Tlusty, Chi Yang, and Wangmei Zha 
    • delegate: Ben Kimelman
  • BulkCorr conveners: SinIchi Esumi,  Jiangyong Jia, and Xiaofeng Luo 
    • delegate: Takafumi Niida (BulkCorr)
  • PWGC - Zhenyu Ye
  • TriggerBoard (and BES focus group) - Daniel Cebra
  • S&C - Gene van Buren

Meeting Schedule

  • Weekly on Thursdays at 2pm EST
  • Blue Jeans information:
    To join the Meeting:
    https://bluejeans.com/967856029
    
    To join via Room System:
    Video Conferencing System: bjn.vc -or-199.48.152.152
    Meeting ID : 967856029
    
    To join via phone :
    1)  Dial:
    	+1.408.740.7256 (United States)
    	+1.888.240.2560 (US Toll Free)
    	+1.408.317.9253 (Alternate number)
    	(see all numbers - http://bluejeans.com/numbers)
    2)  Enter Conference ID : 967856029
    

AuAu 19.6GeV (2019)

Run 19 (BES-2) Au+Au @ √sNN=19.6 GeV

PWG QA resources:

 Direct links to the relevant Run-19 QA meetings:

 

LFSUPC Run-by-run QA

 

Event Level QA

 

Track QA (no track cuts)

 

Track QA (with track cuts)

 

nHits QA (no track cuts)

 

nHits QA (with track cuts)

 

AuAu Fixed Target (2019)

 

Run 20 (BES II) QA