Offline QA Shifts (Y2000 run)
Updated on Tue, 2008-02-19 20:30. Originally created by genevb on 2008-02-19 20:30.
Peter Jacobs, July 11, 2000
This document is a first attempt at describing procedures for the Offline QA shift crew. As you will see, there are a number of open questions about what should be done during this shift and how to do it; we will have answers only after we gain experience with real data. Please give feedback to the STAR QA links and contacts on what you find confusing, what could be done better, and what doesn't make any sense to you.
- Scope of the Offline QA shift activities
The proposed scope of the Offline QA shift is to assess the quality of
the DSTs being produced by the Offline Production team. There are
several classes of data to be examined:
- Large scale production of real data: data that will be used for physics analysis
- Large scale production of MC data: MC data that will be used for detailed physics studies and corrections for data analysis.
- Nightly tests of real and MC data: a limited number of events run in the DEV or NEW branches of the library. These are used to test the libraries and validate them prior to a new release and migration DEV->NEW->PRO.
- Express queue of real data: a small fraction (~5%?) of real data will be channeled to an express production queue during the running of the experiment, to serve as feedback to the crews running the experiment. The results of this production should be reported as soon as possible, typically at the 5 p.m. meeting in the counting house.
- Use of autoQA
The principal tool for the Offline QA shift crew is the autoQA web
page. A discussion of QA in general, and detailed usage of that page,
can be found in STAR QA for Offline Software, with which you should be
familiar before you read the rest of this document. Usage of
autoQA version 2 is very similar to the old autoQA (version 1), so if
you have used that, you should be able to follow what comes next.
There have, however, been many changes behind the scenes. The major ones are:
- autoQA now interfaces to the MySQL databases. It queries the Production File Catalog for completed jobs, and writes QA information back to a QA database. The latter can be used in future in the tag DB or some other mechanism, once a reliable QA cycle is established.
- autoQA can now handle the range of data classes specified in the introduction.
- All QA ROOT jobs are now run on rcas under LSF. This change was necessary in anticipation of a large volume of QA processes once large scale data taking starts. This of course also introduces another layer of complexity into the QA framework, and monitoring of autoQA jobs on rcas will be part of the QA shift work.
- Offline QA Shift Tasks
- Which runs to examine?
Discuss the recent production with the Production Crew and establish a
prioritized list of runs to QA. The express queue mechanism is still
under discussion and is not set up yet, but once it is established it
should receive highest priority, for timely feedback to the counting
house. Other criteria for setting priorities are whether urgent
feedback is needed for a library release, and whether other runs require
special attention. Otherwise, the shift crew should look at the most
recent production that has been QA-ed under the various classes of
data.
Since the autoQA mechanism queries the File Catalog once an hour (for real data, less frequently for other data classes) and submits QA batch jobs on rcas, there may be a significant delay between when production is run and when the QA results become available. We will have to monitor this process and adjust the procedures as necessary. Feedback on this point from the shift crew is essential.
- How to look at a run
I will specify how to look at a run in the data class "Real Data
Production". Other data classes will have different selection
procedures, reflecting the differences in the File Catalog structure
for these different classes, but the changes should be obvious.
- Select "Real Data Production" from the pulldown menu in the banner.
- Use the pulldown menus to compose a DB query that includes the run you are interested in. The simplest procedure at the moment is to specify the runID and leave all other fields at "any". In the near future these selections will include trigger, calibration and geometry information. Note that the default for "QA status" is "done".
- Press "Display Datasets". A listing of all catalogued runs corresponding to your query will appear in the upper frame.
- To examine the QA histograms, press the "QA details" button. In the lower panel, a set of links to the histogram files will appear. The format is gzipped postscript. If your browser is set up to launch ghostview for files of type "ps", these files will be automatically unzipped and displayed. Otherwise, you will have to do something more complicated, such as save the file and view it another way. Note that if the macro "bfcread_hist_to_ps" is reported to have crashed, some or all histograms may be missing.
- To examine the QA scalars and tests, scroll past the histogram links in the lower panel and push the button. Tables of scalars for all the data branches will appear in the auxiliary window.
- To compare the QA scalars to those of similar runs, press the "Compare reports" button. Details on how to proceed are found in the autoQA documentation. Note that until more refined selections are available for real data (e.g. comparing runs with identical trigger conditions and processing chains), this facility will be of limited utility. Note also that the planned functionality of automatically comparing to a standard reference run has not yet been implemented, for similar reasons.
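If your browser is not configured to launch ghostview automatically, the histogram files from the "QA details" step can be saved and unpacked by hand. A minimal sketch follows; the file name here is a stand-in (in practice, save the link target shown in the lower frame), and the first line merely fabricates a small gzipped PostScript file so the commands can be tried anywhere:

```shell
# Stand-in for a downloaded histogram file (hypothetical name);
# in practice, save the link target from the "QA details" frame instead.
echo "%!PS-Adobe-2.0" | gzip > qa_hist.ps.gz

# Decompress, keeping the original .ps.gz file intact.
gunzip -c qa_hist.ps.gz > qa_hist.ps

# gv qa_hist.ps      # view with ghostview, on a machine where it is installed
head -1 qa_hist.ps   # the file should start with a PostScript header line
```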
- What QA data to examine
This area needs significant discussion. What we are generally looking
for is that all data are present and can be read (scalar values should
appear in all branches) and that the results look physically
meaningful (e.g. vertex distribution histograms). Comparison to
previous, similar runs to check for stability is highly desirable but
it is not clear how to carry this out at present, for reasons
described above. We should revisit this question as we gain more
experience.
The principal QA tools are the histograms generated by bfcread_hist_to_ps. The number of QA histograms has grown enormously over the past six months and needs to be pruned back to be useful to the non-expert. This work is going on now (week of July 10) and more information will be forthcoming.
A description of all the macros run by autoQA can be found here. This documentation is important for understanding the meaning of the QA scalars.
Here are some general guidelines on what to report:
- Status of the run: completed or, if not, the error status (segmentation violation, etc.)
- Macros that crashed
- Macros whose QA status is not "O.K." (At present, this means simply that there is no data in the branch that macro is trying to read. No additional tests are applied to the data.)
- Anomalous histograms and scalars - this is necessarily vague at this point.
- How to report results
Once per shift you should send a status report to the QA
hypernews forum:
starqa-hn@www.star.bnl.gov
If you are doing Offline QA shifts, you should subscribe to this forum.
The autoQA framework has a "comment" facility that allows the user to annotate particular runs or to enter a "global comment" that will appear chronologically in the listing of all runs. These are displayed together with the datasets, and while not appropriate for lengthy reports, they can serve as flags for specific problems and supply hyperlinks to longer reports. Note that this is not a high-security system (anyone can alter or delete your messages).
You do not need the QA Expert's password to use this facility. Press the button "Add or edit comments" in the upper right part of the upper panel. You will be asked for an identifying string that will be attached to your comments. Enter your name and press return. You will have to press "Display Datasets" again, at which point a button "Add global comment" will appear below the pulldown menus, and each run listing will have an "Add comment" button. Follow the instructions. Messages are interpreted as HTML, so links to other pages can be introduced. One possibility is to enter the hyperlink to the QA report you have sent to starqa-hn. This can obviously be automated, but it isn't yet, and doing it by hand should be straightforward.
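Since comments are interpreted as HTML, one can paste in a small anchor fragment pointing at the full shift report. A sketch of composing such a fragment (the archive URL below is hypothetical; substitute the actual link to your starqa-hn posting):

```shell
# Hypothetical archive URL for a report posted to starqa-hn.
REPORT_URL="http://www.star.bnl.gov/HyperNews-star/get/starqa/42.html"

# Compose the HTML fragment to paste into the autoQA comment box.
printf '<a href="%s">QA shift report</a>\n' "$REPORT_URL"
```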
- Checking QA jobs on rcas
Every two hours you should check the status of autoQA jobs running on rcas, by clicking on "RCAS/LSF monitor" (upper right, under the "Add or Edit Comments" button). You cannot alter jobs using this browser unless you have the Expert's password, so there is no possibility of doing damage. Select jobs called QA_TEST. Each of these is a set of QA macros for a single run, which should require up to 10 minutes of CPU time. The throughput of this system for QA is as yet unknown, but you should check that jobs are not sitting in the PENDING queue for more than an hour or two, and are not stalling while running (they should not take more than 15 minutes of CPU time). In case of problems, contact an expert.
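As a rough illustration of the kind of check involved, an LSF `bjobs` listing can be filtered for QA jobs stuck in the PENDING state. The listing below is a fabricated sample (the queue name and job IDs are invented; the columns match LSF's default `bjobs` output, which is also what the web monitor displays); on rcas itself the input would come from `bjobs` directly:

```shell
# Fabricated sample of `bjobs` output; queue name and job IDs are invented.
cat > bjobs_sample.txt <<'EOF'
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
101     qaprod  PEND  qa_queue   rcas0001                QA_TEST    Jul 11 09:02
102     qaprod  RUN   qa_queue   rcas0001    rcas0023    QA_TEST    Jul 11 09:05
EOF

# Count QA_TEST jobs still waiting in the PENDING state.
# (Matching the job name with a pattern, not a field number, because the
# empty EXEC_HOST column shifts the fields of PEND lines.)
awk '$3 == "PEND" && /QA_TEST/' bjobs_sample.txt | wc -l
```

If the count stays nonzero for more than an hour or two, that is the situation the text above says should be reported to an expert.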
Peter Jacobs. Last modified: Tue Jul 11 02:35:05 EDT 2000