QA processes are run independently of the data taking and DST production. These processes embody the collaboration's accumulated knowledge of the failure modes of data taking and DST production, along with those physics distributions that are most sensitive to the health of the data and of the DST production software. The results probe the data in various ways.
STAR will produce hundreds of terabytes of data each year. Meaningful testing of the DSTs produced from these data is a daunting task, entailing an enormous amount of tedious repetition. This process must be automated to a very high degree, for reasons both of reliability and of the finite capacity of even the most dedicated grad student to do boring but important things. The web pages you are looking at are part of an automated framework for QA and testing, called autoQA.
The autoQA-generated web pages present these data in a hierarchical fashion, with the most important information most prominently displayed. Go to the upper frame window and choose a data class from the pulldown menu in the banner. The resulting selection menus will depend upon the data class chosen, and correspond closely with the Offline File Catalog that is maintained by the production team. Using the pulldown menus, choose an interesting subset of all catalogued data and press the "Display Datasets" button. The datasets available in the QA database satisfying the selection are listed in reverse chronological order, with details about the run submission, status of the data on disk, and a very brief performance summary given in the first three columns. The "QA Status" column and the buttons on the right hand side are described below.
The scalars and histograms are generated by ROOT macros running in the standard STAR framework. A description of the QA macros run in Offline can be found here. (July 8, 2000: Online macros still to be defined.) The developers responsible for the macros are listed on the STAR QA links and contacts page. The autoQA cron jobs automatically run these macros, submitting them as batch jobs to RCAS under LSF for Offline, and as daughter processes on the Online cluster for Online.
The framework has been written so that the addition of new macros is straightforward. No changes to the CGI scripts are needed to introduce a new macro that produces only a postscript file. For a new macro that generates an ascii file of scalars, a single PERL subroutine needs to be added which parses the file, extracts the QA scalars and puts them into defined PERL structures.
Two kinds of QA scalars are defined: run-based and event-based. The run-based scalars characterize the run as a whole (for instance, the mean, rms, minimum and maximum number of tracks per event in the run). The event-based scalars characterize each individual event (the number of TPC hits in each event, whether such-and-such a table is present in this event, etc.). As has been pointed out by a number of people, the "scalars" may also be the result of statistical tests (such as a chi-squared or Kolmogorov test) comparing a histogram from the selected run to a reference histogram.
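As an illustration (the names and values here are invented; the real structures are whatever the subroutines in QA_macro_scalars.pm choose to fill), the two kinds of scalars might be held in PERL hashes along these lines:

# Illustrative sketch only: hypothetical run-based and event-based QA scalars.
my %run_scalar_hash = (
    globtrk_mean => 8765.4,   # mean number of global tracks per event in the run
    globtrk_rms  =>  123.5,   # rms over the run
    globtrk_min  => 8421,     # smallest number in any one event
    globtrk_max  => 8990,     # largest number in any one event
);
my %event_scalar_hash = (
    1 => { tpc_hits => 104532, tagsBranch_found => 1 },   # scalars for event 1
    2 => { tpc_hits =>  98761, tagsBranch_found => 1 },   # scalars for event 2
);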
In addition to running QA ROOT macros to generate the scalars and histograms, the Offline Software QA framework can apply Boolean tests to an arbitrarily large set of scalars generated by these macros. (This is defined above as Automated QA Evaluation.) These tests will be of greatest use in probing the self-consistency of large scale production, but can also be used to detect changing conditions in the nightly and weekly test results. The results of all tests for each QA macro applied to a given run are summarized in the run listing table under "QA Status". Details about the scalars and tests can be displayed via the "QA details" button (explained further below). Suggestions for additional tests are especially welcome.
The time dependence of QA scalars can be viewed via the "Compare similar runs" button. The question of what data to compare meaningfully is non-trivial for real data, especially if multiple triggers are present in a single run (for Monte Carlo data the comparisons are more straightforward). This facility will undergo changes as we gain experience. An important future extension of this will be to develop ROOT macros to superimpose histograms from a reference run on the QA histograms for each selected run.
Functionality that modifies the QA database entries (performing updates, running the QA macros, etc.) is hidden in a password-protected Expert's page.
The Expert's Page button generates a request for the expert's password, and then displays numerous buttons that control and change the DB, launch batch jobs, etc.
The button labelled "Add or Edit Comments" generates a request for your name. It will enable buttons associated with individual datasets, along with a button labelled "Add global comment". You will be able to enter comments that will be interspersed with the dataset listings. The global comments will have a specific date and time and will appear in chronological order with the datasets. These allow the user to indicate library changes, specific problems associated with a given dataset, etc. Usage of the Comment feature is quite simple and (hopefully) self-evident.
RCAS/LSF monitor: this is a link to a general utility monitoring all LSF activity on RCAS. It is a PERL/CGI wrapper around LSF commands such as "bsub" and "bpeek". Only expert users will be able to manipulate LSF jobs, and then only jobs submitted by user "starlib".
If QA for this run has been initiated but not completed (from the Expert's page), blue text will indicate that a QA batch job is in progress. For Offline, a link will be given to "peek" at the batch job in progress. If the batch job has completed, a link will be given to the log file from the batch run (useful for diagnosing macro crashes).
The generated scalars and results from the Automated QA Evaluation can be displayed in a separate browser window (by pushing a button with an obvious label, situated below the listing of the ps files). There is a large volume of information to present here, of varying general interest and importance. The run-based scalars tend to be more important for physics QA than the event-based scalars, and so are highlighted in the order of display. QA tests that fail are likewise highlighted over those that succeed. The display is divided into several sections:
The remaining sections display:
The user is first given the option of comparing to multiple similar runs, or comparing to a predefined reference. The latter capability is not yet implemented, however, and the user will be redirected to the former. For nightly MC, "similar" currently means the same TPC simulator (tfs, tss or trs) and geometry (year_1b or year_2a). For real data, the selection criteria are not yet established (July 8, 2000).
After clicking on "Compare to Multiple Reports", the display in the lower frame shows all catalogued runs that are "similar" to the current run (which is listed first and labelled as "this run"), with check boxes to select the comparison runs. Multiple comparison runs are permitted, and judicious selection can give a useful display of the time dependence of the scalars. After selecting the comparison runs, push "do run comparison".
All comparison runs are listed in a table and are assigned an arbitrary letter label for convenience. The remaining tables show a comparison of the run-based scalars for each of the QA macros that was applied to the selected run: first the difference in value of each scalar relative to that for the selected run, and then the absolute values themselves (these tables display the same information in two ways). Comparison runs with no valid entries for a given scalar (the macro wasn't run or it crashed) do not appear. If only some entries are valid, the remainder are given as "undef".
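As a purely illustrative sketch (the hash layout and subroutine name are invented, not those of the actual scripts), the difference tables could be built along these lines:

# Compute the difference of each run-based scalar in the comparison runs
# relative to the selected run. %scalars_by_run maps a run label
# (e.g. "this run", "A", "B", ...) to a hash of scalar name => value.
sub scalar_differences {
    my ($selected, %scalars_by_run) = @_;
    my %diff;
    for my $label ( keys %scalars_by_run ) {
        next if $label eq $selected;
        for my $name ( keys %{ $scalars_by_run{$selected} } ) {
            my $val = $scalars_by_run{$label}{$name};
            # runs with no valid entry for this scalar are reported as "undef"
            $diff{$label}{$name} =
                defined $val ? $val - $scalars_by_run{$selected}{$name} : "undef";
        }
    }
    return \%diff;
}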
For convenience when writing the QA summary reports, the tables in this display are also written to an ASCII file, whose name is given near the top of the display.
In the near future a histogram comparison facility will be developed, automatically plotting the QA histograms for the selected run and one reference run on the same panel.
The appropriate set of tests to apply, and in the case of numerical comparisons, the actual values to compare to, are often dependent upon the specific class of event under consideration. Simulated events from event generators require different tests than cosmic ray events, and there will of course be many different classes of real data with different properties. The selection of the appropriate set of QA macros and tests to apply to a given run is done by means of a "Control File", which specifies "Macro Definition files", one for each QA macro to be applied. Each event class has a Control File. The detailed format and syntax of these files is discussed below.
The current control file for nightly MC test runs with year_2a geometry is:

#
# control file for year 2a nightly MC tests
#
# lines with "#" in first column are comments
# all other non-blank lines should contain a test file with full pathname for one macro
#
#--------------------------------------------------------------------------------
/star/rcf/qa/nightly_MC/control_and_test/bfcread_dstBranch/hc_std.year_2a.v1.test
/star/rcf/qa/nightly_MC/control_and_test/bfcread_eventBranch/hc_std.year_2a.v1.test
/star/rcf/qa/nightly_MC/control_and_test/bfcread_Branch/geantBranch.year_2a.v2.test
/star/rcf/qa/nightly_MC/control_and_test/bfcread_Branch/runcoBranch.year_2a.test
/star/rcf/qa/nightly_MC/control_and_test/bfcread_Branch/tagsBranch.v2.test
/star/rcf/qa/nightly_MC/control_and_test/bfcread_hist_to_ps/QA.test
etc.
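As a sketch, a routine that reads such a control file might look like the following; it assumes nothing beyond the format described in the comments above, and the subroutine name is hypothetical:

# Illustrative only: return the list of macro definition ("test") files
# named in a control file.
sub read_control_file {
    my ($control_file) = @_;
    my @test_files;
    open my $fh, '<', $control_file
        or die "cannot open control file $control_file: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^#/;      # lines with "#" in first column are comments
        next if $line =~ /^\s*$/;   # skip blank lines
        push @test_files, $line;    # anything else is a test file with full pathname
    }
    close $fh;
    return @test_files;
}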
An example of a Macro Definition ("test") file:

macro name: $STAR/StRoot/macros/analysis/QA_bfcread_dst_tables.C
macro arguments: nevent=1 infile outfile
input data filetype: .dst.root
first starlib version: SL99a
last starlib version: SL99z
macro comment: Test of First Event Tables
end of header:
run scalars: globtrk globtrk_aux globtrk2 primtrk primtrk_aux
run scalars: vertex dst_v0_vertex ev0_eval dst_xi_vertex kinkVertex
run scalars: particle
run scalars: point dst_dedx g2t_rch_hit
run scalars: TrgDet
run scalars: event_header event_summary monitor_soft
BEGIN TEST: run
test name: Table exists
test comment: Check that table is present in first event
error: globtrk .found.
error: globtrk_aux .found.
... some text removed ...
error: dst_dedx .found.
error: monitor_soft .found.
END TEST:
BEGIN TEST: run
test name: Row in range
test comment: Check that number of rows within reasonable range
error: globtrk .gt. 8600
error: globtrk .lt. 9000
error: globtrk_aux .eq. globtrk
... some text removed ...
error: monitor_soft .eq. 1
error: dst_dedx .eq. globtrk
END TEST:
BEGIN TEST: run
test name: Unexpected Tables
test comment: Check for unofficial tables present (not declared as test scalars)
error: nonscalar .notfound.
END TEST:
The file is divided into a header (from "macro name:" through "end of header:", giving the macro name, arguments, input data filetype, the range of valid starlib versions, and a comment), the declarations of the QA scalars to be extracted, and a series of tests, each bracketed by "BEGIN TEST:" and "END TEST:" lines.
The line

error: globtrk .lt. 9000

is understood as testing that the scalar value "globtrk" (the number of global tracks in the first event) is less than 9000; if it is not, an error is reported. In other words, the format is severity: string, where a failure (with the given severity) is reported if the string is false.
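A minimal sketch of how such a line could be parsed and evaluated is given below; the operator table and subroutine name are invented for illustration, and the real parser (in the QA perl scripts) supports more operators and the special cases mentioned next:

# Illustrative only: evaluate one test line, e.g. "error: globtrk .lt. 9000",
# against a hash of run scalars (scalar name => value).
my %ops = (
    '.lt.' => sub { $_[0] <  $_[1] },
    '.gt.' => sub { $_[0] >  $_[1] },
    '.eq.' => sub { $_[0] == $_[1] },
);

sub evaluate_test_line {
    my ($line, $scalars) = @_;
    my ($severity, $name, $op, $limit) =
        $line =~ /^(\w+):\s+(\w+)\s+(\.\w+\.)\s+(\S+)/ or return;
    # the right-hand side may itself be a scalar name, e.g. "globtrk_aux .eq. globtrk"
    my $rhs = exists $scalars->{$limit} ? $scalars->{$limit} : $limit;
    return $ops{$op}->( $scalars->{$name}, $rhs )
        ? "pass"
        : "$severity: test \"$name $op $limit\" failed";
}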
There are special cases built in (such as the scalar nonscalar appearing in the last test of the example), some of which are proving to be more useful than others. I will not try to give a full specification of the test language here - that would quickly become obsolete. This "metalanguage" is of course defined by the QA perl scripts themselves, and it will change and develop as needed (insert the usual boilerplate about backward compatibility here). If the existing examples are not sufficient for your application, you should contact me, but if you are at that level of detail, you probably already have done so.
The skeleton of a new scalar-parsing subroutine in QA_macro_scalars.pm is:

sub new_macro_name {
    %run_scalar_hash   = ();
    %event_scalar_hash = ();
    ... do some stuff ...
    return \%run_scalar_hash, \%event_scalar_hash;
}

A minimal example is sub QA_bfcread_dst_tables in QA_macro_scalars.pm, but if you don't understand PERL, you won't understand the example. A more complex example is sub doEvents in the same file; the complexity of this routine is driven by the comparison of the QAInfo lines reported to STDOUT by doEvents with the same lines in the logfile for the same events, and this is not applicable to other macros.
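For concreteness, a hedged sketch of what such a subroutine might look like is given below. It assumes, purely for illustration, that the new macro writes an ascii file with lines like "globtrk 8765" for run scalars and "event 3 tpc_hits 104532" for event scalars; this is not the format of any existing macro, and the subroutine name is hypothetical:

# Hypothetical parsing subroutine for QA_macro_scalars.pm (illustrative only).
sub QA_new_macro_name {
    my ($ascii_file) = @_;
    my (%run_scalar_hash, %event_scalar_hash);
    open my $fh, '<', $ascii_file or die "cannot open $ascii_file: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^event\s+(\d+)\s+(\w+)\s+(\S+)/ ) {
            $event_scalar_hash{$1}{$2} = $3;     # event-based scalar
        }
        elsif ( $line =~ /^(\w+)\s+(\S+)/ ) {
            $run_scalar_hash{$1} = $2;           # run-based scalar
        }
    }
    close $fh;
    return \%run_scalar_hash, \%event_scalar_hash;
}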
The PERL scripts have been almost completely decoupled from the specific content of the existing QA macros, making the addition of new macros to the QA processing rather easy. As noted, the only PERL programming needed to add a new macro is the addition of a single subroutine to QA_macro_scalars.pm, and only in the case that scalars are defined (i.e. a new macro that only generates a postscript file of histograms does not require any change to the PERL scripts, only to the relevant Control and Test files).
PERL is an outgrowth of the UNIX csh and ksh scripting languages you may be familiar with. It is an interpreted language, and among its other uses it is very suitable for writing the kinds of scripts that used to be written in csh and ksh. PERL scripts are also in principle portable beyond UNIX, though that in fact depends upon how you write them. PERL is much more elegant, intuitive, and pleasant to write and read than csh or ksh, and has some very clever features that can make the meaning of the code quite transparent (if you speak the lingo). In addition, it has a very nice Object Oriented extension that I found to be absolutely essential in writing the QA CGI scripts. The overhead to learn OO programming in PERL is rather modest.
I found two books to be indispensable in learning PERL and writing PERL scripts, both published by O'Reilly (the "In a Nutshell" people):
An extensive PERL module has been developed for writing CGI scripts. It is called CGI.pm, written by Lincoln Stein at Cold Spring Harbor Lab, just down the road from BNL. I also found this to be extremely valuable: it hides all the html details behind a very convenient interface, allowing you to, for instance, generate quite complex tables in a few program lines. The scripts are very much cleaner and have fewer bugs as a result. The CGI.pm web page gives a tutorial and extensive examples. There is a book by Stein called "Official Guide to Programming with CGI.pm", published by Wiley. I found it to be of less use than his web page.
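For flavor, here is a minimal (and invented) example of the kind of thing CGI.pm makes easy; the dataset names and statuses are made up:

#!/usr/bin/perl
use strict;
use CGI qw(:standard);

# Generate a small html table of datasets and their QA status.
my @rows = (
    [ 'Dataset',           'QA Status' ],
    [ 'nightly_MC_year2a', 'ok'        ],
    [ 'nightly_MC_year1b', 'error'     ],
);

print header,
      start_html('autoQA example'),
      table( { -border => 1 },
             Tr( [ map { td($_) } @rows ] ) ),
      end_html;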