Offline QA technical documentation (old)

Last Update: 29 April 2005
(Inserted into Drupal on 30 Mar 2010, expect links to be broken!)


Purpose:
   This page provides nuts-and-bolts level documentation of the QA system for maintenance and troubleshooting through Run 5.  Nothing here is necessary for those doing QA shift work; this page is only for those maintaining the browser software.

 

 

QA System Components

 

CGI Web Interface

Overview:  This is the main program of the QA system.  It queries the production database, asks the daemon to make the histograms, stores information in the QA database, and provides a CGI web interface for the whole system.

Herb Says:  This is a C program (or, more precisely, an image of a C program). The
code for it is in a file named qa.c, a copy of which is under the starqa
RCF account home directory.  In the same place you will find a make file
which installs the program in an AFS CGI directory accessible by the STAR
WWW server.  This program has two inputs: the database described as "part
D" above and below, and the user (eg, a QA shift worker).  A link to the
CGI program is on the STAR QA page, and it is how the QA shift worker
enters the program.


Details:  The source tree for the code is on RCF at ~starqa/QA.  The files are qa.c, plus a few functions in lib.h and lib.c.  A Makefile in the same directory compiles the program and installs it on STAR's web server.

qa.c is 2,000+ lines of string manipulation, HTML output, and mysql queries.  To find (some of) the code for a certain page, check the switch statement in main (line 2025) for the function call pertaining to a given page number.  The mysql queries are mostly handled through the Query function defined in lib.h and lib.c.  When all the cuts and selections have been made, this program adds a line to the QA DB requesting that the daemon create the histograms.  The daemon flags that entry when it is finished, so the program monitors the QA DB while waiting for the daemon's response.  When the daemon is done, URLs pointing to the gzipped postscript files of the plots are sent back to the user for evaluation.
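
The request/flag handshake described above can be sketched roughly as follows.  This is only an illustration in Python, not the actual C code in qa.c or daemon.c: an in-memory SQLite database stands in for the mysql QA DB, the column names (input_file, status, ps_url) are hypothetical, and fake_macro is a made-up stand-in for the ROOT macro.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL QA DB so the sketch runs anywhere.
# The column names below are hypothetical; the real hist2ps schema may differ.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE hist2ps ("
    " id INTEGER PRIMARY KEY,"
    " input_file TEXT,"
    " status TEXT DEFAULT 'pending',"
    " ps_url TEXT)"
)


def submit_request(db, input_file):
    """CGI side: add one row asking the daemon to make histograms."""
    cur = db.execute("INSERT INTO hist2ps (input_file) VALUES (?)", (input_file,))
    db.commit()
    return cur.lastrowid


def daemon_step(db, make_histograms):
    """Daemon side: process the oldest pending request, then flag it."""
    row = db.execute(
        "SELECT id, input_file FROM hist2ps"
        " WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False  # nothing to do on this pass
    request_id, input_file = row
    url = make_histograms(input_file)
    db.execute(
        "UPDATE hist2ps SET status = 'done', ps_url = ? WHERE id = ?",
        (url, request_id),
    )
    db.commit()
    return True


def poll_result(db, request_id):
    """CGI side: return the plot URL once the daemon has flagged the row."""
    status, url = db.execute(
        "SELECT status, ps_url FROM hist2ps WHERE id = ?", (request_id,)
    ).fetchone()
    return url if status == "done" else None


def fake_macro(input_file):
    """Stand-in for the ROOT macro: it only builds a made-up output name."""
    return input_file + ".ps.gz"


request_id = submit_request(db, "example_run.hist.root")
before = poll_result(db, request_id)  # daemon has not run yet
daemon_step(db, fake_macro)
after = poll_result(db, request_id)   # row is now flagged as done
```

In the real system the two sides are separate processes on different machines, so the CGI program has to poll repeatedly rather than call the daemon step directly as this sketch does.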

Source:  qa.c  lib.h  lib.c

Web Interface:  Some documentation is available here: QA Browser Documentation (Run 4).  To try it yourself, the CGI page is here: QA Offline Browser.  The sections for shift work on real data (1.1) and FastOffline data (3.1) should work, but most of the other sections do not.  It's not clear whether they should be fixed or removed...


Notes: 

  • qa.c is also compiled with lib.h and lib.c.  One thing that looks odd in the code is from a definition in lib.h:
#define PP printf(
This will cause emacs to complain about unmatched parentheses.  (This was fixed by Jerome.)

 

Daemon

Overview:  The daemon watches the QA DB for a request to generate histograms from certain run data.  When it sees a request it launches a ROOT macro to generate postscript files and flags the DB when finished.

Herb Says:  This is not actually a daemon in the full technical sense.  It is just
a program that runs continuously, and which is kept alive through reboots by a
cron job, under the starqa account on the RCF machine rcas6004.  It
watches the database (part C) for requests from part A.  When it sees a
request, it generates the requested histograms and, when it is done, it
notifies part A through the database (part C). 

Details:  The code (daemon.c) and Makefile are available on RCF in ~starqa/QA.  The daemon runs on rcas6004 (which is reserved) under the starqa account.  It is started from a cron job and apparently needs to be manually restarted often.

The daemon is started through a wrapper at ~starqa/bin/QAdaemon.  The wrapper checks whether the daemon is already running and handles all of the AFS authentication.  It also controls whether the output is stored in a log file; see the comments at the end of the file.

Since the QAdaemon wrapper now handles the log files, the cron job becomes much simpler.  The crontab entry is:
* * * * * /star/u/starqa/bin/QAdaemon
which runs the wrapper every minute, so the daemon is restarted soon after it dies.

For debugging purposes it can be useful to run the daemon interactively; this should still be done through the wrapper.

Source:  daemon.c  QAdaemon (wrapper)

Procedures:


Starting the daemon interactively:
  • log on to the rssh gateway (with your own account)
  • log on to rcas6004 as starqa:  ssh starqa@rcas6004
  • run bin/QAdaemon
  • after running, delete the file ~/QA/marker
Stopping the daemon:
  • log on to rcas6004 as starqa (see above)
  • get the process id:  ps -ef | grep starqa
  • kill the daemon's process
  • delete the file ~/QA/marker
  • note: unless you stop the cron job, the daemon should automatically restart shortly

Notes:
  •  The cron job:  use the "crontab -l" command to see the current entry.  "crontab -r" will remove it, "crontab -e" will let you edit it (the editor defaults to vi; set your $EDITOR environment variable to use another), and "crontab filename" will replace it with the command(s) in the file filename.  I keep the current cron command in ~/QA/mycrontab (check this before using).
  • As a check against having several daemons running simultaneously, the daemon stores its pid in the file ~/QA/marker.  When a new daemon is started, it checks for this file, and may refuse to run if the file is recent.  The new wrapper makes this largely unnecessary, so this may be removed in the future.  In the meantime, if on startup the daemon quits with the error message about another process already running, try deleting the marker file and starting the daemon again.
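
The marker-file check can be pictured with the minimal sketch below.  It is written in Python for illustration and is not the logic in daemon.c: the age threshold is an assumption (the notes above only say the daemon "may refuse to run if the file is recent"), the function names are made up, and a temporary directory stands in for ~/QA.

```python
import os
import tempfile
import time

# Hypothetical threshold: the notes above give no number for "recent".
MAX_MARKER_AGE_SECONDS = 600


def may_start(marker_path, now=None):
    """Return True if no recent marker file blocks a new daemon."""
    if not os.path.exists(marker_path):
        return True
    now = time.time() if now is None else now
    age = now - os.path.getmtime(marker_path)
    return age > MAX_MARKER_AGE_SECONDS


def write_marker(marker_path):
    """Record this process's pid, as the daemon does in ~/QA/marker."""
    with open(marker_path, "w") as marker:
        marker.write(str(os.getpid()))


workdir = tempfile.mkdtemp()  # stands in for ~/QA
marker_path = os.path.join(workdir, "marker")

first_start = may_start(marker_path)   # no marker yet, so starting is allowed
write_marker(marker_path)
second_start = may_start(marker_path)  # a fresh marker blocks a second copy
stale_start = may_start(
    marker_path, now=time.time() + 2 * MAX_MARKER_AGE_SECONDS
)                                      # an old marker no longer blocks
os.remove(marker_path)                 # the manual fix described above
after_delete = may_start(marker_path)
```

This also shows why deleting the marker file clears the "another process already running" error: with the file gone, the check has nothing to compare against.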

 

QA Database

Overview:  This database stores information on which files have been analyzed by QA shift workers, stores requests for new histograms to be made, and is how the cgi script and the daemon communicate.

Herb Says:  This is a database.  Its name is "test_Herb".  It is served by one of
the STAR mysql servers, which you can find by searching the daemon or CGI
code for calls to the "Query" function.

Details:  The database is named "test_Herb" and is served on duvall.star.bnl.gov. 

A (partial) list of tables in the QA DB:
  • hist2ps:  Requests from the cgi script to create histograms.  The daemon monitors this table for requests, and updates the entry when it has completed (or crashed).
  • fastOfflineHistory:  A history of who has examined what
  • reports:  Stores reports from QA shift workers
  • reviewed:  Lists job IDs and run numbers which have already been checked by QA.

Notes: 
  • To browse the database, you can use the mysql command line interface (mysqlshow would be preferable, but it can't get around the underscore in the database name).  Be careful not to make any changes.
    • Type: 'mysql -h duvall.star.bnl.gov'  to open mysql
    • 'use test_Herb'  opens the QA DB
    • 'show tables'  lists the tables
    • 'describe <table>'  gives details about a certain table
    • 'select * from <table> limit 10'  will show 10 entries in a table
    • consult a mysql manual (e.g. here) for more...

 

STAR Production Database

Overview:  This is STAR's production database.  The cgi script reads this to get a list of which offline production jobs are available.

Herb Says:  This is another database.  Its name is "production" (note: actually named "operation").  This database is
written during production, and tells what jobs ("runs") have gone through
production.

Details:  The database is also served from duvall.star.bnl.gov; it is actually named "operation".  We are careful to only read from this database and not modify it in any way.

The tables we access are:
  • FileCatalog2004:  Real Data productions.  The CGI program queries this table several times to choose a run and a job, then takes the path and filename and sends them to the daemon.
  • DAQInfo:  FastOffline Data.  A list of files from FastOffline productions; the path is stored as an integer in the DiskLoc column, which references FOLocations below.
  • FOLocations:  Pathnames for FastOffline data.  Matches the DiskLoc integers in the DAQInfo table to pathnames (e.g. 0 = not on disk, 1 = /star/data08/reco/dev/2005/01/, etc.)
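
As an illustration of how the DiskLoc integer resolves to a pathname, here is a small Python sketch using an in-memory SQLite database in place of the real mysql server.  Only the table names, the DiskLoc column, and the two example locations come from the list above; the other column names (id, path, file_name) and the file names are invented for the example, so check the real schema with 'describe' before relying on any of them.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Only DiskLoc and the two example locations come from the text above;
# every other column name and the file names are made up for illustration.
db.executescript(
    """
    CREATE TABLE FOLocations (id INTEGER PRIMARY KEY, path TEXT);
    CREATE TABLE DAQInfo (file_name TEXT, DiskLoc INTEGER);
    INSERT INTO FOLocations VALUES (0, NULL);
    INSERT INTO FOLocations VALUES (1, '/star/data08/reco/dev/2005/01/');
    INSERT INTO DAQInfo VALUES ('example_a.root', 1);
    INSERT INTO DAQInfo VALUES ('example_b.root', 0);
    """
)


def files_on_disk(db):
    """Return full paths for files whose DiskLoc maps to a directory."""
    rows = db.execute(
        "SELECT loc.path || daq.file_name"
        " FROM DAQInfo AS daq"
        " JOIN FOLocations AS loc ON daq.DiskLoc = loc.id"
        " WHERE daq.DiskLoc != 0"  # 0 means the file is not on disk
        " ORDER BY daq.file_name"
    ).fetchall()
    return [row[0] for row in rows]


paths = files_on_disk(db)
```

The query is read-only, in keeping with the rule of never modifying the production database.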

Notes: 
  • Our access to this database was down for several weeks.  If data are not available in the browser, you can use mysql to query the "operation" database directly to see what is there and to determine which end the problem is on.

 

Troubleshooting

This will be a list of common problems and solutions.  However, the system has been remarkably stable so far.  Little outside intervention has been needed to keep the system working, so I haven't had anything to add to this section.  So far....