STAR QA for Offline Software

Peter Jacobs, LBNL
July 7, 2000

Index

  1. Introduction
    1. What is QA in STAR?
    2. Varieties of QA in STAR
    3. Overview of autoQA framework
  2. Database Content and Operations
    1. What is the QA database?
    2. QA database updates
  3. Starting Display and Data Selection
  4. Viewing Run Information
    1. Data Set
    2. Created/On disk?
    3. Run status
    4. QA status
  5. Run Details
  6. QA Details
  7. Files and Reports
  8. Compare Similar Runs
  9. Automated QA and Automated Tests
    1. Automated Tests
    2. Details of Control and Macro Definition files
    3. Adding new macros
  10. Current scalars and tests
  11. Expert's page
  12. PERL, Object Oriented PERL, and CGI.pm

  1. Introduction
    1. What is QA in STAR?
    2. The goal of the QA activities in STAR is the validation of data and software, up to DST production. While QA testing can never be exhaustive, the intention is that data that pass the QA testing stage should be considered highly reliable for downstream physics analysis. In addition, QA testing should be performed soon after production of the data, so that errors and problems can be caught and fixed in a timely manner.

      QA processes are run independently of the data taking and DST production. These processes contain the collaboration's accumulated knowledge of the modes of failure of data taking and DST production, along with those physics distributions that are most sensitive to the health of the data and DST production software. The results probe the data in various ways:

      • At the most basic level, the questions asked are whether the data can be read and whether all the components expected in a given dataset are present. Failures at this level are often related to problems with computing hardware and software infrastructure.
      • At a more sophisticated level, distributions of physics-related quantities are examined, both as histograms and as scalar quantities extracted from the histograms and other distributions. These distributions are compared to those of previous runs that are known to be valid, and the stability of the results is monitored. If changes are observed, these must be understood in terms of changing running conditions or controlled changes in the software, otherwise an error flag should be raised. (Deviations are not always bad, of course, and can signal new physics: QA must be used with care in areas where there is a danger of biasing the physics results of STAR.)

      STAR will produce hundreds of terabytes of data each year. Meaningful testing of the DSTs produced from these data is a daunting task, entailing an enormous amount of tedious repetition. This process must be automated to a very high degree, for reasons both of reliability and finite capacity of even the most dedicated grad student to do boring but important things. The web pages you are looking at are part of an automated framework for QA and testing, called autoQA.

    3. Varieties of QA in STAR
    4. The focus of the QA activities until summer 2000 has been on Offline DST production for the DEV branch of the library. With the inception of data taking, the scope of QA has broadened considerably. There are in fact two different servers running autoQA processes:
      • Offline QA. This web page accesses QA results for all the varieties of Offline DST production:
        • Nightly tests of real and Monte Carlo data (almost always using the DEV and NEW branches of the library). This is used principally for the validation of the migration of library versions.
        • Large scale production of real and Monte Carlo data (almost always using the PRO branch of the library). This is used to monitor the stability of DSTs for physics.
      • Online QA (old). This web page accesses QA for data in the Online event pool, both raw data and DST production that is run on the Online processors.

    5. Overview of autoQA framework
    6. The autoQA framework consists of a set of CGI scripts written in PERL, which perform the following tasks:
      1. Data Catalogue: Maintain a database of all production and test datasets of real and MC data, together with a performance summary of each run: what was run, completed successfully or not, run time errors and diagnostics, memory and cpu usage, etc. New runs are added to the database by querying the Files Catalog in Offline and the Event Pool summaries in Online. The update queries occur automatically via cron jobs, with a frequency that is dependent upon the kind of data in question: they will be very frequent for Online data (say, every 10 minutes), less so for the nightly tests of MC data (say, once a day). These are parameters that will be adjusted as we gain experience with the system and how it is used.
      2. Automated running of QA macros: Run a set of QA ROOT macros on the dataset, report the results, and catalogue them in the database. The QA macros generate postscript files of histograms or ascii files containing scalars relevant to QA, both event-wise and run-wise scalars. The specific macros to be run may depend upon the nature of the data being analysed (real vs simulated, cosmic, calibration, etc.)
      3. Examination of QA macro output: The autoQA-generated web page facilitates access to the histograms and scalars resulting from running the QA macros. In addition, the comparison of different runs is possible by building comparison tables of scalars, which for instance allow the user to track physics-related quantities (number of primary tracks, V0s, etc.) of similar runs as a function of time.
      4. Automated QA Evaluation: Following the running of the QA macros, the autoQA system can run a set of defined tests on the scalars generated by the QA macros, highlight scalars that fall outside of expected ranges ("cuts") by raising error and warning flags, and record the results in the QA database. The tests that are applied can depend upon the nature of the data being analysed, and the specific cuts of the tests can vary as needed. Which tests and what cuts to apply to a given data set are quite complex questions. If done well and kept current with the data being examined, this facility can serve as a reliable automated mechanism to validate the data, which is the ultimate goal of autoQA. If not done well, this facility can be misleading and not catch critical errors. Thus, for the time being (summer 2000), no automated tests will be applied to data generated in large scale production of real data. Once we gain experience with the system and understand how to test for stability, we will (slowly) introduce automated tests. Until that time, QA decisions will have to be made entirely by humans (that means you) looking at histograms and the time development of scalar quantities.

      The autoQA-generated web pages present these data in a hierarchical fashion, with the most important information most prominently displayed. Go to the upper frame window and choose a data class from the pulldown menu in the banner. The resulting selection menus will depend upon the data class chosen, and correspond closely with the Offline File Catalog that is maintained by the production team. Using the pulldown menus, choose an interesting subset of all catalogued data and press the "Display Datasets" button. The datasets available in the QA database satisfying the selection are listed in reverse chronological order, with details about the run submission, status of the data on disk, and a very brief performance summary given in the first three columns. The "QA Status" column and the buttons on the right hand side are described below.

      The scalars and histograms are generated by ROOT macros running in the standard STAR framework. A description of the QA macros run in Offline can be found here. (July 8, 2000: Online macros still to be defined.) The developers responsible for the macros can be found on the STAR QA links and contacts page. The autoQA cron jobs automatically run these macros, submitting them as batch jobs to RCAS under LSF for Offline and as daughter processes on the Online cluster for Online.

      The framework has been written so that the addition of new macros is straightforward. No changes to the CGI scripts are needed to introduce new macros which produce postscript files. For a new macro which generates an ascii file of scalars, a single PERL subroutine needs to be added, which parses the file, extracts the QA scalars and puts them into some defined PERL structures.

      Two kinds of QA scalars are defined: run-based and event-based. The run-based scalars characterize the run as a whole (for instance, the mean, rms, minimum and maximum number of tracks per event in the run). The event-based scalars characterize each individual event (the number of TPC hits in each event, whether such-and-such a table is present in this event, etc.) As has been pointed out by a number of people, the "scalars" may also be the result of statistical tests (such as a chi-squared or Kolmogorov test) comparing a histogram from the selected run to a reference histogram.

      In addition to running QA ROOT macros to generate the scalars and histograms, the Offline Software QA framework can apply Boolean tests to an arbitrarily large set of scalars generated by these macros. (This is defined above as Automated QA Evaluation.) These tests will be of greatest use in probing the self-consistency of large scale production, but can also be used to detect changing conditions in the nightly and weekly test results. The results of all tests for each QA macro applied to a given run are summarized in the run listing table under "QA Status". Details about the scalars and tests can be displayed via the "QA details" button (explained further below). Suggestions for additional scalars and tests are especially welcome.

      The time dependence of QA scalars can be viewed via the "Compare similar runs" button. The question of what data to compare meaningfully is non-trivial for real data, especially if multiple triggers are present in a single run (for Monte Carlo data the comparisons are more straightforward). This facility will undergo changes as we gain experience. An important future extension of this will be to develop ROOT macros to superimpose histograms from a reference run on the QA histograms for each selected run.

      Functionality that modifies the QA database entries (performing updates, running the QA macros, etc.) is hidden in a password-protected Expert's page.


  2. Database Content and Operations
    1. What is the QA database?
    2. The QA database is a MySQL database containing all datasets that have been registered by autoQA. The QA database utilizes a supplementary "disk-based DB", a UNIX directory tree containing the files generated by the ROOT macros and the various autoQA processes. Each dataset is assigned a unique database key, which serves as the name of the subdirectory containing all files related to this run. There are several types of files in each subdirectory. The casual user need not know anything about these files: an important function of the CGI scripts is to present their contents to the user's web browser in digestible ways. However, for completeness, the files are:
      • logfile_report.obj: summary of the run, generated by parsing the log file when the run is entered into the database. It is present for all runs in the database. Format is internal to PERL, not human-readable.
      • StError.txt, StWarning.txt: ascii files containing all strings written to the log file by StMessageMaker within bfc. These are currently filled with many messages besides important errors and warnings; a general cleanup of user code is needed before this facility becomes more useful in flagging real errors.
      • files of type .qa_report: Ascii file generated by each QA macro that produces ascii output of QA information. Filename is the name of the macro.
      • files of type .ps.gz: Gzipped versions of postscript files generated by QA macros such as bfcread_hist_to_ps. Filename is the name of the macro. Links to these files are presented to the browser when QA Details are viewed.
      • files of type .evaluation: The result of Automated QA Evaluation applied to the qa_report output of one QA macro. Format is internal to PERL, not human-readable.
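
      As a rough sketch of how this disk-based DB can be traversed (the root directory and run key below are invented for illustration; the real locations are defined by the autoQA installation), a few lines of PERL suffice to group the files of one run's subdirectory by type:

        #!/usr/bin/env perl
        # Sketch only: list one run's subdirectory of the disk-based DB and
        # group its files by suffix. Paths and the key are hypothetical.
        use strict;
        use warnings;

        my $qa_root = '/star/rcf/qa/diskDB';     # hypothetical disk-DB root
        my $run_key = '20000707.123456';         # hypothetical database key
        my $run_dir = "$qa_root/$run_key";

        opendir my $dh, $run_dir or die "cannot open $run_dir: $!";
        my %by_type;
        for my $file (readdir $dh) {
            next if $file =~ /^\.\.?$/;                    # skip . and ..
            if    ($file =~ /\.qa_report$/)  { push @{ $by_type{report} },     $file }
            elsif ($file =~ /\.ps\.gz$/)     { push @{ $by_type{postscript} }, $file }
            elsif ($file =~ /\.evaluation$/) { push @{ $by_type{evaluation} }, $file }
            else                             { push @{ $by_type{other} },      $file }
        }
        closedir $dh;

        for my $type (sort keys %by_type) {
            print "$type:\n  ", join("\n  ", sort @{ $by_type{$type} }), "\n";
        }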

    3. QA database updates
    4. Updating the QA database is carried out manually from the Expert's page or by means of a cron job (see your favourite UNIX manual). The updating process examines the Offline File Catalog (for Offline) or the Event Pool (for Online), looking for datasets that have not yet been catalogued. Date and time of the last update are given in the upper panel, under the "Display selected dataset" button. If an update job is in progress, blue text will indicate that immediately below this message. Frequency of update will depend upon the class of data, and will vary from once every few minutes (for Online) to once a day (for nightly MC tests).
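
      As a sketch of what one pass of such an update job might look like (the table name, columns, connection parameters and the query_file_catalog() helper are all hypothetical stand-ins for the real Files Catalog interface and QA database schema):

        #!/usr/bin/env perl
        # Sketch of one pass of a QA database update job. Everything specific
        # here (schema, credentials, query_file_catalog) is invented.
        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('DBI:mysql:database=autoQA;host=localhost',
                               'qauser', 'secret', { RaiseError => 1 });

        # Datasets already registered in the QA database.
        my %known = map { $_->[0] => 1 }
                    @{ $dbh->selectall_arrayref('SELECT run_key FROM datasets') };

        # query_file_catalog() stands in for the real catalogue query; it
        # would return one key per production dataset.
        for my $run_key ( query_file_catalog() ) {
            next if $known{$run_key};
            $dbh->do('INSERT INTO datasets (run_key, qa_status) VALUES (?, ?)',
                     undef, $run_key, 'QA not done');
            # ... parse the log file, write logfile_report.obj, submit QA jobs ...
        }

        $dbh->disconnect;

        sub query_file_catalog { return () }     # placeholder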

  3. Starting Display and Data Selection
  4. Selecting a data class in the web page banner generates a set of additional pulldown menus, in general dependent upon the data class, which specify the various subsets of data available. These additional pulldown menus are used to form a query to the QA Database, which is submitted by pushing the "Display Datasets" button. Upon return, all datasets in the QA database that satisfy the DB query are displayed in the upper frame.
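
    To give a feeling for what happens behind the "Display Datasets" button, here is a sketch combining CGI.pm and DBI (the parameter names, table and column names are invented; only the module calls themselves are real):

      #!/usr/bin/env perl
      # Sketch only: turn two pulldown-menu selections into a QA database
      # query and print one line per matching dataset.
      use strict;
      use warnings;
      use CGI qw(:standard);
      use DBI;

      my $class    = param('data_class') || 'nightly_MC';   # hypothetical parameter
      my $geometry = param('geometry')   || 'year_2a';      # hypothetical parameter

      my $dbh = DBI->connect('DBI:mysql:database=autoQA;host=localhost',
                             'qauser', 'secret', { RaiseError => 1 });

      my $sth = $dbh->prepare(
          'SELECT run_key, created, run_status, qa_status
             FROM datasets
            WHERE data_class = ? AND geometry = ?
            ORDER BY created DESC');       # reverse chronological order
      $sth->execute($class, $geometry);

      while (my $row = $sth->fetchrow_hashref) {
          # In the real scripts each row becomes one line of the upper-frame table.
          print join('  ', @{$row}{qw(run_key created run_status qa_status)}), "\n";
      }
      $dbh->disconnect;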

    The Expert's Page button generates a request for the expert's password, which in turn will display numerous buttons that control and change the DB, launch batch jobs, etc.

    The button labelled "Add or Edit Comments" generates a request for your name. It will enable buttons associated with individual datasets, along with a button labelled "Add global comment". You will be able to enter comments that will be interspersed with the dataset listings. The global comments will have a specific date and time and will appear in chronological order with the datasets. These allow the user to indicate library changes, specific problems associated with a given dataset, etc. Usage of the Comment feature is quite simple and (hopefully) self-evident.

    RCAS/LSF monitor: this is a link to a general utility monitoring all LSF activity on RCAS. It is a PERL/CGI wrapper around LSF commands such as "bsub" and "bpeek". Only expert users are able to manipulate LSF jobs, and then only jobs submitted by user "starlib".


  5. Viewing Run Information
  6. The dataset display in the upper frame has five columns, labelled "Data Set", "Created/On disk?", "Run Status", "QA Status", and an unlabelled column containing action buttons which are described in detail in following sections. This display can be refreshed at any time by pushing the "Display Datasets" button.

    1. Data Set
    2. Displays basic information about the dataset, such as Job ID (for DSTs), where the DST resides (or resided) on disk, the STARLIB version and STAR Library level, as extracted from the log file, etc.

    3. Created/On disk?
    4. Displays the date and time of submission of the run and whether it resides on disk at the moment.

    5. Run status
    6. Displays basic performance-related information about the run, extracted from the log file. Reports whether the run completed successfully or crashed, error conditions such as segmentation fault, the number of events completed, the number of events requested, and the specific event sequence requested from the input file. Reports whether all data files that should be generated by the production chain are present in the directory. (The file of type .hist.root might exist but not contain the relevant histograms, which is indicated by its size being much smaller than a correct .hist.root file. This condition is also reported as a missing .hist.root file.)

    7. QA status
    8. If a dataset has been added to the catalogue but the QA macros have not yet been run, this column displays "QA not done". Otherwise, displays the date and time that QA was carried out, and succinct information from each QA macro that was run. Reports if a macro crashed (this should be rare). If Automated QA Evaluation has been done, reports "o.k." if all tests passed, otherwise reports the number of errors and warnings generated.

      If QA for this run has been initiated but not completed (from the Expert's page), blue text will indicate that a QA batch job is in progress. For Offline, a link will be given to "peek" at the batch job in progress. If the batch job has completed, a link will be given to the log file from the batch run (useful for diagnosing macro crashes).


  7. Run Details
  8. Shows detailed run information extracted from the log file: run setup information, including the options selected, CPU usage, and memory usage.

  9. QA Details
  10. This button displays in the lower frame links to histogram files generated by various QA macros. The format is gzipped postscript and the name of the file indicates the macro that generated it. The physical location of the files is given for reference, but clicking on the link will open the file. This should open quickly even at remote sites. Your browser should be configured to open .ps files using a recent version of ghostview that automatically gunzips a gzipped file.

    The generated scalars and results from the Automated QA Evaluation can be displayed in a separate browser window (by pushing a button with an obvious label, situated below the listing of the ps files). There is a large volume of information to present here, of varying general interest and importance. The run-based scalars tend to be more important for physics QA than the event-based scalars, and so are highlighted in the order of display. QA tests that fail are likewise highlighted over those that succeed. The display is divided into several sections:

    1. Run-based scalars, errors and warnings: Run based scalars (see overview) are presented for each QA macro for which they are defined, together with the automated tests that failed and generated an error or warning. The tests are labelled by short strings and are defined in more detail farther down the display. See current scalars and tests.
    2. Event-based errors and warnings: Same as previous section, but for event-based scalar tests that generated errors and warnings. The actual scalar values are not given here. Event-based scalars are tabulated for each event and there may be many of them in total. Their values can be found in the tables of all QA tests applied, farther down the display.
    3. Run-based tests (all entries): displays all QA tests applied to run-based scalars for each macro. Display shows each boolean test string, severity if failed (error or warning), and result (TRUE or FALSE). Failed tests are highlighted in red.
    4. Event-based tests (all entries): displays all QA tests applied to event-based scalars for each macro. Display shows each boolean test string, severity if failed (error or warning), and result (TRUE or FALSE). Failed tests are highlighted in red.

    5. Files and Reports
    6. The table shows all files in the given production directory, together with their size and date of creation.

      The remaining sections display:

      1. Logfile: a link is given to the log file for the run. Check the size of the logfile before opening it: the largest log files can exhaust the virtual memory of your PC.
      2. StWarning and StError files: ascii files containing all instances of StWarning and StError in the log file.
      3. Control and Macro Definition files: links to the specific control and macro definition files used for QA. Physical location of the files is given for reference, but clicking on the link will open the file. These files define the QA macros to be run, the run and event based scalars to extract, and the automated QA tests and specific cuts to apply. Each run has one control file and one or more macro definition files, which may be valid only for a specific event type (central collisions, cosmics, etc.), time period, or sequence of library versions.
      4. Postscript files: links are given to all postscript files generated by the QA macros. These are the same as the links given under "QA histograms" on the "QA details" page.
      5. Other files: shown only on the Expert's page. All files (other than of type .ps.gz) that are in the QA Database subdirectory for this run are displayed. Links are provided to ascii files (this excludes files of type .evaluation).

    7. Compare Similar Runs
    8. This display is proving to be fruitful for QA, but see the warning in the Overview section concerning the difficulty of defining which runs to compare meaningfully for real data (as opposed to Monte Carlo data). The run-based scalars of the current run are presented in a table with those of other runs, to investigate their time dependence.

      The user is first given the option of comparing to multiple similar runs, or comparing to a predefined reference. The latter capability is not yet implemented, however, and the user will be redirected to the former. For nightly MC, "similar" currently means the same TPC simulator (tfs, tss or trs) and geometry (year_1b or year_2a). For real data, the selection criteria are not yet established (July 8, 2000).

      After clicking on "Compare to Multiple Reports", the display in the lower frame shows all catalogued runs that are "similar" to the current run (which is listed first and labelled as "this run"), with check boxes to select the comparison runs. Multiple comparison runs are permitted, and judicious selection can give a useful display of the time dependence of the scalars. After selecting the comparison runs, push "do run comparison".

      All comparison runs are listed in a table and are assigned an arbitrary letter label for convenience. The remaining tables show a comparison of run-based scalars for each of the QA macros that was applied to the selected run, first the difference in value of each scalar relative to that for the selected run, and then the absolute values themselves (these tables obviously display the same information). Comparison runs with no valid entries for a given scalar value (macro wasn't run or it crashed) do not appear. If only some entries are valid, the remainder are given as "undef".
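
      The bookkeeping behind these tables is simple; the following sketch builds one row of the difference table for each run-based scalar, with "undef" entries where a comparison run has no valid value (the scalar names, run labels and values are invented for illustration):

        # Sketch: one row of the run-comparison table per scalar, relative to
        # the selected run. All names and numbers here are illustrative only.
        use strict;
        use warnings;

        my %selected = ( globtrk => 8765, primtrk => 5321 );      # "this run"
        my %compare  = (
            A => { globtrk => 8790, primtrk => 5290 },
            B => { globtrk => 8612 },                  # primtrk missing for run B
        );

        for my $scalar (sort keys %selected) {
            my @row;
            for my $label (sort keys %compare) {
                my $val = $compare{$label}{$scalar};
                push @row, defined $val ? $val - $selected{$scalar} : 'undef';
            }
            print "$scalar: ", join('  ', @row), "\n";
        }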

      For convenience when writing the QA summary reports, the tables in this display are also written to an ASCII file, whose name is given near the top of the display.

      In the near future a histogram comparison facility will be developed, automatically plotting the QA histograms for the selected run and one reference run on the same panel.


    9. Automated QA and Automated Tests
    10. In this documentation, I have tried to use the following definitions consistently (see also overview):
      • Automated QA: Running a set of QA root macros on the production files.
      • Automated Testing: Applying a set of cuts to the QA scalars generated by these macros.
      While these are separate, sequential operations in principle, in practice in the current autoQA framework they are specified together. After a discussion of automated tests, I will discuss the steering mechanism and how both the QA and testing are specified for a given macro.

      The appropriate set of tests to apply, and in the case of numerical comparisons, the actual values to compare to, are often dependent upon the specific class of event under consideration. Simulated events from event generators require different tests than cosmic ray events, and there will of course be many different classes of real data with different properties. The selection of the appropriate set of QA macros and tests to apply to a given run is done by means of a "Control File", which specifies "Macro Definition files", one for each QA macro to be applied. Each event class has a Control File. The detailed format and syntax of these files is discussed below.

      1. Automated Tests
      2. Examples of tests on QA scalars are:
        • is Table X present in each event on the dst?
        • are there tables present in any event that are not expected for this class of data?
        • is the number of entries for Table X equal to those for Table Y in all events?
        • is the mean number of tracks per event within a given window for this run?
        There may be tests on both run-based and event-based quantities. In some cases a simple binary quantity is tested (table present or not), in other cases a numerical comparison is made. The section on QA Details describes how to view the results of the QA tests.

      3. Details of Control and Macro Definition files
        • Control File: This file contains a set of names of Macro Definition files, one per line. Lines beginning with a pound sign ("#") are comments and blank lines are ignored. All other lines are understood to contain a file name, which should begin in the first column. As an example:
          #
          # control file for year 2a nightly MC tests
          #
          # lines with "#" in first column are comments
          # all other non-blank lines should contain a test file with full pathname for one macro
          #
          #--------------------------------------------------------------------------------
          /star/rcf/qa/nightly_MC/control_and_test/bfcread_dstBranch/hc_std.year_2a.v1.test
          /star/rcf/qa/nightly_MC/control_and_test/bfcread_eventBranch/hc_std.year_2a.v1.test 
          /star/rcf/qa/nightly_MC/control_and_test/bfcread_Branch/geantBranch.year_2a.v2.test
          /star/rcf/qa/nightly_MC/control_and_test/bfcread_Branch/runcoBranch.year_2a.test
          /star/rcf/qa/nightly_MC/control_and_test/bfcread_Branch/tagsBranch.v2.test  
          /star/rcf/qa/nightly_MC/control_and_test/bfcread_hist_to_ps/QA.test
          etc.
          
          is the current control file for nightly MC test runs with year_2a geometry.
        • Macro Definition files: Here is (part of) the file QA_bfcread_dst_tables.year_1b.v1.test:
          macro name: $STAR/StRoot/macros/analysis/QA_bfcread_dst_tables.C
          macro arguments: nevent=1 infile outfile
          input data filetype: .dst.root
          first starlib version: SL99a
          last starlib version: SL99z
          macro comment: Test of First Event Tables
          end of header:
          run scalars: globtrk globtrk_aux globtrk2 primtrk primtrk_aux 
          run scalars: vertex dst_v0_vertex ev0_eval dst_xi_vertex kinkVertex
          run scalars: particle
          run scalars: point dst_dedx g2t_rch_hit
          run scalars: TrgDet
          run scalars: event_header event_summary monitor_soft
          BEGIN TEST:
          run test name: Table exists
          test comment: Check that table is present in first event
          error: globtrk .found.
          error: globtrk_aux .found.
          ... some text removed ...
          error: dst_dedx .found.         
          error: monitor_soft .found.             
          END TEST:
          BEGIN TEST:
          run test name: Row in range
          test comment: Check that number of rows within reasonable range
          error: globtrk .gt. 8600
          error: globtrk .lt. 9000
          error: globtrk_aux .eq. globtrk
          ... some text removed ...
          error: monitor_soft .eq. 1              
          error: dst_dedx .eq. globtrk            
          END TEST:
          BEGIN TEST:
          run test name: Unexpected Tables
          test comment: Check for unofficial tables present (not declared as test scalars)
          error: nonscalar .notfound.
          END TEST:
          
        Files of this type can be seen by following the appropriate link under "QA details".

        The file is divided into:

        • a mandatory header, which defines the macro name, arguments, input data filetype, valid starlib versions (not yet checked against), and a further comment that will be displayed on the "QA Details" page. In the macro name, note the presence of the environment variable $STAR. If this is given rather than an absolute path name, the macro will be run under the same library version as the production was run (the version is extracted from the log file).
        • an optional declaration of expected scalars. In this case only run-based scalars are declared, but event-based scalars can be declared in a similar way. The appropriate scalars must be declared if a test is defined.
        • optional test definitions. Tests can be "run tests" (for testing run-based scalars) or "event tests" (for testing event-based scalars). Test name is mandatory, comment is optional. The Boolean tests are given one per line. A line such as
          error: globtrk .lt. 9000
          
          is understood as testing that the scalar value "globtrk" (the number of global tracks in the first event) is less than 9000; if it is not, an error is reported. In other words, the format is severity: string, where a failure (with the given severity) is reported if the string is false.

        There are special cases built in (such as the scalar nonscalar appearing in the last test of the example), some of which are proving to be more useful than others. I will not try to give a full specification of the test language here - that would quickly become obsolete. This "metalanguage" is of course defined by the QA perl scripts themselves, and it will change and develop as needed (insert the usual boilerplate about backward compatibility here). If the existing examples are not sufficient for your application, you should contact me, but if you are at that level of detail, you probably already have done so.
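
        To make the formats above concrete, the sketch below reads a Control File (one Macro Definition file per non-comment, non-blank line) and evaluates a few test lines of the "severity: string" form against a hash of run scalars. The operator handling is deliberately simplified and the scalar values are invented; this is not the actual autoQA parser.

          #!/usr/bin/env perl
          # Sketch: parse a Control File and evaluate simple test lines such as
          #   error: globtrk .lt. 9000
          # against a hash of run scalars. Simplified and illustrative only.
          use strict;
          use warnings;

          sub read_control_file {
              my ($path) = @_;
              open my $fh, '<', $path or die "cannot open $path: $!";
              my @macro_files;
              while (my $line = <$fh>) {
                  chomp $line;
                  next if $line =~ /^\s*$/ or $line =~ /^#/;   # skip blanks and comments
                  push @macro_files, $line;
              }
              close $fh;
              return @macro_files;
          }

          sub evaluate_test_line {
              my ($line, $scalars) = @_;
              my ($severity, $expr) = $line =~ /^(error|warning):\s*(.*)$/ or return;
              my ($name, $op, $value) = split ' ', $expr;
              my $x = $scalars->{$name};
              return ($severity, 0) unless defined $x;          # scalar missing: test fails
              # The right-hand side may itself name another scalar (e.g. "globtrk").
              $value = $scalars->{$value} if defined $value && exists $scalars->{$value};
              my %cmp = (
                  '.lt.' => sub { $_[0] <  $_[1] },
                  '.gt.' => sub { $_[0] >  $_[1] },
                  '.eq.' => sub { $_[0] == $_[1] },
              );
              my $passed = $cmp{$op} ? $cmp{$op}->($x, $value) : 0;
              return ($severity, $passed);
          }

          # Illustrative scalar values only.
          my %run_scalars = ( globtrk => 8765, globtrk_aux => 8765, dst_dedx => 8765 );
          for my $line ('error: globtrk .gt. 8600',
                        'error: globtrk .lt. 9000',
                        'error: globtrk_aux .eq. globtrk') {
              my ($sev, $ok) = evaluate_test_line($line, \%run_scalars);
              printf "%-35s -> %s\n", $line, $ok ? 'TRUE' : "FALSE ($sev)";
          }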

      4. Adding new macros
      5. The steps currently needed to add a new macro are:
        • Create the appropriate macro definition files for the new macro.
        • Modify the control files for the relevant event types.
        • If scalars are defined, modify the PERL module QA_macro_scalars.pm by adding a subroutine with the same name as the macro. This subroutine should parse the report file generated by the macro (file of type .qa_report) and return references to two hashes containing the scalars (this is PERL-speak):
          sub new_macro_name {
            my %run_scalar_hash   = ();
            my %event_scalar_hash = ();
            # ... do some stuff ...
            return \%run_scalar_hash, \%event_scalar_hash;
          }
          
          A minimal example is sub QA_bfcread_dst_tables in QA_macro_scalars.pm, but if you don't understand PERL, you won't understand the example. A more complex example is sub doEvents in the same file, but the complexity of this routine is driven by the comparison of QAInfo reported to STDOUT by doEvents to the same lines in the logfile for the same events, and this is not applicable to other macros.

        The PERL scripts have been almost completely decoupled from the specific content of the existing QA macros, making the addition of new macros to the QA processing rather easy. As noted, the only PERL programming needed to add a new macro is the addition of a single subroutine to QA_macro_scalars.pm, and only in the case that scalars are defined (i.e. a new macro that only generates a postscript file of histograms does not require any change to the PERL scripts, only to the relevant Control and Test files).
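
        As a concrete, though entirely hypothetical, illustration: suppose a new macro writes a .qa_report file with one "name value" pair per line for run scalars and lines of the form "event N name value" for event scalars. A parsing subroutine for QA_macro_scalars.pm might then look like the sketch below (the report format, the way the filename is passed, and the nested layout of the event hash are all assumptions, not the existing convention):

          # Hypothetical parsing subroutine for QA_macro_scalars.pm. The assumed
          # report format: "globtrk 8765" (run scalar), "event 3 tpc_hits 52467"
          # (event scalar). Not modelled on an existing macro.
          sub new_macro_name {
              my ($report_file) = @_;
              my (%run_scalar_hash, %event_scalar_hash);
              open my $fh, '<', $report_file or die "cannot open $report_file: $!";
              while (my $line = <$fh>) {
                  chomp $line;
                  if ($line =~ /^event\s+(\d+)\s+(\S+)\s+(\S+)$/) {
                      $event_scalar_hash{$1}{$2} = $3;      # scalar for event $1
                  }
                  elsif ($line =~ /^(\S+)\s+(\S+)$/) {
                      $run_scalar_hash{$1} = $2;            # run-wise scalar
                  }
              }
              close $fh;
              return \%run_scalar_hash, \%event_scalar_hash;
          }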


    11. Current scalars and tests
    12. The current (Sept 99) scalars and Automated QA tests that are applied for each macro are:
      • doEvents:
        • Run-based: Run-based scalars are quantities such as tracks_mean, tracks_rms, tracks_min and tracks_max, which are the mean and rms of the number of tracks per event, and the minimum and maximum per event over the run. See a Macro Definition file for doEvents for the full list. There is one test on these scalars, called Run stats in range, which checks each scalar against expected values.
        • Event-based: No event-based scalars are defined. However, an event-based test called Compare to Logfile, that is particular to doEvents, is defined. This test checks the strings written by the message manager in doEvents ("qa" = "on") against those written in the same event to the logfile during production. The test is that the number of tracks, vertices, TPC hits, etc., reported during production are the same as those read from the DST by doEvents, verifying the integrity of the DST and the reading mechanism. You will be pleased, but not surprised, to learn that no errors of this kind have been detected, but we will continue to check.
      • QA_bfcread_dst_tables: QA_bfcread_dst_tables is Kathy's standard QA macro, which reports the number of rows for each table in the first event of the run. These scalars are defined as run-based for the QA tests. Tests on these scalars are:
        • Table exists: checks that all tables that are officially expected for a given class of event (year_1b vs. year_2a) are present in the first event
        • Row in range: similar to test "Run stats in range" in doEvents, and highlights same problems. Checks that the number of rows for each table in the first event of the run is within some window or equal to the number of rows of another table.
        • Unexpected Tables: checks that there are no tables in the first event that are not officially expected.

    13. Expert's page
    14. The Expert's page is a password protected page, containing the standard display plus functions that affect the content of the database. I will not detail all the functionality in the Expert Page here. If you are expert enough to want the Expert's password, you will have contacted one of the developers and found out how to determine the functionality of the various buttons yourself.

    15. PERL, Object Oriented PERL, and CGI.pm
    16. "CGI" stands for "Common Gateway Interface", and refers to the standard internet protocol for dynamically generating web pages by running "CGI scripts" which respond to user actions. When you purchase a book from Amazon.com, the web server is running a CGI script that responds to your search and purchase requests and sends your purchase details to literally hundreds of web-based merchants, who then target their advertising banners straight at you. CGI scripts can be written in various languages, but PERL is a well established industry standard for writing CGI scripts, is freely distributed, has extensive documentation, support and software available over the web and from the standard sources, and appears to be the right choice for writing the QA CGI scripts.

      PERL is an outgrowth of the UNIX csh and ksh scripting languages you may be familiar with. It is an interpreted language, and among its other uses it is very suitable for writing the kinds of scripts that used to be written in csh and ksh. PERL scripts are also in principle portable beyond UNIX, though that in fact depends upon how you write them. PERL is much more elegant, intuitive, and pleasant to write and read than csh or ksh, and has some very clever features that can make the meaning of the code quite transparent (if you speak the lingo). In addition, it has a very nice Object Oriented extension that I found to be absolutely essential in writing the QA CGI scripts. The overhead to learn OO programming in PERL is rather modest.

      I found two books to be indispensable in learning PERL and writing PERL scripts, both published by O'Reilly (the "In a Nutshell" people):

      • Programming PERL, by Wall, Christiansen and Schwartz
      • The PERL Cookbook, by Christiansen and Torkington
      The first is the standard reference (not quite a language definition) with a very useful pedagogical introduction, whereas the second contains explicit recipes for many essential tasks that would take a lot of time to figure out otherwise. Using it is a good way to learn the PERL idioms, I think. Surprisingly, the "PERL in a Nutshell" book was not as useful as these books.

      An extensive PERL module has been developed for writing CGI scripts. It is called CGI.pm, written by Lincoln Stein at Cold Spring Harbor Lab, just down the road from BNL. I also found this to be extremely valuable: it hides all the html details behind a very convenient interface, allowing you to, for instance, generate quite complex tables in a few program lines. The scripts are very much cleaner and have fewer bugs as a result. The CGI.pm web page gives a tutorial and extensive examples. There is a book by Stein called "Official Guide to Programming with CGI.pm", published by Wiley. I found it to be of less use than his web page.
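
      As a small illustration of the style, the following sketch produces a complete page containing a three-column table in a handful of lines; the headers and row contents are invented, but the CGI.pm calls (header, start_html, table, Tr, th, td, end_html) are the real interface:

        #!/usr/bin/env perl
        # Minimal CGI.pm sketch: emit an HTML page with one small table.
        # Column headers and row contents are made up for illustration.
        use strict;
        use warnings;
        use CGI qw(:standard);

        my @rows = ( ['Data Set', 'Run Status', 'QA Status'],
                     ['nightly_MC year_2a', 'Done: 20/20 events', 'o.k.'],
                     ['daq year_1b',        'Done: 50/50 events', '2 warnings'] );

        print header(),
              start_html('autoQA run listing (sketch)'),
              table({ -border => 1 },
                    Tr([ th($rows[0]), map { td($_) } @rows[1 .. $#rows] ])),
              end_html();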


    17. Modification History


    webmaster
    Last modified: Sat Jul 8 17:42:20 EDT 2000