bfcca

This document describes the arguments for bfcca, located in $STAR_SCRIPTS, a production script interface to the CRS reconstruction software and bfc.C. Please refer to this page for the syntax of a job description; this document assumes you are accustomed to that syntax and only describes the features implemented in bfcca itself.

bfcca is a script interface which handles running a chain, running a calibration pass and/or a chain, and moving data to its final location. It includes features such as writing to local disk during the run, log file compression, displaying information about the node the job runs on etc ... and will later be extended with automatic insertion of information in a FileCatalog.

 

Module requirements

This perl wrapper requires a few perl modules, most of which are installed by default unless specified otherwise in the list below. If anything goes wrong with loading a module, please check that you are using the proper perl version and have the modules installed.

use File::Basename;
use File::Copy;
use Sys::Hostname;
use Digest::MD5;

Digest::MD5 : this one is to be checked and is not installed by default ...

General options

The arguments of this script are as follows:

  1. int An option-mask
    The mask has the following meaning:

    bit   Component                   Meaning
    1     IO mode                     0 regular IO (immediate/un-buffered)
                                      1 delayed STD IO (+1)
    2     Tag file copy               0 No copy
                                      1 Copy file to /star/data20/tags (+2)
    3     Code Optimization           0 non-optimized
                                      1 optimized (+4)
    4-5   Compression                 00 uncompressed STDOUT/STDERR
                                      01 compressed STDOUT (+8)
                                      10 compressed STDERR (+16)
    6-7   Copy files to DD (XRootD)   00 Not going to copy any file to DD
                                      01 only picodst file will be copied to DD (+32)
                                      10 only mudst file will be copied to DD (+64)
                                      11 both, picodst and mudst files will be copied to DD (+96)
    8     32-bit vs. 64-bit           0 32-bit
                                      1 64-bit (+128)
     
    Since this variable is a bitmask, you need to sum the options together. For example, if you want to choose delayed IO mode (+1) along with running in optimized mode (+4) and have the output compressed for both STDOUT (+8) and STDERR (+16), your final option is 0+1+4+8+16=29.
     
  2. string a libVersion
    Can be one of dev, new, pro, old or any version understood by the starver command (for example, SL01l or 01l are both valid). There is a second syntax for the library version using some special values:
    1. cal. This pseudo-version makes the script use what is compiled in the test-calibration area /afs/rhic.bnl.gov/star/packages/cal in conjunction with the dev area.
    2. An extension and a more general syntax for this mode is the use of Version-cal. This syntax allows using the cal area in conjunction with ANY library Version. However, the Version in this case MUST be a fully specified library version. For example, SL02b-cal is valid but new-cal is NOT valid.
       
  3. string a destination path
    A path where the produced output streams have to be moved. Note the following side effects:

    outputstreamtype   outputdir                  destination   Effect
    UNIX               . (i.e. local directory)   .             Output streams will be ignored and lost.
    UNIX               .                          PATH          Output streams will be created on local disk and then moved to the final destination $PATH.
    HPSS               HPSSPATH                   . or ./       Choosing a local directory for $PATH means the output streams are not moved anywhere. The files will be saved in HPSS only.
    HPSS               HPSSPATH                   PATH          Output streams will be copied to HPSS in $HPSSPATH. For the disk copy, the base string /home/starreco in $HPSSPATH is replaced by $PATH, so $PATH only needs to specify a disk name (such as /star/data18). The directory tree after /home/starreco is preserved as-is on disk.

    The PATH syntax may be specified as a range. A range specification has one restriction: it has to follow this syntax (which may appear at any place in the specified string): ANY(character)+"+"+#number+"-"+#number+ANY(character). For example, the following paths specify ranges:
    /star/data+14-15 : any disk from /star/data14 to /star/data15 will be picked
    /star/data+23-27/reco : the number may appear anywhere and the final path will be any of /star/dataXX/reco where XX stands for a number between 23 and 27.
    The algorithm randomly picks a disk with occupancy below 98%. If no disk meets this threshold criterion, the disk with the largest available space is chosen. Note that the destination choice is made only at the end of the processing.
     
  4. A number of events description. This input takes two forms:
    (1) int the number-of-events to run. 0 or -1 both mean all events, i.e. bfcca will internally set the number of events to a large value so the chain will cover all events recorded in a daq file.
    (2) int1-int2 (i.e. with a dash in between) will tell bfcca to run bfc.C from event int1 to event int2.
     
  5. string the chain-option(s)
    a list of chain options ...
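
The option-mask arithmetic of argument (1) can be sketched in a few lines. This is only an illustration written in Python (bfcca itself is a perl script); the flag names and the two helper functions are hypothetical, and only the bit values are taken from the table above.

```python
# Illustrative sketch only: bit values come from the option-mask table,
# the names below are made up for readability.
FLAGS = {
    "delayed_io": 1,        # bit 1
    "tag_copy": 2,          # bit 2
    "optimized": 4,         # bit 3
    "compress_stdout": 8,   # bits 4-5, value 01
    "compress_stderr": 16,  # bits 4-5, value 10
    "dd_picodst": 32,       # bits 6-7, value 01
    "dd_mudst": 64,         # bits 6-7, value 10
    "64bit": 128,           # bit 8
}


def compose_mask(*names):
    """Sum (bitwise OR) the selected options into one option-mask."""
    mask = 0
    for name in names:
        mask |= FLAGS[name]
    return mask


def decode_mask(mask):
    """List the option names that are switched on in an option-mask."""
    return [name for name, value in FLAGS.items() if mask & value]


if __name__ == "__main__":
    print(compose_mask("delayed_io", "optimized",
                       "compress_stdout", "compress_stderr"))
    print(decode_mask(25))
```

Decoding a mask this way is a quick check that a value such as 25 really selects the options you intended before submitting a job.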

Later, we needed to extend this wrapper with some special features so the input (5) has been reshaped to accept special characters triggering special treatments. In those cases, all of the other options are shifted. It is better explained in the examples below but for now, let's review the possibilities in the next table:


Special Character   Mode   Expected arguments

/   Alternate script mode
    The next argument after "/" will be a script to run instead of bfc.C; all subsequent arguments will be passed to that script as is. Note: string arguments will be passed stringified. You should NOT add quotes around strings. Short example:
    25,dev,/star/data05,-1,/,StMuDstMaker.C,0,1,1000, st_physics_2270049_raw_0088.event.root
    The disadvantage of this mode is that you have to pass arguments known a priori (which breaks the philosophy of a job description).

+   Alternate script mode with periodic input
    The next argument after the "+" will be an input periodicity or modulo. bfcca will call the script with all of the other arguments except the last, which is replaced by the input given in the job description. In other words, it loops over all inputs and calls the script separately for each one. The periodicity is used to take one input every 'period'.
    For example, if your chain only needs to read and scan over an event.root file BUT requires the presence of a tags.root file, the periodicity you would use is 2.

@   Alternate script with fixed generic output filename
    In this mode, arguments before the "@" are passed as expected and must be specified as usual.
    After "@", the expected arguments are the name of a script to run instead of bfc.C, followed by an argument which will be used as the very last argument of your script. Typically, this could be an output filename.
    An example is given in the example list below.

~   Additional setup scripts may be run
    This mode is intended for passing additional setup commands to bfc before running the jobs. All arguments until the next "~" will be considered as setup options. In addition, arguments will have any "-" replaced by a space. The chain options resume normally afterward. An example is provided below.
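
As a rough illustration of two of the modes in the table above, the sketch below separates "~ ... ~" setup arguments from the remaining options and selects one input per period for the "+" mode. It is a Python sketch, not bfcca's perl code; the function names are hypothetical, the "setup " prefix is inferred from the eval example further down, and picking the first input of each period is an assumption since the document does not say which one is kept.

```python
def split_setup(args):
    """Separate '~ ... ~' setup arguments from the other arguments.

    Each setup argument has its '-' replaced by a space and is turned
    into a 'setup ...' command (prefix assumed from the documented example).
    """
    setup, rest, in_setup = [], [], False
    for arg in args:
        if arg == "~":
            in_setup = not in_setup  # toggle at every '~' delimiter
        elif in_setup:
            setup.append("setup " + arg.replace("-", " "))
        else:
            rest.append(arg)
    return setup, rest


def every_nth(inputs, period):
    """Keep one input every 'period' (assumption: the first of each group)."""
    return inputs[::period]


if __name__ == "__main__":
    print(split_setup(["~", "gcc-4.51", "~", "P2011"]))
    print(every_nth(["a.event.root", "a.tags.root",
                     "b.event.root", "b.tags.root"], 2))
```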

Examples :

  • 1,dev,/star/data13/reco/dev/2001/11,-1,P2001a
    The option-mask is 1, i.e. the first bit is set to 1, so the standard outputs will be used in delayed mode (the output will appear only after the job ends).
    The libVersion is dev which means that we will be running from stardev.
    The destination is a fully specified path. Without looking at the full job description, it is difficult to see what it will do. In this job, the outputdir was chosen to be . which means that the files will first be produced on the reconstruction farm local disk and only afterwards moved to the exact location /star/data13/reco/dev/2001/11
    number-of-events is -1 so all events from inputfile will be processed.
    The chain-option is P2001a.

  • 1,eval,/star/data13/reco/dev/2011/XX,-1,~,gcc-4.51,~,P2011
    This example is identical to the previous one in nature except that the "~" mode is used. Before executing the chain P2011 and switching to the eval area, the command "setup gcc 4.51" will be executed, changing the compiler version and environment to be used. Please be sure to verify which setup commands exist in STAR.

  • 5,SL01k,/star/data18,-1,ry2001,DbV1107,in,tpc_daq,tpc,rich,l3onl,Physics, Kalman,AlignSectors,Cdst,tags,Tree,evout,ExB,OBmap,OClock,OPr13,OTwist
    The option-mask is 5, that is 1+4. 1 is delayed IO as before, +4 is to run the code in optimized mode.
    The libVersion is SL01k so the code will run from the SL01k library version.
    The destination here is a disk path, /star/data18. In this job, the outputdir was specified as an HPSS path (i.e. outputstreamtype was HPSS), so the /home/starreco part of the path specified by outputdir will be substituted. For example, an HPSS path such as /home/starreco/reco/productionCentral/ReversedHalfField/P01gk/2001/308 will lead to the file also being stored on disk in (in our example) /star/data18/reco/productionCentral/ReversedHalfField/P01gk/2001/308
    number-of-events is -1; as in the preceding example, all available events will be processed. There are several chain-options chosen here. All will be passed as-is to bfc.C.
     
  • 5,SL01k,.,-1,p2001
    Compared to the preceding example, there are 2 changes:
    The destination path is chosen as being a local path (.). When this job ends, storage to HPSS will be done but there will be NO copy of the outputstream anywhere else.
    In addition, a reduced chain p2001 is chosen.
  • 25,dev,/star/data05,-1,+,2,StMuDstMaker.C,0,1,1000,-
    The option-mask is set to 25 which is 16+8+1, i.e. both standard output and standard error will be compressed and all IO to those files will be made in delayed mode. Then, the final disk destination is chosen to be /star/data05.
    The mode used is "+", indicating a periodicity mode. Periodicity applies to the inputs you have selected in the job description and is specified in the next argument (2 here). The script to be run follows the periodicity, in this case StMuDstMaker.C. This script will be called with arguments 0,1,1000 and the last argument "-" will be replaced by each input file, modulo 2 in our case. This means that only one input out of every two will be passed to the selected script and replace the last argument "-". This mode may be used if your script requires only one file as input but requires the presence of other files (this is common in STAR as the IOMaker assumes and checks for the presence of other files based on the chain options used and the extensions).

  • 25,dev,/star/data05/,2001-3000,@,bfc.C,GenericOutName.root,dAuMDC,tofsim,beamLine
    The first part of the options is as in the previous example.
    Inspecting the options further, we use the special flag "@" which is documented in the above table as being the alternate script mode with fixed generic output filename. However, the next argument is (still) bfc.C in this example (nothing prevents us from running bfc.C as the alternate script). The next argument is GenericOutName.root. This argument will be passed to bfc.C as the last argument. Otherwise, the chain dAuMDC,tofsim,beamLine is used. In other words, what will be run is
             bfc.C(2001,3000,"dAuMDC,tofsim,beamLine",$INPUTFILE,"GenericOutName.root")
    This could be used, for example, to run over a large input file and split the reconstruction into sequences of events. In our example, we run on events 2001 to 3000 and select the output filename (no automated generation of output filename).
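
The destination handling used throughout these examples (range expansion, disk choice and the /home/starreco substitution) can be sketched as follows. This is a Python illustration under stated assumptions, not the perl implementation: all function names are hypothetical, the `usage` callable returning (occupancy fraction, free bytes) stands in for whatever disk query bfcca really performs, and preserving the digit width of the range bounds is a guess.

```python
import random
import re

# "+" followed by number-number, anywhere in the path
RANGE = re.compile(r"\+(\d+)-(\d+)")


def expand_range(path):
    """Expand e.g. /star/data+23-27/reco into its candidate paths."""
    match = RANGE.search(path)
    if match is None:
        return [path]
    low, high = int(match.group(1)), int(match.group(2))
    width = len(match.group(1))  # assumption: keep zero padding, if any
    head, tail = path[:match.start()], path[match.end():]
    return [f"{head}{n:0{width}d}{tail}" for n in range(low, high + 1)]


def pick_disk(candidates, usage, threshold=0.98, rng=random):
    """Random disk below the occupancy threshold, else the one with most free space.

    usage(path) must return (occupancy fraction, free bytes); hypothetical hook.
    """
    below = [p for p in candidates if usage(p)[0] < threshold]
    if below:
        return rng.choice(below)
    return max(candidates, key=lambda p: usage(p)[1])


def hpss_to_disk(hpss_path, destination, base="/home/starreco"):
    """Replace the leading /home/starreco of an HPSS path by the destination."""
    if hpss_path.startswith(base):
        return destination.rstrip("/") + hpss_path[len(base):]
    return hpss_path  # assumption: paths outside the base are left untouched


if __name__ == "__main__":
    print(expand_range("/star/data+14-15"))
    print(hpss_to_disk("/home/starreco/reco/productionCentral/"
                       "ReversedHalfField/P01gk/2001/308", "/star/data18"))
```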

Notes:

  • Delayed IO mode should always be used. This avoids useless NFS traffic while running. In this mode, both the log and error files appear created in the stdoutdir and stderrdir directories respectively. However, the real content will be visible only after a job ends. Before this happens, the job is insensitive to NFS problems (see above for more information).
  • If the output streams are of the UNIX type, a local directory syntax should be used for the same reason of avoiding NFS traffic during run time.
  • The copy of a file to its final destination only occurs at the end of a job. There are several things one needs to know here: first, the directory tree is automatically created; there is NO NEED to pre-create the directory tree. Second, if an NFS problem occurs before the job ends, bfcca will not be able to see the directory of the file's final destination. bfcca will retry the copy every 2 minutes for a total of 24 hours, after which it will give up and display an Error message.
  • In case of a copy to a 100% full disk, bfcca will currently fail to copy the file and WILL NOT retry. Later, we will modify the script to include a detection of this condition.
  • Compression is done using gzip. Currently, only compression of the standard output is supported. This will be extended to the standard error as well later. Note that if gzip does not exist on the machine where the job is running, bfcca will revert to un-compressed IO mode.
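
The retry behavior described in the notes above (one attempt every 2 minutes, giving up after 24 hours) could look like the following Python sketch. It is not bfcca's code: the function name, the injected `copy` and `sleep` hooks and the wording of the error message are all hypothetical, and the sketch does not distinguish the full-disk case, for which the notes say no retry is done.

```python
import shutil
import sys
import time


def copy_with_retry(src, dst, interval=120, max_wait=24 * 3600,
                    copy=shutil.copy, sleep=time.sleep):
    """Try to copy src to dst, retrying every `interval` seconds.

    Returns True on success, False once `max_wait` seconds of waiting
    have been used up. `copy` and `sleep` are injectable for testing.
    """
    waited = 0
    while True:
        try:
            copy(src, dst)
            return True
        except OSError as err:
            if waited >= max_wait:
                # Errors go to STDERR, in the documented message format
                print(f"bfcca:: Error: giving up copying {src} to {dst} ({err})",
                      file=sys.stderr)
                return False
            sleep(interval)
            waited += interval
```

With the default values this amounts to at most 720 waits of 2 minutes before the final failed attempt.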

Setting up an alternate DB server or loadBalancer

In principle, the presence of $HOME/dbServers.xml would make our software pick this file as the database configuration file. However, on the CRS nodes, the $HOME directory of starreco is not its regular home directory. bfcca nevertheless respects this mechanism by using the STDB_SERVERS environment variable and setting it to the appropriate value.

In addition, note that if bfcca is provided with an input file of type UNIX with a name matching /Loadbalancer/, this file will be ignored to first order except that the environment variable DB_SERVER_LOCAL_CONFIG will be over-written with the file name as value. For more information on this variable, consult Configuration File. When this mechanism is enabled, the use of $HOME/dbServers.xml as a possibility is by-passed.

Example:

inputstreamtype[1]=UNIX 
inputdir[1]=/afs/rhic.bnl.gov/star/packages/conf 
inputfile[1]=dbLoadBalancerLocalConfig_nightly.xml

This example would indicate to bfcca to ignore this input to first order (it will not be passed to the executable) and consider the file dbLoadBalancerLocalConfig_nightly.xml located in /afs/rhic.bnl.gov/star/packages/conf as an alternate configuration file for running.
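
A minimal Python sketch of this input filtering is given below. It is only an illustration: the function name and the (streamtype, directory, filename) tuple layout are hypothetical, storing the full path rather than the bare file name in DB_SERVER_LOCAL_CONFIG is an assumption, and the match is made case-insensitive here because the example file is spelled dbLoadBalancerLocalConfig while the pattern is written /Loadbalancer/.

```python
import posixpath
import re


def apply_loadbalancer_inputs(inputs, env):
    """Drop load-balancer inputs from the list and record them in env.

    inputs: list of (streamtype, directory, filename) tuples.
    env:    dict standing in for the process environment.
    Returns the inputs that would still be passed to the executable.
    """
    kept = []
    for streamtype, directory, filename in inputs:
        is_lb = re.search(r"loadbalancer", filename, re.IGNORECASE)
        if streamtype == "UNIX" and is_lb:
            # Assumption: the variable holds the full path to the file
            env["DB_SERVER_LOCAL_CONFIG"] = posixpath.join(directory, filename)
        else:
            kept.append((streamtype, directory, filename))
    return kept
```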

Calibration pass or calibration directory usage

In addition, bfcca has the ability to run a calibration pass before a run pass or to create a calibration directory on the farm local disk (or via a soft-link to an NFS area). Note that a single calibration pass alone may be accomplished by using the regular options and the appropriate chain-option(s). The mechanism is described as follows:

If one of the inputdir of type UNIX contains the string StarDb, this input is ignored and the inputfile is parsed. Several cases are then possible; in all cases, the file specified by inputfile MUST exist (remember that a staging is done for all input streams and, in the case of UNIX staging, the file presence is required).

  • Any value : the inputfile is ignored and a directory ./StarDb is created as a soft-link to inputdir in the local directory where the job will be running. Therefore, the full chain (the only thing which will be running in this case) will be using the calibration files located in $inputdir/StarDb if any.
  • A value prefixed by Pre (example : PreTpcT0)
    The content of the inputfile is read. A calibration pass will run PRIOR to running the chain. This pre-pass will use the chain EXACTLY as specified in the inputfile. The chain-option(s) specified later by executableargs will be used for the regular chain. A local directory ./StarDb will also be created using a soft-link.
  • A value prefixed by One (example : OneSpaceCharge)
    This is almost the same as the previous case but the ./StarDb directory will be created locally on the node where the job will run (no soft-link involved here). The result of the calibration pass will be lost as the job description requires that the files to be saved are known by name at submission time. This is typically used for a pass requiring a seed.
  • A value prefixed by Opt (example : OptTpcT0)
    As before, the content of inputfile will be used for the calibration options AND a local directory ./StarDb will be created. However, the chain-option(s) will be completely ignored and ONLY a calibration pass will be run.
    This mode can be replaced by a single dedicated pass using the appropriate chain-option(s) and ignoring the output streams.

Example:

inputstreamtype[1]=UNIX
inputdir[1]=/star/data13/reco/StarDb
inputfile[1]=PreTpcT0

In this example, inputfile is prefixed by Pre so a pre-calibration pass is requested. A local directory ./StarDb will be created and remain for the duration of the run (calibration and real-chain execution), and the content of the file /star/data13/reco/StarDb/PreTpcT0 is used as the calibration options. In the above example, the content was simply in TpcT0 RY2001 so the calibration pass would use those options.
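
The prefix rules above can be summarized by a small decision function. The Python sketch below is illustrative only: the function name and the returned dictionary keys are hypothetical, and for the Opt case the soft-link field is left as None because the text does not state whether ./StarDb is a soft-link or a plain local directory.

```python
def calibration_mode(inputdir, inputfile):
    """Classify a UNIX input according to the StarDb / prefix rules.

    Returns None when the input is not a calibration input, otherwise a
    dict telling whether a calibration pass and the regular chain run,
    and whether ./StarDb is a soft-link (None means: not stated).
    """
    if "StarDb" not in inputdir:
        return None
    if inputfile.startswith("Pre"):
        return {"calibration_pass": True, "chain": True, "softlink": True}
    if inputfile.startswith("One"):
        return {"calibration_pass": True, "chain": True, "softlink": False}
    if inputfile.startswith("Opt"):
        return {"calibration_pass": True, "chain": False, "softlink": None}
    # Any other value: only the chain runs, using a soft-linked ./StarDb
    return {"calibration_pass": False, "chain": True, "softlink": True}


if __name__ == "__main__":
    print(calibration_mode("/star/data13/reco/StarDb", "PreTpcT0"))
```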

Note:

In the above example, the CRS node needs to see the file /star/data13/reco/StarDb/PreTpcT0 for the staging to be a success. The job may fail in case of NFS problems. This IS NOT a misfeature but an advantage since it would be hazardous to have the job move along in case of NFS problems (jobs would inconsistently be running a calibration pass or not).

Displayed information and messages

bfcca displays some informative messages about the job before and after a chain is executed. The header of each log file will contain a tabulated summary showing the chain options you have asked for, the requested output stream names (initial and as they should be at the end) as well as information about the node the job is running on, its CPU speed, the directory where the job is running etc ...

After the header, the messages are of the form

bfcca:: Severity: message

where Severity can be one of Info, Warning or Error. Messages related to copying files are for example of Severity=Info, missing files (chain did not produce what was declared in the job description) are of the kind Severity=Warning while any failure (directory creation, file cleaning etc ...) is considered an Error. Note that only Severity=Error messages are displayed in the error file (STDERR) while the others are displayed in the log file (STDOUT).
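
When scanning log files, the message format above is easy to match with a regular expression. The following Python sketch is a hypothetical helper for doing so; the function names are made up, and only the "bfcca:: Severity: message" layout and the STDOUT/STDERR routing are taken from the description above.

```python
import re

# Matches lines of the form "bfcca:: Severity: message"
MESSAGE = re.compile(r"^bfcca:: (Info|Warning|Error): (.*)$")


def parse_message(line):
    """Return (severity, message) for a bfcca message line, else None."""
    match = MESSAGE.match(line.rstrip("\n"))
    if match is None:
        return None
    return match.group(1), match.group(2)


def stream_for(severity):
    """Error messages go to STDERR, Info and Warning to STDOUT."""
    return "STDERR" if severity == "Error" else "STDOUT"


if __name__ == "__main__":
    parsed = parse_message("bfcca:: Warning: missing output file")
    print(parsed, stream_for(parsed[0]))
```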