Embedding Documentation

Overview



The purpose of embedding is to provide physicists with a known; Monte Carlo tracks, where the kinematics and particle type is known exactly, are introduced into the realistic environment seen in the analysis - a real data event. How reconstruction preforms on these "standard candle" tracks provides a baseline which can be used to correct for acceptance and effciency effects in various analyses.

In STAR, embedding is achieved with a set of software tools - Monte Carlo simulation software, as well as the tools for real data event reconstruction - accessed through a suite of scripts. These documents attempt to provide an introduction to the scripts and their use, but the STAR collaborator is referred to the separate web pages maintained for Simulations and Reconstruction for deeper questions of the processes involved in those respective tasks.

Please note, the general procedure for embedding is:
  • The Physics Working Group Convenor submits an embedding request. The request will be assigned a request number and priority by the Analysis Coordinator, see this page for more information.
  • The Embedding Coordinator will coordinate the distribution of embedding requests. Users who wish to run their own embedding are encouraged to do so - but it must be verified with the Embedding Coordinator.
  • The Embedding Coordinator is responsible for the QA of all embedding requests. When the Embedding Coordinator has verified that the test sample produced is acceptable, full production is authorized.


Obsolete documentations

Embedding Production Setup

This page describes how to set up embedding production. This procedure needs to be followed for any set of daq files/production version that requires embedding. Since this typically involves hacking the reconstruction chain, it is not advised that the typical STAR PA attempt this step. Please coordinate with a local embedding coordinator, and the overal Embedding coordinator (Olga).
Note: The documentation here is very terse; it will be enriched as the documentation as a whole is iterated on. Patience is appreciated.

Get daq files from RCF.

Grab a set of daq files from RCF which cover the lifetime of the run, the luminosity range experienced, and the conditions for the production.

Rerun standard production but without corrections.

bfc.C macros are located under ~starofl/bfc. Edit the submit.[Production] script to point to the daq files loaded (as above).

Put tags files on disk.

The results of the previous jobs will be .tags.root files located on HPSS. Retrieve the files, set a pointer for the tags files in the Production-specific directory under ~starofl/embedding.


Now you're ready to start production.

Embedding Setup Off-site

Introduction

The purpose of this document is to describe step-by-step the setting up of embedding infrastructure on a remote site i.e. not at it's current home which is PDSF. It is based on the experience of setting up embedding at Birmingham's NP cluster (Bham). I will try to maintain a distinction between steps which are necessary in general and those which were specific to porting things to Bham. It should also be a useful guide for those wanting to run embedding at PDSF and needing to copy the relevant files into a suitable directory structure.

Pre-requisites

Before trying to set up embedding on a remote site you should have:
  • a working local installation of the STAR library in which you are interested (or be satisified with your AFS-based library performance).
  • a working mirror of the star database (or be satisfied with your connection to the BNL hosted db).
If these two things are working correctly you will be able to process a daq file with the usual bfc.C macro. Check that you can do this and do not proceed further if this is not the case as you will be wasting your time. You can find the correct bfc.C options to use with a particular daq file and software release combination here.

Collect scripts

The scripts are currently housed at PDSF in the 'starofl' account area. At the time of writing (and the time at which I set up embedding in Bham) they are not archived in CVS. The suggested way to collect them is to copy them into a directory in your own PDSF home account then tar and export it for installation on your local cluster. The top directory for embedding is /u/starofl/embedding . Under this directory there are several subdirectories of interest.
  • Those named after each production, e.g. P06ib which contain mixer macro and perl scripts
  • Common which contains further subdirectories lists and csh and a submission perl script
  • GSTAR which contains the kumac for running the simulation
Therefore you need to create a replica of this directory tree. From your home directory e.g. /u/user do
mkdir embedding
cd embedding
mkdir Common
mkdir Common/lists
mkdir Common/csh
mkdir GSTAR
mkdir P06ib
mkdir P06ib/setup

Now it needs populating with the relevant files. In the following /u/user/embedding as an example of your new embedding directory in your user home directory.

cd /u/user/embedding
cp /u/starofl/embedding/getVerticesFromTags_v4.C .
cp -R /u/starofl/embedding/P06ib/EmbeddingLib_v4_noFTPC/ P06ib/
cp /u/starofl/embedding/P06ib/Embedding_sge_noFTPC.pl P06ib/
cp /u/starofl/embedding/P06ib/bfcMixer_v4_noFTPC.C P06ib/
cp /u/starofl/embedding/P06ib/submit.starofl.pl P06ib/submit.user.pl
cp /u/starofl/embedding/P06ib/setup/Piminus_101_spectra.setup P06ib/setup/
cp /u/starofl/embedding/GSTAR/phasespace_P06ib_revfullfield.kumac GSTAR/
cp /u/starofl/embedding/GSTAR/phasespace_P06ib_fullfield.kumac GSTAR/
cp /u/starofl/embedding/Common/submit_sge.pl Common/


You now have all the files need to run embedding. There are further links to make but as you are going to export them to your own cluster you need to make the links afterwards.

Alternatively you can run embedding on PDSF from your home directory. There are a number of change to make first though because the various perl scripts have some paths relating to the starofl account inside them.

For those planning to export to a remote site you should tar and/or scp the data. I would recommend tar so that you can have the original package preserved in case something goes wrong. E.g.

tar -cvf embedding.tar embedding/
scp embedding.tar remoteuser@mycluster.blah.blah:/home/remoteuser

Obviously this step is unnecessary if you intend to run from your PDSF account although you may still want to create a tar file so that you can undo any changes which are wrong.

Login to your remote cluster and extract the archive. E.g
cd /home/remoteuser
tar -xvf embedding.tar

Script changes

The most obvious thing you will find are a number of places inside the perl scripts where the path or location for other scripts appears in the code. These must be changed accordingly.

P06ib/Embedding_sge_noFTPC.pl
  1. changes to e.g.
     
  2. changes to e.g.
  3. changes to e.g.
  4. changes to e.g.
P06ib/EmbeddingLib_v4_noFTPC/Process_object.pm
  1. changes to e.g.
  2. changes to e.g.

    This is because the location of tcsh was different and probably will be for you too.
Common/submit_sge.pl
  1. changes to e.g.

    Change relates to parsing the name of the directory with daq files in to extract the 'data vault' and 'magnetic field' which form part of job name and are used by Embedding_sge_noFTPC.pl (This may not make much sense right now and needs the detailed docs on each component. It is actually just a way to pass a file list with the same basename as the job). In the original script the path to the data is something like /dante3/starprod/daq/2005/cuProductionMinBias/FullField whereas on Bham cluster it is /star/data1/daq/2005/cuProductionMinBias/FullField and thus the pattern match in perl has to change in order to extract the same information. If you have a choice then choose your directory names with care!
  2. changes to e.g.

    Change relates to the line printing the job submission shell script that this perl script writes and submits. The first line had to be changed such that it can correctly be identified as a sh script. I am not sure how original can ever have worked?
  3. changes to e.g.

    This line prints part of the job submission script where the options for the job are specified. In SGE the job options can be in the file and not just on the command line. The extra options for Bham relate to our SGE setup. The -q option provides the name of the queue to use, otherwise it uses the default which I did not want in this case. The other extra options are to make the environment and working diretory correct as they were not the default for us. This is very specific to each cluster. If your cluster does not have SGE then I imagine extensive changes to the part writing the job submission script would be necessary. The scripts use the ability of SGE to have job arrays of similar jobs so you would have to emulate that somehow.

No significant changes required for:
  • getVerticesFromTags_v4.C - none
  • GSTAR/phasespace_P06ib_fullfield.kumac, GSTAR/phasespace_P06ib_fullfield.kumac - actually there are changes but they only relate to redefining particle decay modes for (anti-)Ξ and (anti-)Ω to go 100% to charged modes of interest. This is only relevant for strangeness group
  • P06ib/bfcMixer_v4_noFTPC.C - checked carefully that chain3->SetFlags line actually sets the same flags since Andrew and I had to change the same flags e.g. add GeantOut option after I made orginal copy
  • P06ib/EmbeddingLib_v4_noFTPC/Chain_object.pm - none
  • P06ib/EmbeddingLib_v4_noFTPC/EmbeddingUtilities.pm - there are lines where you may have to add the run numbers of the daq files which you are using so that they are recognised as either full field or reversed full field. In this example (Cu+Cu embedding in P06ib) the lines begin
    and
    . This is also something that Andrew and I both changed after I made the original copy.
  • P06ib/submit.user.pl - changes here relate to setup that you want to run and not to the cluster or directory you are using i.e. which setup file to use, what daq directories to use and any pattern match on the file names (usually for testing purposes to avoid filling the cluster with useless jobs) although you probably want to change the
    line!
  • P06ib/setup/Piminus_101_spectra.setup - any changes here relate to the simulation parameters of the job that you want to do and not to the cluster or directory you are using

Create links

A number of links are required. For example in the /u/starofl/embedding/P06ib there are the following links:
  • daq_dir_2005_cuPMBFF -> /dante3/starprod/daq/2005/cuProductionMinBias/FullField
  • daq_dir_2005_cuPMBRFF -> /dante3/starprod/daq/2005/cuProductionMinBias/ReversedFullField
  • daq_dir_2005_cuPMBHTFF -> /eliza5/starprod/daq/2005/cucuProductionHT/FullField/
  • daq_dir_2005_cuPMBHTRFF -> /eliza5/starprod/daq/2005/cucuProductionHT/ReversedFullField
  • tags_dir_cu_2005 -> /dante3/starprod/tags/P06ib/2005
  • tags_dir_cuHT_2005 -> /eliza5/starprod/embedding/tags/P06ib
  • data -> /eliza12/starprod/embedding/data
  • lists ->../Common/lists
  • csh-> ../Common/csh
  • LOG-> ../Common/LOG
You will therefore need similar links to where you store your daq files (and associated tags files) and where you want the output data to go.

That is it! Some things will probably need to be adapted to your circumstances but it should give you a good idea of what to do

Author: Lee Barnby, University of Birmingham (using starembed account)


Modified: A. Rose, Lawrence Berkeley National Laboratory (using starembed account)


Modified Birmingham Files

Upload of modified embedding infrastructure files used on Birmingham NP cluster for Cu+Cu for (anti-)Λ and K0S embedding request.

Production Management

1) Usually embedding jobs are run in "HPSS" mode so the files end up in HPSS (via FTP). To transfer them from HPSS to disk copy the perl script ~starofl/hjort/getEmbed.pl and modify as needed. This script does at least two things that are not possible with, e.g., a command line hsi command: it only gets the files needed (usually the .geant and .event files) and it changes the permissions after the transfers. Note that if you do the transfers shortly after running the jobs the files will probably still be on the HPSS disk cache and transfers will be much fast than getting the files from tapes.

2) To clean up old embedding files make your own copy of ~starofl/hjort/embedAge.pl and use as needed. Note that $accThresh determines the maximum access time in days of files that will not be deleted.

Running Embedding

This page describes how to run embedding jobs once the daq files and tags files are in place (see other page about embedding production setup).

Basics:

Embedding code is located in production specific directories: ~starofl/embedding/P0xxx. The basic job submission template is typically called submit.starofl.pl in that directory.
Jobs are usually run by user starofl but personal accounts with group starprod membership will work, too (but test first as the group starprod write permissions typically are not in place by default).
The script to submit a set of jobs is submit.[user].pl. The script should be modified to submit an embedding set from the the configuration file
~starofl/embedding/[Production]/setup/[Particle]_[set]_[ID].setup
where
[Particle] is the particle type submitted (Piminus for GEANTID=9, as set inside file)
[set] is the file set submitted (more on this later)
[ID] is the embedding request number


Test procedure:

The best way to test a particular job configuration is to run a single job in "DISK" mode (by selecting a specific daq file in your submission). In this mode all of the intermediate files, scripts, logs, etc., are saved on disk. The location will be under the "data" link in the working directory. You can then go and figure out which script failed, hack as necessary and and try to make things work...

Details: