Embedding Setup Off-site

Under:

Introduction

The purpose of this document is to describe step-by-step the setting up of embedding infrastructure on a remote site i.e. not at it's current home which is PDSF. It is based on the experience of setting up embedding at Birmingham's NP cluster (Bham). I will try to maintain a distinction between steps which are necessary in general and those which were specific to porting things to Bham. It should also be a useful guide for those wanting to run embedding at PDSF and needing to copy the relevant files into a suitable directory structure.

Pre-requisites

Before trying to set up embedding on a remote site you should have:
  • a working local installation of the STAR library in which you are interested (or be satisified with your AFS-based library performance).
  • a working mirror of the star database (or be satisfied with your connection to the BNL hosted db).
If these two things are working correctly you will be able to process a daq file with the usual bfc.C macro. Check that you can do this and do not proceed further if this is not the case as you will be wasting your time. You can find the correct bfc.C options to use with a particular daq file and software release combination here.

Collect scripts

The scripts are currently housed at PDSF in the 'starofl' account area. At the time of writing (and the time at which I set up embedding in Bham) they are not archived in CVS. The suggested way to collect them is to copy them into a directory in your own PDSF home account then tar and export it for installation on your local cluster. The top directory for embedding is /u/starofl/embedding . Under this directory there are several subdirectories of interest.
  • Those named after each production, e.g. P06ib which contain mixer macro and perl scripts
  • Common which contains further subdirectories lists and csh and a submission perl script
  • GSTAR which contains the kumac for running the simulation
Therefore you need to create a replica of this directory tree. From your home directory e.g. /u/user do
mkdir embedding
cd embedding
mkdir Common
mkdir Common/lists
mkdir Common/csh
mkdir GSTAR
mkdir P06ib
mkdir P06ib/setup

Now it needs populating with the relevant files. In the following /u/user/embedding as an example of your new embedding directory in your user home directory.

cd /u/user/embedding
cp /u/starofl/embedding/getVerticesFromTags_v4.C .
cp -R /u/starofl/embedding/P06ib/EmbeddingLib_v4_noFTPC/ P06ib/
cp /u/starofl/embedding/P06ib/Embedding_sge_noFTPC.pl P06ib/
cp /u/starofl/embedding/P06ib/bfcMixer_v4_noFTPC.C P06ib/
cp /u/starofl/embedding/P06ib/submit.starofl.pl P06ib/submit.user.pl
cp /u/starofl/embedding/P06ib/setup/Piminus_101_spectra.setup P06ib/setup/
cp /u/starofl/embedding/GSTAR/phasespace_P06ib_revfullfield.kumac GSTAR/
cp /u/starofl/embedding/GSTAR/phasespace_P06ib_fullfield.kumac GSTAR/
cp /u/starofl/embedding/Common/submit_sge.pl Common/


You now have all the files need to run embedding. There are further links to make but as you are going to export them to your own cluster you need to make the links afterwards.

Alternatively you can run embedding on PDSF from your home directory. There are a number of change to make first though because the various perl scripts have some paths relating to the starofl account inside them.

For those planning to export to a remote site you should tar and/or scp the data. I would recommend tar so that you can have the original package preserved in case something goes wrong. E.g.

tar -cvf embedding.tar embedding/
scp embedding.tar remoteuser@mycluster.blah.blah:/home/remoteuser

Obviously this step is unnecessary if you intend to run from your PDSF account although you may still want to create a tar file so that you can undo any changes which are wrong.

Login to your remote cluster and extract the archive. E.g
cd /home/remoteuser
tar -xvf embedding.tar

Script changes

The most obvious thing you will find are a number of places inside the perl scripts where the path or location for other scripts appears in the code. These must be changed accordingly.

P06ib/Embedding_sge_noFTPC.pl
  1. my $code_dir = "/home/starofl/embedding/P06ib";
    changes to e.g.
    my $code_dir = "/home/remoteuser/embedding/P06ib";
     
  2. use lib "/home/starofl/embedding/P06ib/EmbeddingLib_v4_noFTPC";
    changes to e.g.
    use lib "/home/remoteuser/embedding/P06ib/EmbeddingLib_v4_noFTPC";
  3. my $kumac_name = "/home/starofl/embedding/GSTAR/";
    changes to e.g.
    my $kumac_name = "/home/remoteuser/embedding/GSTAR/";
  4. $DaqDir =~ /starprod\/daq\/\d+\/(\w+)\/(\w+)/; my $daqField = $2;
    changes to e.g.
    $DaqDir =~ /star\/(\w+)\/daq\/\d+\/(\w+)\/(\w+)/; my $daqField = $3;
P06ib/EmbeddingLib_v4_noFTPC/Process_object.pm
  1. my $macro_dir = "/u/starofl/embedding";
    changes to e.g.
    my $macro_dir = "/home/lbarnby/embedding";
  2. my $header_string = "#! /usr/local/bin/tcsh\n".
    changes to e.g.
    my $header_string = "#! /bin/tcsh\n".

    This is because the location of tcsh was different and probably will be for you too.
Common/submit_sge.pl
  1.  $daqDir =~ /\/(\w+)\/starprod\/daq\/\d+\/(\w+)\/(\w+)/;
    changes to e.g.
    $daqDir =~ /\/star\/(\w+)\/daq\/\d+\/(\w+)\/(\w+)/;

    Change relates to parsing the name of the directory with daq files in to extract the 'data vault' and 'magnetic field' which form part of job name and are used by Embedding_sge_noFTPC.pl (This may not make much sense right now and needs the detailed docs on each component. It is actually just a way to pass a file list with the same basename as the job). In the original script the path to the data is something like /dante3/starprod/daq/2005/cuProductionMinBias/FullField whereas on Bham cluster it is /star/data1/daq/2005/cuProductionMinBias/FullField and thus the pattern match in perl has to change in order to extract the same information. If you have a choice then choose your directory names with care!
  2. print QSUB "\$\!/bin/sh\n";
    changes to e.g.
    print QSUB "#\!/bin/sh\n";

    Change relates to the line printing the job submission shell script that this perl script writes and submits. The first line had to be changed such that it can correctly be identified as a sh script. I am not sure how original can ever have worked?
  3. print QSUB "-M $opt{u} -m e -N $jobName -o $logDir -j y\n";
    changes to e.g.
    print QSUB "-M $opt{u} -m e -N $jobName -o $logDir -j y -cwd -V -q embedding.q\n";

    This line prints part of the job submission script where the options for the job are specified. In SGE the job options can be in the file and not just on the command line. The extra options for Bham relate to our SGE setup. The -q option provides the name of the queue to use, otherwise it uses the default which I did not want in this case. The other extra options are to make the environment and working diretory correct as they were not the default for us. This is very specific to each cluster. If your cluster does not have SGE then I imagine extensive changes to the part writing the job submission script would be necessary. The scripts use the ability of SGE to have job arrays of similar jobs so you would have to emulate that somehow.

No significant changes required for:
  • getVerticesFromTags_v4.C - none
  • GSTAR/phasespace_P06ib_fullfield.kumac, GSTAR/phasespace_P06ib_fullfield.kumac - actually there are changes but they only relate to redefining particle decay modes for (anti-)Ξ and (anti-)Ω to go 100% to charged modes of interest. This is only relevant for strangeness group
  • P06ib/bfcMixer_v4_noFTPC.C - checked carefully that chain3->SetFlags line actually sets the same flags since Andrew and I had to change the same flags e.g. add GeantOut option after I made orginal copy
  • P06ib/EmbeddingLib_v4_noFTPC/Chain_object.pm - none
  • P06ib/EmbeddingLib_v4_noFTPC/EmbeddingUtilities.pm - there are lines where you may have to add the run numbers of the daq files which you are using so that they are recognised as either full field or reversed full field. In this example (Cu+Cu embedding in P06ib) the lines begin
    my @cuFF
    and
    my @cuRFF
    . This is also something that Andrew and I both changed after I made the original copy.
  • P06ib/submit.user.pl - changes here relate to setup that you want to run and not to the cluster or directory you are using i.e. which setup file to use, what daq directories to use and any pattern match on the file names (usually for testing purposes to avoid filling the cluster with useless jobs) although you probably want to change the
    my $email = 
    line!
  • P06ib/setup/Piminus_101_spectra.setup - any changes here relate to the simulation parameters of the job that you want to do and not to the cluster or directory you are using

Create links

A number of links are required. For example in the /u/starofl/embedding/P06ib there are the following links:
  • daq_dir_2005_cuPMBFF -> /dante3/starprod/daq/2005/cuProductionMinBias/FullField
  • daq_dir_2005_cuPMBRFF -> /dante3/starprod/daq/2005/cuProductionMinBias/ReversedFullField
  • daq_dir_2005_cuPMBHTFF -> /eliza5/starprod/daq/2005/cucuProductionHT/FullField/
  • daq_dir_2005_cuPMBHTRFF -> /eliza5/starprod/daq/2005/cucuProductionHT/ReversedFullField
  • tags_dir_cu_2005 -> /dante3/starprod/tags/P06ib/2005
  • tags_dir_cuHT_2005 -> /eliza5/starprod/embedding/tags/P06ib
  • data -> /eliza12/starprod/embedding/data
  • lists ->../Common/lists
  • csh-> ../Common/csh
  • LOG-> ../Common/LOG
You will therefore need similar links to where you store your daq files (and associated tags files) and where you want the output data to go.

That is it! Some things will probably need to be adapted to your circumstances but it should give you a good idea of what to do

Author: Lee Barnby, University of Birmingham (using starembed account)


Modified: A. Rose, Lawrence Berkeley National Laboratory (using starembed account)