Grid Infrastructure

This page collects general information about our grid infrastructure: news, upgrade stories, patches to the software stack, network configuration and studies, etc. Some documents containing local information are, however, protected.


External links

CERTS & VOMS/VOMRS

CERTS

If you do NOT have a grid certificate yet or need to renew your certificate, you need to either request a certificate or request a renewal. Instructions are available as:

A few notes
  • Your Sponsor and point of contact should be "Jerome Lauret", the STAR/VO representative (and not your supervisor's name or anyone else's)
  • Note that, as with a request for a CERT, being added to the STAR VO requires approval from the STAR/RA and the STAR/VO representative (the RAs are aware of this; the best chance for your request to be promptly approved is to name the proper "Sponsor")
  • It does not hurt to specify that you belong to STAR when the ticket is created
  • Please indicate on the request for a CERTificate what your expected use of Grid services is (data transfer? running jobs? anything else?)
  • Requesting a CERT and using it binds you to the OSG Policy Agreement you have to accept during the request. Failure to comply or violations will lead to a revocation of your CERT's validity (in STAR, you have to expect that your VO representative will make sure the policy IS respected in full)
     
  • The big advantage of renewing a CERT rather than requesting a new one is that the CN will be preserved (so no gridmap change is needed)
  • The STAR/VO does NOT accept CERTificates other than STAR-related ones, that is, OSG DigiCert-Grid CERTs obtained for STAR-related work and purposes. A user owning a CERT from a different VO will not be granted membership in VOMS; request a new CERT uniquely associated with STAR-related work.
  • STAR rule of thumb / convention - Additional user certificates mapped to generic accounts: the CN should indicate the CERT owner's name, with the generic account in parentheses. An example: /CN=Lidia Didenko (starreco)
  • STAR rule of thumb / convention - Service certificates: the CN field shows the requestor of the certificate

VOMS and VOMRS

Having a CERT is the first step. You now need to be part of a Virtual Organization (VO).

STAR used VOMRS during the PPDG era and switched to VOMS in the OSG era to maintain its VO users' certificates.
Only VOMS is currently maintained. A VO is used as a centralized repository of user-based information so that all sites on the grid can be updated when identities are added (or removed). The VOMS service and Web interface are maintained by the RACF.
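Once your CERT is approved and you appear in the STAR VO, the usual way to exercise the membership is to create a VOMS proxy. The sketch below only assembles and prints the typical command (a hedged illustration: actually running it requires the VOMS client tools and your CERT installed, e.g. under ~/.globus):

```shell
# Illustrative only: build the voms-proxy-init invocation one would run
# to obtain a proxy carrying the STAR VO attributes.
vo="star"          # VO name as registered in VOMS
valid="24:00"      # requested proxy lifetime (hh:mm)
cmd="voms-proxy-init -voms ${vo} -valid ${valid}"
echo "$cmd"
```

voms-proxy-info can then be used to inspect the attributes the VOMS server granted.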
 

Using your Grid CERT to sign or encrypt Emails

Apart from allowing you to access the Grid, an SSL Client Certificate is imported into the Web browser from which you requested your Grid certificate. This certificate can be used to digitally sign or encrypt Email. For the latter, you will need the certificate of the corresponding partner in order to encrypt Emails to him/her. To make use of it, follow the guidance below.

    • Find the 'export' or 'backup' option in your browser's certificate management interface. This interface varies from browser to browser and from Email client to Email client. We have checked only Thunderbird as an Email client and inventoried possible locations for browser-based tools.
      • Internet Explorer: "Tools -> Internet Options -> Content"
      • Netscape Communicator: a "Security" button on the top bar menu
      • Mozilla: "Edit -> Preferences -> Privacy and Security -> Certificates"
      • Thunderbird: "Tools -> Options -> Privacy -> Security -> View Certificate"
    • The exported file usually ends up with the extension .p12 or .pfx.
      ATTENTION: Although the backup/export process will ask you for a "backup password" (and further encrypt your CERT), please guard this file carefully. Store it OFF your computer or remove the file once you are done with this process.
  • After exporting your certificate from your Web browser, you will need to re-import it into your Mail client. Let's assume it is Thunderbird for simplicity.
  • FIRST:
    Verify that you have the DOEGrids Certificate Authority already imported in your Mail client, and/or import it.
    Note that the DOEGrids Certificate Authority is a subordinate CA of the ESnet CA; therefore the ESnet CA root certificate should also be present. To check this:
    • Go to "Tools -> Options -> Privacy -> Security -> View Certificate"
    • Click on the "Authorities" tab
      • You should see both "DOEGrids CA 1" and "ESnet Root CA 1" under an "ESnet" tree as illustrated in this first picture
        Thunderbird CERT Manager

      • Be certain the "DOEGrids CA 1" is allowed to identify mail users. To do this, select the cert and click Edit. A window as illustrated in the next picture should appear. Both "This certificate can identify Web sites" and "This certificate can identify mail users" should be checked.
        Thunderbird CERT Manager, Edit CA
    • If you DO NOT SEE those certificate authorities, you will need to import them.
      • Do so by downloading the doe-certs.zip attached at the bottom of this document and unzipping it. Two files should be there.
      • Load them using the "Tools -> Options -> Privacy -> Security -> View Certificate -> Authorities -> Import" button.
      • A window similar to the one displayed above will appear, and you will need to check at least the "This certificate can identify mail users" box.
  • Now, import your certificate.
    • Use the "Tools -> Options -> Privacy -> Security -> View Certificate -> Your Certificates" menu and click "Import"
    • A file browser will appear; select the file you exported from your browser. It will ask you for a password. You will need to use the same password you used during the export phase from your Web browser.
    • Click OK
    • You are set to go ...
Note: if this is the very first time you use the Thunderbird security device manager, an additional password dialog will appear asking for a "New Password" for the security device. This is NOT your backup password. You will need to remember this password, as Thunderbird will ask you for it each time you start Thunderbird and use a password or CERT for the first time during a session.
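Before (or after) importing the backup into Thunderbird, you can sanity-check the exported file with OpenSSL. The sketch below is hedged: the file name mycert.p12 and the password are made-up placeholders, and a throwaway self-signed certificate is generated first so the commands are self-contained.

```shell
# Create a throwaway key + self-signed cert (stands in for your real CERT)
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem \
        -days 1 -nodes -subj "/CN=Test User (starreco)" 2>/dev/null
# Bundle them into a PKCS#12 file, as a browser export would produce
openssl pkcs12 -export -in cert.pem -inkey key.pem \
        -out mycert.p12 -passout pass:backup-password
# Inspect the bundle without importing it anywhere; the same command works
# on a real browser-exported .p12/.pfx file (you will be asked the password)
openssl pkcs12 -info -in mycert.p12 -noout -passin pass:backup-password
```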

Usage note:
  • If you want a remote partner to send you encrypted messages, you MUST first send a digitally signed Email so your certificate's public part can be imported into his/her Email client Certificate Manager under "Other People's". When this is done for the first time, Thunderbird will ask you to set a certificate as the default certificate; the interface and selection are straightforward, so we will not detail the steps ...
  • If you want to send an encrypted message to a remote partner, you MUST have his public part imported into your Email client and then select the "Encrypt This Message" option in the Security drop down menu of Thunderbird.
  • Whenever a certificate expires, DO NOT remove it from your Certificate Manager. If you do, you will no longer be able to read / decrypt old encrypted Emails.



OSG Issues

This page will anchor various OSG-related collaborative efforts.

SGE Job Manager patch

On this page we should converge on a draft that we want to send to the VDT team about the SGE Job Manager.
  • Missing environment variables definition
    • In the BEGIN section, check that $SGE_ROOT, $SGE_CELL and the commands ($qsub, $qstat, etc.) are defined properly
    • In the SUBMIT, POLL and CLEAR sections, locate the line
      $ENV{"SGE_ROOT"} = $SGE_ROOT;
      
      and add the line
      $ENV{"SGE_CELL"} = $SGE_CELL;
      
  • Bug in finding the correct job ID when clearing jobs
    • in the CLEAR section, locate the line
      system("$qdel $job_id > /dev/null 2 > /dev/null");
      and replace it with the following block
      $ENV{"SGE_ROOT"} = $SGE_ROOT;
      $ENV{"SGE_CELL"} = $SGE_CELL;
      $job_id =~ /(.*)\|(.*)\|(.*)/;
      $job_id = $1;
      system("$qdel $job_id > /dev/null 2 > /dev/null");
  • SGE Job Manager modifies the definitions of both the standard output and standard error file names by appending .real. This procedure fails when a user specifies /dev/null for either of those files. The problem happens twice, first starting at line 318:
        #####
        # Where to write output and error?
        #
        if(($description->jobtype() eq "single") && ($description->count() > 1))
        {
          #####
          # It's a single job and we use job arrays
          #
          $sge_job_script->print("#\$ -o "
                                 . $description->stdout() . ".\$TASK_ID\n");
          $sge_job_script->print("#\$ -e "
                                 . $description->stderr() . ".\$TASK_ID\n");
        }
        else
        {
            # [dwm] Don't use real output paths; copy the output there later.
            #       Globus doesn't seem to handle streaming of the output
            #       properly and can result in the output being lost.
            # FIXME: We would prefer continuous streaming.  Try to determine
            #       precisely what's failing so that we can fix the problem.
            #       See Globus bug #1288.
          $sge_job_script->print("#\$ -o " . $description->stdout() . ".real\n");
          $sge_job_script->print("#\$ -e " . $description->stderr() . ".real\n");
        }
     
    
    and then again at line 659:
          if(($description->jobtype() eq "single") && ($description->count() > 1))
          #####
          # Jobtype is single and count>1. Therefore, we used job arrays. We
          # need to merge individual output/error files into one.
          #
          {
            # [dwm] Use append, not overwrite to work around file streaming issues.
            system ("$cat $job_out.* >> $job_out");
            system ("$cat $job_err.* >> $job_err");
          }
          else
          {
            # [dwm] We still need to append the job output to the GASS cache file.
            #       We can't let SGE do this directly because it appears to
            #       *overwrite* the file, not append to it -- which the Globus
            #       file streaming components don't seem to handle properly.
            #       So append the output manually now.
            system("$cat $job_out.real >> $job_out");
          }
    
  • The snippet of code above is also missing a statement for the standard error. At the end, instead of:
            #       So append the output manually now.
            system("$cat $job_out.real >> $job_out");
          }
    
    it should read:
            #       So append the output manually now.
            system("$cat $job_out.real >> $job_out");
            system("$cat $job_err.real >> $job_err");
          }
    
  • Additionally, if deployed in a CHOS environment, the job manager should be modified with the following additions at line 567:
    $ENV{"SGE_ROOT"} = $SGE_ROOT;
        if ( -r "$ENV{HOME}/.chos" ){
          $chos=`cat $ENV{HOME}/.chos`;
          $chos=~s/\n.*//;
          $ENV{CHOS}=$chos;
        }
    

gridftp update for VDT 1.3.9 or VDT 1.3.10

To install the updated gridftp server that includes a fix for secondary group membership:

For VDT 1.3.9 (which is what I got with OSG 0.4.0), in the OSG/VDT directory, do:

pacman -get http://vdt.cs.wisc.edu/vdt_139_cache:Globus-Updates

This nominally makes your VDT installation 1.3.9c, though it didn't update my vdt-version.info file accordingly; it still says 1.3.9b.

For VDT 1.3.10, a similar installation should work:

pacman -get http://vdt.cs.wisc.edu/vdt_1310_cache:Globus-Updates

STAR VO Privilege Configuration

This page gives the GUMS and VOMS configuration information for OSG sites to allow access for the STAR VO.

VOMS entry for edg-mkgridmap.conf
group vomss://vo.racf.bnl.gov:8443/edg-voms-admin/star osg-star

Example GUMS config:

<!-- 9 STAR -->
<groupMapping name='star' accountingVo='star' accountingDesc='STAR'>
<userGroup className='gov.bnl.gums.VOMSGroup'
url='https://vo.racf.bnl.gov:8443/edg-voms-admin/star/services/VOMSAdmin'
persistenceFactory='mysql'
name='osg-star'
voGroup="/star"
sslCertfile='/etc/grid-security/hostcert.pem'
sslKey='/etc/grid-security/hostkey.pem' ignoreFQAN="true"/>
<accountMapping className='gov.bnl.gums.GroupAccountMapper'
groupName='osg-star' /> </groupMapping>

Note that in the examples above "osg-star" refers to the local UID/GID names and can be replaced with whatever meets your local site policies.
Also the paths shown for sslKey and sslCertfile should be replaced with the correct values on your system.
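For reference, a site running edg-mkgridmap with the entry above would end up with grid-mapfile lines of the following general shape (the DN below is invented purely for illustration):

```
"/DC=org/DC=doegrids/OU=People/CN=Jane Doe 123456" osg-star
```

That is, each STAR VO member's DN is mapped to the local osg-star account (or to whatever account name your site policy chooses).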

Site information

This page will provide information specific to the STAR Grid sites.

BNL

Gatekeeper (GK) Infrastructure

This page was last updated on May 17, 2016.

The nodes for STAR's grid-related activities at BNL are as follows:

Color coding

  • Black: in production (please, do NOT modify without prior warning)
  • Green: machine was setup for testing particular component or setup
  • Red: status unknown
  • Blue: available for upgrade upon approval
stargrid01
  • Usage: FROM BNL, submit grid jobs from this node
  • Hardware make and model: Dell PowerEdge 2950, dual quad-core Xeon E5440 (2.83 GHz / 1.333 GHz FSB), 16 GB RAM
  • OS version, default gcc version: RHEL Client 5.11, gcc 4.3.2
  • Hardware arrangement: 6 x 1 TB SATA2. 1 GB /boot (/dev/md0) is RAID 1 across all six drives. There are 3 RAID 1 arrays using pairs of disks (e.g. /dev/sda2 and /dev/sdb2 are one array); the various local mount points and swap space are logical volumes scattered across these RAIDed pairs. There are 2.68 TB of unassigned space in the current configuration.
  • NIC: 2 x 1 Gb/s (one in use for RACF IPMI/remote administration on a private network)
  • OSG base: OSG 3.2.25 Client software stack for job submission
  • Condor: 8.2.8-1.4 (part of the OSG install; only for grid submission, not part of RACF Condor)

stargrid02
  • Usage: file transfer (gridftp) server; former STAR-BNL site gatekeeper
  • Notes: Attention: on stargrid02, the mappings *formerly* were all grid mappings (i.e. to VO group accounts: osgstar, engage, ligo, etc.). On May 17, 2016, this was changed to map STAR VO users to individual user accounts (matching the behaviour of stargrid03 and stargrid04). This behavior may be changed back. (TBD)
  • Hardware make and model: Dell PowerEdge 2950, dual quad-core Xeon E5440 (2.83 GHz / 1.333 GHz FSB), 16 GB RAM
  • OS version, default gcc version: RHEL Client 5.11, gcc 4.3.2
  • Hardware arrangement: 6 x 1 TB SATA2, configured the same as stargrid01 above
  • NIC: 2 x 1 Gb/s (one in use for RACF IPMI/remote administration on a private network)
  • OSG base: OSG CE 3.1.23
  • Condor: 7.6.10 (RCF RPM), NON-FUNCTIONAL (non-working configuration)

stargrid03
  • Usage: file transfer (gridftp) server
  • Notes: to transfer using STAR individual user mappings, please use this node or stargrid04
  • Hardware make and model: Dell PowerEdge 2950, dual quad-core Xeon E5440 (2.83 GHz / 1.333 GHz FSB), 16 GB RAM
  • OS version, default gcc version: RHEL Client 5.11, gcc 4.3.2
  • Hardware arrangement: 6 x 1 TB SATA2, configured the same as stargrid01 above
  • NIC: 2 x 1 Gb/s (one in use for RACF IPMI/remote administration on a private network)
  • OSG base: OSG CE 3.1.18
  • Condor: 7.6.10 (RCF RPM), NON-FUNCTIONAL (non-working configuration)

stargrid04
  • Usage: file transfer (gridftp) server
  • Notes: to transfer using STAR individual user mappings, please use this node or stargrid03
  • Hardware make and model: Dell PowerEdge 2950, dual quad-core Xeon E5440 (2.83 GHz / 1.333 GHz FSB), 16 GB RAM
  • OS version, default gcc version: RHEL Client 5.11, gcc 4.3.2
  • Hardware arrangement: 6 x 1 TB SATA2, configured the same as stargrid01 above
  • NIC: 2 x 1 Gb/s (one in use for RACF IPMI/remote administration on a private network)
  • OSG base: OSG CE 3.1.23
  • Condor: 7.6.10 (RCF RPM), NON-FUNCTIONAL (non-working configuration)

 

 

stargrid0[234] are using the VDT-supplied GUMS client (version 1.2.16).
stargrid02 has a local hack in condor.pm to adjust the Condor parameters for STAR users with local accounts.


All nodes have GLOBUS_TCP_PORT_RANGE=20000,30000 and matching firewall conduits for Globus and other dynamic grid service ports.
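As a sketch, the setting amounts to the following (the iptables line is a hypothetical example of what a matching conduit looks like; the actual RACF firewall management differs):

```
export GLOBUS_TCP_PORT_RANGE=20000,30000
# hypothetical matching firewall conduit for the dynamic port range
iptables -A INPUT -p tcp --dport 20000:30000 -j ACCEPT
```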

 

 

LBL

MIT

CMS Analysis Facility

MIT’s CMS Analysis Facility is a large Tier-2 computing center built for CMS user analyses. We’re looking into the viability of using it for STAR computing.

Initial Setup

First things first. I went to http://www2.lns.mit.edu/compserv/cms-acctappl.html and applied for a local account. The welcome message contained a link to the CMSAF User Guide found on this TWiki page.

AFS isn’t available on CMSAF, so I started a local tree at /osg/app/star/afs_rhic and began to copy over stuff. Here’s a list of what I copied so far (nodes are running SL 4.4):

CERNLIB
/afs/rhic.bnl.gov/asis/sl4/slc4_ia32_gcc345/cern

OPTSTAR
/afs/rhic.bnl.gov/i386_sl4/opt/star/sl44_gcc346

GROUP_DIR
/afs/rhic.bnl.gov/star/group

ROOT 5.12.00
/afs/rhic.bnl.gov/star/ROOT/5.12.00/root
/afs/rhic.bnl.gov/star/ROOT/5.12.00/.sl44_gcc346

SL07e (sl44_gcc346 only)
/afs/rhic.bnl.gov/star/packages/SL07e

I copied these precompiled libraries over instead of building them myself because of a tricky problem with the interactive nodes’ configuration. The main gateway node is a 64-bit machine, so regular attempts at compilation produce 64-bit libraries that we can’t use. CMSAF has a node reserved for 32-bit builds, but it’s running SL 3.0.5. We’re still working on a proper resolution of that problem. Perhaps we can force cons to do 32-bit compilations.

The environment scripts are working, although I had to add more hacks than I thought were necessary. I only changed the following files:

  1. ~/.login
  2. ~/.cshrc
  3. $GROUP_DIR/site_post_setup.csh

It doesn’t seem possible to change the default login shell (chsh and ypchsh both fail), so when you login you need to type “tcsh” to get a working STAR environment (after copying my .login and .cshrc to your home directory, of course).
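Since chsh fails, one hypothetical workaround (untested here, assuming the default login shell is bash and tcsh lives in /bin/tcsh) is to exec tcsh from ~/.profile so interactive logins land in the STAR environment automatically:

```
# hypothetical ~/.profile fragment: hand interactive logins over to tcsh
if [ -t 0 ] && [ -x /bin/tcsh ] && [ "$SHELL" != "/bin/tcsh" ]; then
    SHELL=/bin/tcsh exec /bin/tcsh -l
fi
```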

Basic interactive tests look good, and I’ve got a SUMS configuration that will do local job submissions to the Condor system (that’s a topic for another post). DB calls use the MIT database mirror. I think that’s all for now.

STAR Scheduler Configuration

I deployed a private build of SUMS (roughly 1.8.10) on CMSAF and made the following changes to globalConfig.xml to get local job submission working:

In the Queue List

In the Policy List

Now for the Dispatcher

And finally, here's the site configuration block

Database Mirror

MIT has a local slave connected to the STAR master database server.  A dbServers.xml with the following content will allow you to connect to it:


<StDbServer>
<server> star1 </server>
<host> star1.lns.mit.edu </host>
<port> 3316 </port>
<socket> /tmp/mysql.3316.sock </socket>
</StDbServer>
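How the STAR DB API finds this file depends on your setup; the conventional mechanism is an environment variable, typically set in your login scripts. The variable name below is an assumption, so verify it against the STAR database documentation:

```
# assumed env-var name; check the STAR database documentation
setenv DB_SERVER_LOCAL_CONFIG $HOME/dbServers.xml
```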

For more information on selecting database mirrors please visit this page. You can also view a heartbeat of all the STAR database slaves here. Finally, if you're interested in setting up your own database slave, Michael DePhillips has put some preliminary instructions on the Drupal page. Contact Michael for more info.

Guidelines For MIT Tier2 Job Requests

In order to facilitate the submission of jobs, all requests for the Tier2 must contain the following information. Note that, because we cannot maintain stardev on the Tier2, all jobs must be run from a tagged release. It is the user's responsibility to ensure that the requested job runs from a tagged release, with any necessary updates from CVS made explicit.

 

1.  Tagged release of the STAR environment from which the job will be run, e.g. SL08a.

2.  Link to all custom macros and/or  kumacs.

3.  Link to pams/ and StRoot/ directories containing any custom code, including all necessary CVS updates of the tagged release.

5.  List of commands to be executed, i.e. the contents of the <job></job> in your submission XML.

 

One is also free to include a custom log4j.xml, but this is not necessary.

MIT Simulation Productions

 

All of mit0000 through mit0006 use STAR library SL08a, species pp200, QCD 2->2 subprocesses, the pythia6410PionFilter PYTHIA library and geometry y2006c, with the BFC chain:

"trs fss y2006c Idst IAna l0 tpcI fcf ftpc Tree logger ITTF Sti VFPPV bbcSim tofsim tags emcY2 EEfs evout -dstout IdTruth geantout big fzin MiniMcMk clearmem eemcDb beamLine sdt20050727"

They differ only in the CKIN cuts:
  • mit0000: CKIN(3) = 4, CKIN(4) = 5
  • mit0001: CKIN(3) = 5, CKIN(4) = 7
  • mit0002: CKIN(3) = 7, CKIN(4) = 9
  • mit0003: CKIN(3) = 9, CKIN(4) = 11
  • mit0004: CKIN(3) = 11, CKIN(4) = 15
  • mit0005: CKIN(3) = 15, CKIN(4) = 25
  • mit0006: CKIN(3) = 25, CKIN(4) = 35

 

The kumacs, based on ppQCDprod.kumac and ppWprod.kumac provided by Jim Sowinski, were slightly modified to incorporate the local PYTHIA libraries.

All of mit0007 through mit0016 use STAR library SL08a, species pp500, geometry upgr13, a custom BFC, vertex(0.1,-0.2,-60) and beamLine matched, with the BFC chain:

"trs -ssd upgr13 Idst IAna l0 tpcI fcf -ftpc Tree logger ITTF Sti StiRnd -IstIT -SvtIt -NoSvtIt SvtCL,svtDb -SsdIt MakeEvent McEvent geant evout geantout IdTruth bbcSim emcY2 EEfs bigbig -dstout fzin -MiniMcMk McEvOut clearmem -ctbMatchVtx VFPPV eemcDb beamLine"

They differ in subprocess, PYTHIA library and CKIN cuts:
  • mit0007: W, pythia6_410, CKIN(3)=10
  • mit0008: QCD 2->2, pythia6_410, CKIN(3)=20, CKIN(4)=30
  • mit0009: W, pythia6410FGTFilter, CKIN(3)=10
  • mit00010: QCD 2->2, pythia6410FGTFilter, CKIN(3)=20, CKIN(4)=30
  • mit0011: QCD 2->2, pythia6410FGTFilterV2, CKIN(3)=5, CKIN(4)=10
  • mit0012: QCD 2->2, pythia6410FGTFilter, CKIN(3)=10
  • mit0013: QCD 2->2, pythia6410FGTFilterV2, CKIN(3)=15, CKIN(4)=20
  • mit0014: QCD 2->2, pythia6410FGTFilterV2, CKIN(3)=20, CKIN(4)=30
  • mit0015: QCD 2->2, pythia6410FGTFilterV2, CKIN(3)=30, CKIN(4)=50
  • mit0016: QCD 2->2, pythia6410FGTFilterV2, CKIN(3)=50

 

 

The seed for each file is given by 10000 * (Production Number) + (File Number). *The version of SL08c used is not the final version at RCF due to an unexpected update.
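The seed convention above is simple arithmetic; as a quick sketch (the production and file numbers below are arbitrary examples):

```shell
production=19   # e.g. mit0019
filenum=42      # file number within the production
seed=$((10000 * production + filenum))
echo "$seed"    # prints 190042
```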

All of mit0019 through mit0033 use STAR library SL08c, species pp200, the p6410BemcGammaFilter PYTHIA library, geometry y2006g and StGammaFilterMaker, with the BFC chain:

"trs fss y2006g Idst IAna l0 tpcI fcf ftpc Tree logger ITTF Sti VFPPV bbcSim tofsim tags emcY2 EEfs evout -dstout IdTruth geantout big fzin MiniMcMk clearmem eemcDb beamLine sdt20050727"

They differ in subprocess and CKIN cuts:
  • mit0019: Prompt Photon, CKIN(3)=2, CKIN(4)=3
  • mit0020: Prompt Photon, CKIN(3)=3, CKIN(4)=4
  • mit0021: Prompt Photon, CKIN(3)=4, CKIN(4)=6
  • mit0022: Prompt Photon, CKIN(3)=6, CKIN(4)=9
  • mit0023: Prompt Photon, CKIN(3)=9, CKIN(4)=15
  • mit0024: Prompt Photon, CKIN(3)=15, CKIN(4)=25
  • mit0025: Prompt Photon, CKIN(3)=25, CKIN(4)=35
  • mit0026: QCD, CKIN(3)=2, CKIN(4)=3
  • mit0027: QCD, CKIN(3)=3, CKIN(4)=4
  • mit0028: QCD, CKIN(3)=4, CKIN(4)=6
  • mit0029: QCD, CKIN(3)=6, CKIN(4)=9
  • mit0030: QCD, CKIN(3)=9, CKIN(4)=15
  • mit0031: QCD, CKIN(3)=15, CKIN(4)=25
  • mit0032: QCD, CKIN(3)=25, CKIN(4)=35
  • mit0033: QCD, CKIN(3)=35, CKIN(4)=65

 

 

The seed for each file is given by 10000 * (Production Number) + (File Number). *The version of SL08c used is not the final version at RCF due to an unexpected update.

All of mit0034 through mit0045 use STAR library SL08c, species pp200, the p6410EemcGammaFilter PYTHIA library, geometry y2006g and StGammaFilterMaker, with the BFC chain:

"trs fss y2006g Idst IAna l0 tpcI fcf ftpc Tree logger ITTF Sti VFPPV bbcSim tofsim tags emcY2 EEfs evout -dstout IdTruth geantout big fzin MiniMcMk clearmem eemcDb beamLine sdt20050727"

They differ in subprocess and CKIN cuts:
  • mit0034: Prompt Photon, CKIN(3)=2, CKIN(4)=3
  • mit0035: Prompt Photon, CKIN(3)=3, CKIN(4)=4
  • mit0036: Prompt Photon, CKIN(3)=4, CKIN(4)=6
  • mit0037: Prompt Photon, CKIN(3)=6, CKIN(4)=9
  • mit0038: Prompt Photon, CKIN(3)=9, CKIN(4)=15
  • mit0039: Prompt Photon, CKIN(3)=15, CKIN(4)=25
  • mit0040: QCD, CKIN(3)=2, CKIN(4)=3
  • mit0041: QCD, CKIN(3)=3, CKIN(4)=4
  • mit0042: QCD, CKIN(3)=4, CKIN(4)=6
  • mit0043: QCD, CKIN(3)=6, CKIN(4)=9
  • mit0044: QCD, CKIN(3)=9, CKIN(4)=15
  • mit0045: QCD, CKIN(3)=15, CKIN(4)=25

 

STAR environment on OS X

This page is obsolete -- please see Mac port of STAR offline software for the current status

In order of decreasing importance:

  1. pams - still can't get too far here.  No idea how the whole gcc -> agetof -> g77 chain works to compile "Mortran".  I know VMC is the future and all that, but I think we really do need pams in order to have a useful STAR cluster.
  2. dynamic library paths - specifying a relative object pathname to g++ -o means that the created dylib always looks for itself in the current directory on OS X.  In other words, the repository is useless.  Need to figure out how to tell cons to send the absolute path when linking.  Executables work just fine; it's only the .dylibs that have this problem.
  3. starsim - crashing on startup (!!!!! ZFATAL called from MZIOCH) Hopefully this is related to pams problems, although I do remember having some trouble linking.
  4. root4star - StarRoot,StUtilities,StarClassLibrary,St_base do not load automatically as I thought they were supposed to.  How do we do this at BNL?
  5. QtRoot - has its own build system that didn't work out of the box for me.  Disabled StEventDisplayMaker and St_geom_Maker until I figure this out.

Contents of $OPTSTAR

I went through the list of required packages in /afs/rhic.bnl.gov/star/common/AAAREADME and figured out which ones were installed by default in an Intel OS X 10.4.8 client.  Here's what I found:

  • perl 5.8.6:  /usr/bin/perl (slightly newer than requested 5.8.4)
  • make 3.8.0:  /usr/bin/make -> gnumake
  • tar (??):  /usr/bin/tar
  • flex 2.5.4:  /usr/bin/flex
  • libXpm 4.11:  /usr/X11R6/lib/libXpm.dylib
  • libpng:  not found
  • mysql:  not found
  • gcc 4.0.1: /usr/bin/gcc -> gcc-4.0 (yeah, I know.  Apple does not support gcc 3.x in 10.4 for Intel!  We can do gcc_select to go back to 3.3 on ppc though.)
  • dejagnu:  not found
  • gdb 6.3.50:  /usr/bin/gdb (instead of 5.2)
  • texinfo:  not found
  • emacs 21.2.1:  /usr/bin/emacs (instead of 20.7)
  • findutils:  not found
  • fileutils:  not found
  • cvs 1.11:  /usr/bin/cvs
  • grep 2.5.1:  /bin/grep (instead of 2.5.1a)
  • m4 1.4.2:  /usr/bin/m4 (instead of 1.4.1)
  • autoconf 2.59:  /usr/bin/autoconf (2.53)
  • automake 1.6.3:  /usr/bin/automake
  • libtool (??):  /usr/bin/libtool (1.5.8)

I was able to find nearly all of the missing packages in the unstable branch for Fink (Intel machine).  I wouldn't worry about the "unstable" moniker; as long as you don't do a blind update-all it's certainly possible to stick to a solid config, and there are several packages on the list that are only available in unstable (only because they haven't yet gotten the votes to move them over to stable).  I've gone ahead and installed some of the missing packages in a fresh Fink installation and will serve it up over NFS at /Volumes/star1.lns.mit.edu/STAR/opt/star/osx48_i386_gcc401 (with a power_macintosh_gcc401 to match, although a more consistent $STAR_HOST_SYS would probably have been osx48_ppc_gcc401).

Here's a summary table of the packages installed in $OPTSTAR for the two OS X architectures at MIT.  Note that many of these packages have additional dependencies, so the full list of installed packages on each system (attached at the bottom of the page) is actually much longer.

package | version
Fortran compiler | gfortran 4.2 (i386), g77 3.4.3 (ppc)
libpng | 1.2.12
mysql | 5.0.16-1002 (5.0.27 will break!)
dejagnu | skipped
texinfo | 4.8
findutils | 4.2.20
fileutils | 5.96
qt-x11 | 3.3.7
slang | 1.4.9
doxygen | 1.4.6
lynx | 2.8.5
ImageMagick | 6.2.8
nedit | 5.5
astyle | 1.15.3 (ppc only)
unixodbc | 2.2.11
myodbc | not available (2.50.39, if we want it)
libxml | 2.6.26


I also looked for the required perlmods in Fink.  I stuck with the default Perl 5.8.6, so I did not install the modules marked as e.g. "pm588 required".  I found that some of the modules are already part of core.  If the older versions hosted by STAR are still needed, let me know.  "Virtual package" means that it came with the OS already:

perlmod | version
Compress-Zlib | virtual package
DateManip | 5.42a
DBI | 1.53
DBD-mysql | 3.0008
Digest-MD5 | core module
HTML-Parser | virtual package
HTML-Tagset | 3.10
libnet | not available
libwww-perl | 5.805
LWPng-alpha | not available
MD5 | not available
MIME-Base64 | 3.05
Proc-ProcessTable | 0.39-cvs20040222-sf77
Statistics-Descriptive | 2.6
Storable | core module
Time-HiRes | core module
URI | virtual package
XML-NamespaceSupport | 1.08
XML-SAX | 0.14
XML-Simple | 2.16


There were some additional perlmods that install_perlmods listed as "Linux only" but Fink offered to install:

perlmod | version
GD | 2.30
perlindex | not available
Pod-Escapes | 1.04
Pod-Simple | 3.04
Tk | 804.026
Tk-HistEntry | not available
Tk-Pod | not available


Questions:

  • what was with all those soft-links (/usr/bin/sed -> /bin/sed, etc.) that Jerome had me make?  Will they be needed on every machine running the STAR environment (that's a problem), or just on the one he was compiling on?
  • is perl in /usr/bin sufficient or do we need to put it in $OPTSTAR as directed in AAAREADME?
  • what to do about mysql? Is 5.0 back-compatible, or do we only need development headers and shared libraries?

 

Building PYTHIA dylibs with gfortran

The default makePythia6.macosx won't work out of the box for 10.4, since it requires g77.  Here's what I did to get the libraries built for Pythia 5:
$ gfortran -c jetset74.f
$ gfortran -c pythia5707.f
$ echo 'void MAIN__() {}' > main.c
$ gcc -c main.c
$ gcc -dynamiclib -flat_namespace -single_module -undefined dynamic_lookup -install_name $OPTSTAR/lib/libPythia.dylib -o libPythia.dylib *.o
$ sudo cp libPythia.dylib $OPTSTAR/lib/.

and for Pythia 6:

$ export MACOSX_DEPLOYMENT_TARGET=10.4
$ gfortran -c pythia6319.f
In file pythia6319.f:50551

   IF (AAMAX.EQ.0D0) PAUSE 'SINGULAR MATRIX IN PYLDCM'
                     1
Warning: Obsolete: PAUSE statement at (1)
$ gfortran -fno-second-underscore -c tpythia6_called_from_cc.F
$ echo 'void MAIN__() {}' > main.c
$ gcc -c main.c
$ gcc -c pythia6_common_address.c
$ gcc -dynamiclib -flat_namespace -single_module -undefined dynamic_lookup -install_name $OPTSTAR/lib/libPythia6.dylib -o libPythia6.dylib main.o tpythia6_called_from_cc.o pythia6*.o
$ ln -s libPythia6.dylib libPythia6.so
$ sudo cp libPythia6.* $OPTSTAR/lib/.

CERNLIB notes

All the CERNLIB libraries are static and the binaries depend only on system libraries, so the whole installation should be portable.  For PowerPC I had a CERNLIB 2005 build left over from a different Fink installation, so I just copied those binaries and libraries to the new location and downloaded the headers from CERN.  Fink doesn't support CERNLIB on Intel Macs, so for this build I used Robert Hatcher's excellent shell script:

http://home.fnal.gov/~rhatcher/macosx/readme.html

Hatcher's binaries link against the gfortran dylib, so I made sure to build them with gfortran from $OPTSTAR.

CERNLIB 2005 doesn't include libshift.a, but STAR really wants to link against it.  Here's a hack from Robert Hatcher to build your own:

cat > fakeshift.c << EOF
int rshift_(int* in, int* ishft) { return *in >> *ishft; }
int ishft_(int* in, int* ishft)
{
  if (*ishft == 0) return *in;
  if (*ishft > 0) return *in << *ishft;
  else return *in >> *ishft;
}
EOF
gcc -O -fPIC -c fakeshift.c
g77 -fPIC -c getarg_stub.f
ar cr libshift.a fakeshift.o

ROOT build notes

Following the instructions at http://www.star.bnl.gov/STAR/comp/root/building_root.html was basically fine.  Here was my configure command for rootdeb:
./configure macosx --build=debug --enable-qt --enable-table --enable-pythia6 --enable-pythia --with-pythia-libdir=$OPTSTAR/lib --with-pythia6-libdir=$OPTSTAR/lib --with-qt-incdir=$OPTSTAR/include/qt

which resulted in the final list:

Enabled support for asimage, astiff, builtin_afterimage, builtin_freetype, builtin_pcre, builtin_zlib, cern, cintex, exceptions, krb5, ldap, mathcore, mysql, odbc, opengl, pch, pythia, pythia6, python, qt, qtgsi, reflex, shared, ssl, table, thread, winrtdebug, xml, xrootd.

I did run into a few snags:

  • MakeRootDir.pl didn't find my /usr/X11R6/bin/lndir automatically (even though it was in my $PATH), so I had to edit the script and set it by hand.
  • Had to run MakeRootDir.pl twice to get root and rootdeb directory structures in place, editing the script in between.
  • CVS was a mess.  I had to drill down into each subdirectory that needed updating, and even then it puked out conflicts instead of patching the files, so I had to trash the originals first.  Also, I'm fairly sure that root5/qt/inc/TQtWidget.h should have been included in the v5-12-00f tag, since my first attempt at compiling failed without the HEAD version of that file.

 

Hacking the environment scripts

  • set rhflavor = "osx48_" in STAR_SYS to get the name I chose for $STAR_HOST_SYS
  • I installed Qt in $OPTSTAR, so group_env.csh fails to find it

Building STAR software

I'm working with a checked-out copy of the STAR software and modifying code when necessary if the fix is obvious.  So far I've got the following cons working:

cons %QtRoot %StEventDisplayMaker %pams %St_dst_Maker %St_geom_Maker

St_dst_Maker tries to subtract an int and a struct!  Pams is a crazy mess of VAX-style Fortran STRUCTUREs, but we really need it in order to run starsim.  I haven't delved too deeply into the QtRoot-related stuff; I'm sure Valeri can help when the time comes.  Hopefully we can get these things fixed without too much delay.

Power PC notes

  • why does everything insist on linking with libshift?  It's not a part of CERNLIB 2005, so I used Hatcher's hack to get around it and stuck libshift.a in $OPTSTAR/lib
  • libnsl is not needed on OS X, so we don't link against it anymore
  • remove -dynamiclib and -single_module for executables
  • cfortran.h can't identify our Fortran compiler -- define it as f2c
  • asps/Simulation/starsim/deccc/fputools.c won't compile under power pc (contains assembly code!) -- skip it for now
  • g++ root4star brings out lots of linking issues; one killer seems to be that libpacklib from Fink is missing fzicv symbol.
    • one very hack solution:  install gfortran, use it to build CERNLIB with Hatcher script, replace libpacklib.a, copy libgcc.a and libgfortran.a from gcc 4.2.0 into $OPTSTAR/lib or other, then link against them explicitly
    • needed to -lstarsim to get gufile, srndmc symbols defined
  • <malloc.h> -- on Mac they decided to put this in /usr/include/malloc, so we add this to path in ConsDefs.pm
  • cons wanted to link starsim using gcc and statically include libstdc++; on Mac we'll let g++ do the work.  Also, -lstarsim seems to be included too early in the chain.  Need to talk to Jerome about proper way to fix this, but for now I can hack a fix.
  • PAMS -- ACK!

Problems requiring changes to codes:

  • struct mallinfo isn't available on OS X
    • for now we surround any mallinfo with #ifndef __APPLE__; Frank Laue says there may be a workaround
  • 'fabs' was not declared in this scope
    • add <cmath> in header
  • TCL.h from ROOT conflicts with system tcl.h because of case-insensitive FS
    • TCL.h renamed to TCernLib.h in newer ROOT versions (ROOT bug 19313)
    • copied TCL.h to TCernLib.h myself and added #ifdef __APPLE__ #include "TCernLib.h"
    • this problem will go away when we patch/upgrade ROOT
  • passing U_Int to StMatrix::inverse() when it wants a size_t
    • changed input to size_t (only affected StFtpcTrackingParams)
  • abs(float) is not legal
    • change to fabs(float) and #include <cmath>

Intel notes

Basic problem here is the (im)maturity of gfortran.  The current Fink unstable version 4.2.0-20060617 still does not include some intrinsic symbols (lshift, lstat) that we expect to be there.  Newer versions do have these symbols, and as soon as Fink updates I'll give it another go.  I may try installing gcc 4.3 from source in the meantime, but it's not a high priority.  Note that Intel machines should be able to run the Power PC build in translated mode with some hacking of the paths (force $STAR_HOST_SYS = osx48_power_macintosh_gcc401).

Xgrid

Summary of Apple's Xgrid cluster software and the steps we've taken to get it up and running at MIT.

http://deltag5.lns.mit.edu/xgrid/

Xgrid jobmanager status report

  • xgrid.pm can submit and cancel jobs successfully, haven't tested "poll" since the server is running WS-GRAM.
  • Xgrid SEG module monitors jobs successfully.  Current version of Xgrid logs directly to /var/log/system.log (only readable by admin group), so there's a permissions issue to resolve there.  My understanding is that the SEG module can run with elevated permissions if needed, but at the moment I'm using ACLs to explicitly allow user "globus" to read the system.log.  Unfortunately the ACLs get reset when the logs are rotated nightly.
  • CVS is up-to-date, but I can't promise that all of the Globus packaging stuff actually works.  I ended up installing both Perl module and the C library into my Globus installation by hand.
  • Current test environment uses SimpleCA, but I've applied for a server certificate at pki1.doegrids.org as part of the STAR VO.

Important Outstanding Issues

  • streaming stdout/stderr and stagingOut files is a little tricky.  Xgrid requires an explicit call to "xgrid -job results", otherwise it  just keeps all job info in the controller DB.  I haven't yet figured out where to inject this system call in the WS-GRAM job life cycle, so I'm asking for help on gram-dev@globus.org.
  • Need to decide how to do authentication.  Xgrid offers two options on the extreme ends of the spectrum.  On the one hand we can use a common password for all users, and on the other hand we can use K5 tickets.  Submitting a job using WS-GRAM involves a roundtrip user account -> container account -> user account via sudo, and I don't know how to forward a TGT for the user account through all of that.  I looked around and saw a "pkinit" effort that promised to do passwordless generation of TGTs from grid certs, but it doesn't seem like it's quite ready for primetime.

USP

This is a copy of the web page that contains a log of the Sao Paulo grid activities. For the full documentation, please go to http://stars.if.usp.br:8080/~suaide/grid/

Installation

In order to be fully integrated into the STAR GRID you need to have the following items installed and running (the items are presented in the same order in which I installed them in the cluster). There is other software to install before full integration, but this is the current status of the integration.

Installing the batch system (SGE)

We decided to install SGE because it is the same system used at PDSF (so it is scheduler-compatible) and it is free. The SGE web site is here. You can download the latest version from their website.

Instructions to install SGE

  1. Download from the SGE web site
  2. gunzip and untar the file
  3. cd to the directory
In the installation directory there are two PDF files.  The sge-install.pdf contains instructions on how to install the system. The sge-admin.pdf contains instructions on how to maintain the system and create batch queues. Our procedure to install the system was:
  1. In the batch system server (in our case, STAR1)

    1. Create the SGE_ROOT directory. In our case, mkdir /home/sge-root. This directory HAS to be available in all the exec nodes
    2. copy the entire content of the installation directory to the SGE_ROOT directory
    3. add the lines below to your /etc/services file
      sge_execd        19001/udp
      sge_qmaster     19000/tcp
      sge_qmaster     19000/udp
      sge_execd        19001/tcp
    4. cd to the SGE_ROOT directory
    5. Type ./install_qmaster
    6. follow the instructions in the screen. In our case, the answers to the questions were:
      1. Do you want to install Grid Engine under an user id other than >root< (y/n) >> n
      2. $SGE_ROOT = /home/sge-root
      3. Enter cell name >> star
      4. Do you want to select another qmaster spool directory (y/n) [n] >> n
      5. verify and set the file permissions of your distribution (y/n) [y] >> y
      6. Are all hosts of your cluster in a single DNS domain (y/n) [y] >> y
      7. Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> classic
      8. You can change at any time the group id range in your cluster configuration. Please enter a range >> 20000-21000
      9. The pathname of the spool directory of the execution hosts. Default: [/home/sge-root/star/spool] >> [ENTER]
      10. Please enter an email address in the form >user@foo.com<. Default: [none] >> [PUT YOUR EMAIL]
      11. Do you want to change the configuration parameters (y/n) [n] >> n
      12. We can install the startup script that will start qmaster/scheduler at machine boot (y/n) [y] >> y
      13. Adding Grid Engine hosts. Do you want to use a file which contains the list of hosts (y/n) [n] >> n
      14. Host(s): star1 star2 star3 star4 ...... (ADD ALL HOSTS THAT WILL BE CONTROLLED BY THE BATCH SYSTEM)
      15. Do you want to add your shadow host(s) now? (y/n) [y] >> n
      16. Scheduler Tuning. Default configuration is [1] >> 1
      17. Proceed with the default answers until the end of the script
    7. You have installed the master system. To make sure the system starts at boot time, type
      ln -s /etc/init.d/sgemaster /etc/rc3.d/S95sgemaster
      ln -s /etc/init.d/sgemaster /etc/rc5.d/S95sgemaster
  2. Install the execution nodes (including the server, if it will be an exec node). This needs to be done on ALL exec nodes

    1. add the lines below to your /etc/services file
      sge_execd        19001/udp
      sge_qmaster     19000/tcp
      sge_qmaster     19000/udp
      sge_execd        19001/tcp
    2. cd to your SGE_ROOT directory
    3. type ./install_execd
      1. Answer the question about the SGE_ROOT directory location
      2. Please enter cell name which you used for the qmaster. >> star
      3. Do you want to configure a local spool directory for this host (y/n) [n] >> n
      4. We can install the startup script that will start execd at machine boot (y/n) [y] >> y
      5. Do you want to add a default queue instance for this host (y/n) [y] >> n (WE WILL CREATE A QUEUE LATER)
      6. follow the default instructions until the end
    4. You have now installed the execution node. To start the system at boot time, type
      ln -s /etc/init.d/sgeexecd /etc/rc3.d/S96sgeexecd
      ln -s /etc/init.d/sgeexecd /etc/rc5.d/S96sgeexecd
  3. Install a default queue to your batch system

    1. type qmon
      It opens a GUI window where you can configure all the batch system.
    2. Click on the QUEUE CONTROL button
    3. It opens another screen with the queues you have in your system
    4. Click on ADD
    5. Fill in the fields. See the sge-admin.pdf file for instructions. It is very simple.
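Since the /etc/services entries above must be identical on the master and on every exec node, a small sanity check can save a debugging round trip. This is my own sketch (check_sge_services is a made-up name), not part of the SGE distribution:

```shell
# Verify that the four SGE entries from the steps above exist in a services file.
check_sge_services() {
    f=$1
    for entry in "sge_qmaster 19000/tcp" "sge_qmaster 19000/udp" \
                 "sge_execd 19001/tcp" "sge_execd 19001/udp"; do
        name=${entry% *}          # service name (text before the space)
        portproto=${entry#* }     # port/protocol (text after the space)
        if ! grep -Eq "^$name[[:space:]]+$portproto" "$f"; then
            echo "missing: $name $portproto"
            return 1
        fi
    done
    echo ok
}
```

Run it as check_sge_services /etc/services on each node before starting qmaster or execd.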

Installing GANGLIA

Additional information from the STAR web site

You can download the ganglia packages from their web site. You need to install the following packages:
  • gmond - the monitoring system. Should be installed in ALL machines in the cluster
  • gmetad - the gathering information system. Should be installed in the machine that will collect the data (in our case, STAR1)
  • the web front end. This is nice to have but not essential. It creates a web page, like this one, with all the information in your cluster. You should have a web server running in the collector machine (STAR1) for this to work
  • rrdtool - this is a package that creates the plots in the web page. Necessary only if you have the web frontend.
To install Ganglia, proceed with the following
  1. In each machine in the cluster

    1. Install the gmond package (change the name to match the version you are installing)
      rpm -ivh ganglia-gmond-3.0.1-1.i386.rpm
    2. edit the /etc/gmond.conf file. The only change I made in this file was
      cluster {
        name = "STAR"
      }
    3. Type
      ln -s /etc/init.d/gmond /etc/rc5.d/S97gmond
      ln -s /etc/init.d/gmond /etc/rc3.d/S97gmond
      /etc/init.d/gmond stop
      /etc/init.d/gmond start
  2. In the collector machine (STAR1)

    1. Install the gmetad, web and rrdtool packages (change the name to match the version you are installing)
      rpm -ivh ganglia-gmetad-3.0.1-1.i386.rpm
      rpm -ivh ganglia-web-3.0.1-1.noarch.rpm
      rpm -ivh rrdtool-1.0.28-1.i386.rpm
    2. edit the /etc/gmetad.conf file. The only change I made in this file was
      data_source "STAR" 10 star1:8649 star2:8649 star3:8649 star4:8649 star5:8649
    3. Type
      ln -s /etc/init.d/gmetad /etc/rc5.d/S98gmetad
      ln -s /etc/init.d/gmetad /etc/rc3.d/S98gmetad
      /etc/init.d/gmetad stop
      /etc/init.d/gmetad start
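The data_source line above can also be generated from a host list, which keeps it in sync when nodes are added. A small sketch (the host names are just the ones from the example above):

```shell
# Assemble the gmetad data_source line for the "STAR" cluster
# (polled every 10 seconds) from a space-separated host list.
hosts="star1 star2 star3 star4 star5"
line='data_source "STAR" 10'
for h in $hosts; do
    line="$line $h:8649"    # 8649 is gmond's default listen port
done
echo "$line"
```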

MonaLISA

Additional information from the STAR web site

To install Monalisa in your system you need to download the files from their web site. After you gunzip and untar the file you need to perform the following steps:
  1. Create a monalisa user in your master computer and its home directory
  2. cd to the monalisa installation dir
  3. type ./install.sh
  4. Answer the following questions:
    1. Please specify an account for the MonALISA service [monalisa]: [ENTER]
    2. Where do you want MonaLisa installed ? [/home/monalisa/MonaLisa] : [ENTER]
    3. Path to the java home []: [enter the path name for your java distribution]
    4. Please specify the farm name [star1]: [star]
    5. Answer the next questions as you wish
  5. Make sure that Monalisa will run after reboot by typing:
    ln -s /etc/init.d/MLD /etc/rc5.d/S80MLD
    ln -s /etc/init.d/MLD /etc/rc3.d/S80MLD
  6. You need to edit the following files in the directory /home/monalisa/MonaLisa/Services
    1. ml.properties
      MonaLisa.ContactName=your name
      MonaLisa.ContactEmail=xxx@yyyy.yyy
      MonaLisa.LAT=-23.25
      MonaLisa.LONG=-47.19
      lia.Monitor.group=OSG, star (note that we are part of both the OSG and STAR groups)
      lia.Monitor.useIPaddress=xxx.xxx.xxx.xxx (your IP)
      lia.Monitor.MIN_BIND_PORT=9000
      lia.Monitor.MAX_BIND_PORT=9010
  7. You need to tell MonaLisa that you are using SGE as the batch system. For this, edit the Service/CMD/site_env file and add
    SGE_LOCATION=/home/sge-root
    export SGE_LOCATION
    SGE_ROOT=/home/sge-root
    export SGE_ROOT
It is important to make sure these ports are not blocked by your firewall, in case your system is behind one.

To start the MonaLisa service just type
/etc/init.d/MLD start

Requesting a GRID certificate

By the way, you will have to request (for Grid usage) a user certificate. For instructions, click on the link http://www.star.bnl.gov/STAR/comp/Grid/Infrastructure/#CERT

A grid installation will require a "host" certificate. Jerome told me he never asked for one really ...
The certificate arrived three days after I requested it (with some help from Jerome). I then followed
the instructions that came with the email to validate and export the certificate.
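In case the export ever has to be redone by hand, it typically amounts to splitting the browser-exported PKCS#12 file into the PEM certificate/key pair that Globus expects. This is only a sketch, not the official DOEGrids recipe: export_grid_cert is my own name, and I assume an empty import password for brevity (a real key should stay passphrase-protected):

```shell
# Split a PKCS#12 bundle into the usercert.pem / userkey.pem pair that
# Globus looks for (normally under ~/.globus). "p12" and "dest" are
# placeholders for your exported file and target directory.
export_grid_cert() {
    p12=$1
    dest=$2
    mkdir -p "$dest"
    # certificate only, no key
    openssl pkcs12 -in "$p12" -clcerts -nokeys -passin pass: -out "$dest/usercert.pem"
    # key only; -nodes leaves it unencrypted, which is just for this example
    openssl pkcs12 -in "$p12" -nocerts -nodes -passin pass: -out "$dest/userkey.pem"
    chmod 444 "$dest/usercert.pem"
    chmod 400 "$dest/userkey.pem"
}
```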

Installing OSG

I think this is the last step to be fully GRID integrated. I have not used the certificate I got so far. Let's see. To install the OSG package I followed the instructions on the following web page

http://osg.ivdgl.org/twiki/bin/view/Documentation/OsgCEInstallGuide


The basic steps were
  1. Make sure pacman is installed. For this I had to update python to a version above 2.3. Pacman is a package management system. It can be downloaded from here
  2. create a directory at /home/grid. This is where I installed the grid stuff. This directory needs to be visible on all the cluster machines
  3. I typed
    export VDT_LOCATION=/home/grid
    cd $VDT_LOCATION
    pacman -get OSG:ce
    I  just followed the log and answered the questions.
The entire installation process took about 20 minutes or so but I imagine it depends on the network connection speed.

After this installation was done I typed source setup.sh to complete the installation. No messages on the screen...

Because our batch system is SGE, we need to install extra packages, as stated in the OSG documentation page. I typed:
pacman -get http://www.cs.wisc.edu/vdt/vdt_136_cache:Globus-SGE-Setup
and these extra packages were installed in a few seconds.

I just followed the instructions in the OSG installation guide and everything went fine. One important thing is related to the firewall setup. If you have a firewall running with MASQUERADE, in which your private network is not accessible from the outside world, and your gatekeeper is not the firewall machine, remember to open the necessary ports (above 1024) and redirect ports 2119, 2811 and 2812 to your gatekeeper machine. The command depends on your firewall program. If using iptables, just add the following rules to your filter tables:
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 2119 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 2119 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 2135 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 2135 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 2136 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 2136 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 2811 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 2811 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 2812 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 2812 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 2912 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 2912 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 7512 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 7512 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 8443 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 8443 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 19000 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 19000 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 19001 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 19001 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p udp -d $GLOBALIP --dport 20000:65000 -j DNAT --to $STAR1
$filter -t nat -A PREROUTING -p tcp -d $GLOBALIP --dport 20000:65000 -j DNAT --to $STAR1

where $GLOBALIP is the external IP of your firewall and $STAR1 is the IP of the machine running the GRID stuff.
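The 24 near-identical PREROUTING rules above can also be generated with a loop. Here is a sketch that only prints the commands (the IPs are placeholders, and "filter" is set to echo so nothing is applied; point it at your real iptables wrapper to use it for real):

```shell
# Emit one DNAT rule per port/protocol pair instead of hand-writing 24 lines.
filter="echo iptables"   # replace "echo iptables" with the real command to apply
GLOBALIP=1.2.3.4         # external IP of the firewall (placeholder)
STAR1=192.168.0.10       # internal IP of the gatekeeper (placeholder)

RULES=""
for port in 2119 2135 2136 2811 2812 2912 7512 8443 19000 19001 20000:65000; do
    for proto in tcp udp; do
        RULES="$RULES$($filter -t nat -A PREROUTING -p $proto -d $GLOBALIP --dport $port -j DNAT --to $STAR1)
"
    done
done
printf '%s' "$RULES"
```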

I also had to modify the files /home/grid/setup.csh and setup.sh to fix the HOSTNAME and port range. I added, in each file:
setup.csh
setenv GLOBUS_TCP_PORT_RANGE "60000 65000"
setenv GLOBUS_HOSTNAME "stars.if.usp.br"
setup.sh
export GLOBUS_TCP_PORT_RANGE="60000 65000"
export GLOBUS_HOSTNAME="stars.if.usp.br"
This ensures that the port range opened in the firewall corresponds to the range used in the GRID environment. Also, because I run the firewall in masquerade mode, I had to set the hostname explicitly; otherwise it would pick up the machine name, and I do not want that to happen.
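A quick way to double-check that the range you exported actually covers a given port (my own helper sketch, not a Globus tool):

```shell
# Return success if port $1 falls inside GLOBUS_TCP_PORT_RANGE ("low high").
port_in_globus_range() {
    set -- "$1" $GLOBUS_TCP_PORT_RANGE   # word-splits the range into $2 and $3
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

GLOBUS_TCP_PORT_RANGE="60000 65000"
port_in_globus_range 61000 && echo "61000 is inside the range"
port_in_globus_range 2811  || echo "2811 is outside the range"
```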

GridCat and making things to work...

It is very interesting to add your grid node to GridCat. It is a map, just like MonaLisa, but it performs periodic tests on your gatekeeper, making it easier to find problems (and, if you got to this point, there should be a few of them)

To add your gatekeeper to GridCat,  go to http://osg.ivdgl.org/twiki/bin/view/Integration/GridCat

You will have to fill in a form, following the instructions at the following link:

http://osg.ivdgl.org/twiki/bin/view/Documentation/OsgCEInstallGuide#OSG_Registration

If everything goes right, when your application is approved you will show up in the GridCat Map, located at http://osg-cat.grid.iu.edu:8080

Well, this is where the debugging starts. Every 2-3 hours GridCat tests the gatekeepers and assigns a status light to each one, based on the test results. The tests are basically:
  • Authentication test
  • Hello world test
  • Batch submission (depends on your batch system)
    • submit a job
    • query the status of the job
    • canceling the job
  • file transfer (gridFtp)
This is where I spent my last few days trying to resolve the issues. Thanks a lot to all the people on the STAR-GRID list who helped me with suggestions. But I had to find out a lot of stuff myself... this is what google is made for... The main issue is the fact that our cluster is behind a firewall configured with masquerading: the internal IPs of the machines (including the gatekeeper) are not visible, and all the machines present the same IP (the gateway IP) to the outside world. I think I am the only one in the GRID with this kind of setup :)

How to turn authentication and hello world to green?

This is the easiest one... You need to add the following certificate mappings to your grid map file (/etc/grid-security/grid-mapfile):
"/DC=org/DC=doegrids/OU=People/CN=Leigh Grundhoefer (GridCat) 693100" XXXX
"/DC=org/DC=doegrids/OU=People/CN=Bockjoo Kim 740786" XXXX
The username 'XXXX' is the local username in your cluster... After these certificates were added to my mapfile the first two tests turned green
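Appending mappings by hand is error-prone, so a small idempotent helper along these lines can be handy. This is only a sketch: add_grid_mapping is a made-up name, and the DN format is just the one from the entries above:

```shell
# Append a "DN" -> local-account mapping unless the DN is already mapped.
add_grid_mapping() {
    dn=$1
    account=$2
    mapfile=${3:-/etc/grid-security/grid-mapfile}
    # already mapped? then do nothing (keeps the helper idempotent)
    grep -qF "\"$dn\"" "$mapfile" 2>/dev/null && return 0
    printf '"%s" %s\n' "$dn" "$account" >> "$mapfile"
}
```

For example: add_grid_mapping "/DC=org/DC=doegrids/OU=People/CN=Some User 123456" staruser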

How to turn the batch system test green

It seems that SGE is not the preferred batch system on the GRID... Too bad, because it is really nice and SIMPLE. Because of this the OSG interface to SGE does not work right... I hope the bugs are fixed in the next release, but just to keep a log of what I did (with a lot of help) in case they forget to fix it :)
  • mis-ci-functions
    • This file, located at $VDT_LOCATION/MIS-CI/etc/misci/, is responsible for checking your system roughly every 10 minutes and extracting information about your cluster. It uses the batch system to grab that information. Of course, it does not work with SGE. Replace the file with version 0.2.7, located here. Please check whether your version is newer than this one before replacing it...
  • sge.pm
    • This file is located at $VDT_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/
    • Please check the following
      • In the BEGIN section
        • if $SGE_ROOT, $SGE_CELL and the commands ($qsub, $qstat, etc) are defined properly
      • In the submit section
        • Locate the line
          • $ENV{"SGE_ROOT"} = $SGE_ROOT;
        • add the line
          • $ENV{"SGE_CELL"} = $SGE_CELL;
      • Do the same in the poll section
      • In the clear section
        • locate the line  system("$qdel $job_id >/dev/null 2>/dev/null");
        • replace it with the following
          •     $ENV{"SGE_ROOT"} = $SGE_ROOT;
                $ENV{"SGE_CELL"} = $SGE_CELL;
                $job_id =~ /(.*)\|(.*)\|(.*)/;
                $job_id = $1;
                system("$qdel $job_id");
This will make your batch tests turn green. It means people can submit jobs, query them, cancel them, etc. I hope I did not miss anything here...
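For illustration, the patched clear section keeps only the batch job id (the first '|'-separated field) of the composite identifier before handing it to qdel. The same extraction in shell, with a made-up id (the exact composite layout is an assumption based on the three-group regex above):

```shell
# GRAM stores a composite id of the form 'id|...|...'; qdel only understands
# the bare numeric id, so everything from the first '|' on must be dropped.
job_id='1842|stars.if.usp.br|sge'   # made-up example id
job_id="${job_id%%|*}"              # keep text before the first '|'
echo "$job_id"                      # -> 1842
```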

Making the gridFTP to work

This was the most difficult part because of my firewall configuration... and thanks, Google, for making research on the web easier.

First, please check that the services are listed in your /etc/services file:
  globus-gatekeeper       2119/tcp        # Added by the VDT
  gsiftp                  2811/tcp        # Added by the VDT
  gsiftp2                 2812/tcp        # Added by the VDT
  gsiftp                  2811/udp        # Added by the VDT
  gsiftp2                 2812/udp        # Added by the VDT
If not, add them...
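A sketch of checking and appending the entries; it works on a copy so the real /etc/services is untouched (editing the real file requires root):

```shell
# Work on a copy; the real target is /etc/services.
cp /etc/services /tmp/services
# Append the VDT entries only if no gsiftp entry is present yet.
grep -q '^gsiftp[[:space:]]' /tmp/services || cat >> /tmp/services <<'EOF'
gsiftp          2811/tcp        # Added by the VDT
gsiftp2         2812/tcp        # Added by the VDT
gsiftp          2811/udp        # Added by the VDT
gsiftp2         2812/udp        # Added by the VDT
EOF
grep '^gsiftp' /tmp/services
```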

I started testing file transfer between gatekeepers by logging into another gatekeeper, getting my proxy (grid-proxy-init) and doing a file transfer with the command:
globus-url-copy -dbg file:///star/u/suaide/gram_job_mgr_13594.log gsiftp://stars.if.usp.br/home/star/c
The -dbg flag means debug output is turned on... Everything goes fine until it starts transferring the data (STOR /home/star/c); then it hangs and times out. Researching on the web, I found a bug report at

http://bugzilla.globus.org/globus/show_bug.cgi?id=1127

And a quote in the bottom of the page:

" ... The wuftp based gridftp server is not supported behind a firewall. The problem is in reporting the external IP address in the PASV response. You can see this by using the -dbg flag to globus-url-copy. You will see the the PASV response specifies your internal IP address.

The server should, however, work for clients using PORT. ..."

which means I am doomed... Researching the web some more, I found some solutions, and what I did was:
  • replace the file /etc/xinetd.d/gsiftp with this one
    service gsiftp
    {
         socket_type = stream
         protocol = tcp
         wait = no
         user = root
         instances = UNLIMITED
         cps = 400 10
         server = /auto/home/grid/vdt/sbin/vdt-run-gsiftp2.sh
         disable = no
    }
  •  restarted xinetd
  • modified the file /home/grid/globus/etc/gridftp.conf to
    # Configuration file for the new (3.9.5) GridFTP server
    inetd 1
    log_level ERROR,WARN,INFO,ALL
    log_single /auto/home/grid/globus/var/log/gridftp.log
    hostname "XXX.XXX.XXX.XXX"
  •  XXX.XXX.XXX.XXX is the gateway's IP address as seen from the outside world
And this worked!!!!

Now all tests are green and I am happy and tired!!! There are still a few issues left, basically in the cluster information query (number of CPUs, batch queues, etc.), which are related to mis-ci-functions (I think); I will have a look later.

Another important thing: if you plan to have a cluster running jobs from outside and making file transfers with gsiftp, the directory /etc/grid-security must be available on all machines in the cluster, even if they are not gatekeepers. The grid setup should also be executed on all nodes (/home/grid/setup.csh). If not, when a job starts running on one of the nodes and attempts to transfer a file with globus-url-copy, it will fail. The solution I used was to keep the grid-security directory in /home/grid and make symbolic links on all the nodes.
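The symlink layout can be sketched like this (temporary paths stand in for the real /home/grid/grid-security and /etc/grid-security; on a real node the link would be created as root on each machine):

```shell
# Master copy lives in the shared home area; each node gets a symlink in
# place of its local /etc/grid-security (temp paths used for illustration).
mkdir -p /tmp/home/grid/grid-security
ln -sfn /tmp/home/grid/grid-security /tmp/etc-grid-security
readlink /tmp/etc-grid-security
```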

WSU

Specification for the storage and cataloging of STAR virtual machine images


 

About this document:

This document describes where and how STAR virtual hard disk images are stored. Through standardization we can allow images to be:

  • reusable by others
  • cataloged, so the total available inventory of images is known
  • locatable quickly
  • somewhat self describing

 

What is a virtual machine image:

An image is a virtual hard disk: basically a large file on your "real" hard disk whose contents are presented to the virtual machine (VM) as if it were a complete hard disk. STAR has a repository of virtual machine images which have operating systems and the STAR software stack pre-installed. Businesses now provide large computing facilities with many nodes running virtual machines that can be rented by the hour. One can upload a virtual machine image and essentially rubber-stamp as many nodes of a particular configuration as desired. The nodes are not exactly identical, however; for example, each node needs its own name (rcas1, rcas2, rcas3, ...), so a special step called contextualization is used to slightly customize each image.

A common type of image format many people are familiar with is the .ISO file. For example, one can dump a block device like a hard disk or optical drive to one of these files.

Example:

dd if=/dev/cdrom of=/home/bobfox/myCDimage.iso

One can then mount the file as if it were any other hard disk. In most newer desktop Linux distributions with the default Gnome desktop, one can just click on an ISO file and the mounted drive icon will appear on the desktop. Most virtual machine softwares (VMware, VirtualBox) support mounting ISO files as if they were CD-ROMs attached to a real system.

What does STAR want to do with this technology:

The STAR software stack is not yet as easy to install as “just keep clicking next”. We can make most of these complexities go away by providing tier-2(+) STAR sites with virtual machine images that have the software stack pre-installed and ready to run out of the box.

STAR's dreams of running on the grid have languished for a long time because of the non-homogeneous nature of GRID host sites. STAR cannot recompile and certify a customized version of its software stack for each site, even if the resources are free and available. However, most of the differences between sites offering computing resources can be leveled out if they all run virtual machines. This means STAR can guarantee that its software will run on different sites and that the output files produced are as valid as if they had been produced at the local BNL Tier 1 site via the usual stringent quality assurance processes STAR employs.

It all sounds too good to be true. That is because it is. There are many different virtual machine softwares all with different formats of virtual machine images.

Some common virtual machine softwares are:

  • VirtualBox
  • VMware
  • KVM
  • XEN

 

Virtual machine images:

These different virtual machine softwares require different image formats. Some can be easily converted; most cannot. For example, Xen images don't contain any files in their /boot partition because Xen uses its own built-in kernel, while VirtualBox uses the original operating system kernel, so converting from XEN to VirtualBox images is not really possible. Many packages will offer to convert the file system structure, but that doesn't mean the image will be able to boot.

The type of image you need is not up to you; it is determined by the virtual machine software of the host site. For example, a XEN virtual machine will require a XEN image. Support for image formats other than the native format of the VM is very limited, and in most cases nonexistent or unreliable.

Key parameters of images are:

  • Virtual Machine Format
  • Storage space

 

Why is storage space on the list

When you create an image file, its size needs to be specified; this represents a fixed geometry of the virtual disk, and it is not possible to change the size of the virtual hard disk later. If you have a fixed-size image of e.g. 10 GB, an image file of roughly the same size will be created on your host system. There are also dynamically expanding images: these start small, occupying no space for unused virtual disk sectors, but the image file grows every time a disk sector is written to for the first time, up to some maximum size that cannot be exceeded once it is reached.
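The difference is easy to see with a sparse file, which behaves like a dynamically expanding image (a 10 MB demo file in /tmp; sizes are example values):

```shell
# Create a 10 MB image whose blocks are not yet allocated (sparse), then
# compare its apparent size to the space it actually occupies on disk.
dd if=/dev/zero of=/tmp/demo-disk.img bs=1M count=0 seek=10 2>/dev/null
stat -c 'apparent size: %s bytes' /tmp/demo-disk.img
du -h /tmp/demo-disk.img   # near zero until sectors are actually written
```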

 

When you run your job at some remote site you will need to write the data somewhere. There are mapped network drives and such, but the most convenient place to write is the image itself. So the total data of the node cannot be bigger than the max size of the image. We can express it like this:

 

( [OS] + [swap space] + [STAR software] + [your data] ) < ( [Image max size] )
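With made-up sizes in GB, the check is simple arithmetic:

```shell
# Hypothetical sizes in GB: OS + swap + STAR software + your data must stay
# below the image's maximum size.
OS=2; SWAP=1; STAR=3; DATA=1; MAX=8
TOTAL=$((OS + SWAP + STAR + DATA))
[ "$TOTAL" -lt "$MAX" ] && echo "fits: ${TOTAL} GB < ${MAX} GB"
```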

 

Now that we have covered the parameters specific to virtual machines, there are some parameters specific to the software installed on the image.

 

Key parameters of the software installed on the image are:

  • STAR libraries installed
  • Operating System
  • Kernel Version
  • Instruction set architecture
  • Instruction set architecture word size (32bit / 64bit)

 

One image may hold many different library versions; all of the other parameters may only have one value at a time. The data08 volume is devoted to grid work and is the location of STAR's image repository.

  1. Note: Even though more than one STAR software library version may reside in an image, this is hard to do because of the size of the image. So we are assuming only one library per image.
  2. Note: In the case that an image is more than one file, an additional directory will be needed; the directory will take the file naming convention.

The path to the repository and scheme used to derive the name of the image is below:

 

/star/data08/OSG/APP/vm/[VM]/[Operating System]_[Instruction set architecture]_[Instruction set architecture word size]_[STAR lib version]_[maxsize]_[addition detail].[extension]


Definitions

[VM]

The name of the VM software (xen, kvm, virtualBox, ....)

[Operating System]

The Operating System installed (sl4, sl5.3, ubuntu9.10, fedora12, centOS5, ...)

[Instruction set architecture]

x86, SPARC, ARM, Alpha, PowerPC, AVR

[Instruction set architecture word size]

8, 16, 32, 64, 128, 512

[STAR lib version]

The STAR library version (example: sl05a)

[maxsize]

The maxsize to which the image will grow (example: 2GB)

[addition detail]

Any additional detail we may want to add

[extension]

The file extension

 

 

Examples:

/star/data08/OSG/APP/vm/xen/sl5.3_x86_32_sl05c_8GB_ec2.img

/star/data08/OSG/APP/vm/virtualBox/ubuntu9.10_x86_32_sl05c_10GB_ec2.img
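Composing a repository path from the scheme's fields can be sketched as follows (values taken from the first example above):

```shell
# Fields of the naming scheme, using the first example's values.
VM=xen; OS=sl5.3; ARCH=x86; BITS=32; LIB=sl05c; MAXSIZE=8GB; DETAIL=ec2; EXT=img
NAME="${OS}_${ARCH}_${BITS}_${LIB}_${MAXSIZE}_${DETAIL}.${EXT}"
echo "/star/data08/OSG/APP/vm/${VM}/${NAME}"
# -> /star/data08/OSG/APP/vm/xen/sl5.3_x86_32_sl05c_8GB_ec2.img
```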

In addition there will be a text file with the .checksum extension holding an MD5 checksum hash of the image.

Example of making the hash:

[rcas6016] xen/> ls

sl4_x86_32_sl08e_8GB_ec2.img
sl4_x86_32_sl08e_8GB_ec2.txt

[rcas6016] xen/> md5sum sl4_x86_32_sl08e_8GB_ec2.img > sl4_x86_32_sl08e_8GB_ec2.checksum
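Verification on the receiving end uses md5sum -c; a self-contained sketch with a stand-in file in place of a real image:

```shell
# Stand-in for a real image file; the .checksum file pairs hash and filename.
echo 'fake image payload' > /tmp/demo.img
md5sum /tmp/demo.img > /tmp/demo.checksum
# After downloading an image, verify it against its .checksum file:
md5sum -c /tmp/demo.checksum
```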


In addition there will be a text file with the .txt extension giving more detail about the contents of the image.

Examples:

/star/data08/OSG/APP/vm/xen/sl5.3_x86_32_sl05c_8GB_ec2.txt

/star/data08/OSG/APP/vm/virtualBox/ubuntu9.10_x86_32_sl05c_10GB_ec2.txt

 

Recommendations for security and standardization (but not yet implemented):

 

Users will need to be able to log in without editing the /etc/shadow file.

 

1) There will be an account “root” on all images with a password.

 

Note: In the case of Ubuntu just make the first account root.

 

2) There should be a hard password that changes regularly. This can be done by scripting the command below to run every time the image is started (by putting it in /etc/rc.local, for example):

 

date '+%y%m SEED=someLongString' | md5sum | sed 's|\(...........\).*|\1|' | base64 | sed 's|\(...........\).*|\1|' | passwd --stdin username

Then the command below (the same pipeline without the passwd step) can be placed in the text description of the image; the user instantiating the image can run it to get the password. Example:

[rcas6007] ~/> date '+%y%m SEED=someLongString' | md5sum | sed 's|\(...........\).*|\1|' | \
base64 | sed 's|\(...........\).*|\1|'
 

YjNkNTE5YWJ

If the command is compromised the seed will need to be changed.  
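Since the pipeline is deterministic for a given month and seed, anyone holding the seed can regenerate the current password; the result is always an 11-character base64 fragment. A self-contained check (example seed):

```shell
# Same pipeline as in the image description; each sed keeps the first 11
# characters of its input line, so the final password is 11 characters long.
pw=$(date '+%y%m SEED=someLongString' | md5sum | sed 's|\(...........\).*|\1|' \
     | base64 | sed 's|\(...........\).*|\1|')
echo "password: $pw (length ${#pw})"
```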

 

Alternatively:

There could also be a repository of ssh public keys available via a network connection which images can pull in.

 

3) There will be an account “star” on all images. This account will have the STAR environment (startup scripts). This is the account under which the actual jobs are run.