Infrastructure

The pages in this tree relate to the Infrastructure sub-group of the S&C team.

The areas comprise general infrastructure (software, web service, security, ...), online computing, operations and user support.

 

Online Computing

General

The online Web server front page is available here. This Drupal section will hold complementary information.
A list of all operation manuals (beyond detector sub-systems) is available at Operations.
Please use it as a startup page.

Detector sub-systems operation procedures - Updated 2008, requested confirmation for 2009

 

Accessing The STAR Protected Network

Creating An Account

To get access to the STAR SSH gateways (which will also allow access to the generic Online Linux Pool) please follow the steps below:

  1. Obtain an RCF Account, and upload your public key to the RCF
  2. Go to the SKM page and log in with your RCF account (your AFS/Kerberos credentials)
  3. Upload your PUBLIC key in OpenSSH format on the main page after logging in. (Your public key file should have a name like "id_rsa.pub".)
  4. Send an e-mail to STAR Support containing your full name, RCF username, BNL Life Number and a brief description of your intended use of the online resources and/or the particular subsystem(s) to be supported
  5. Once you are notified that your account has been created, please follow the steps below to log in.
  6. As a user of online resources, it is suggested that you subscribe to the Run Time System mailing list and Mattermost channel for announcements about maintenance periods and configuration changes. 

The online gatekeepers are named stargw.starp.bnl.gov.

ssh -AX username@stargw.starp.bnl.gov

Logging In Via SSH

Linux Users:

  1. You can either script this, or perform these steps manually.
  2. You can now ssh into any of the STAR protected nodes from here. Just remember to use "ssh -AX" each time in order to forward X11 and the SSH agent. (Please keep in mind that the STAR gateways are not currently available directly from outside BNL.  You will need to go through the RCF first.)
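
As a hedged illustration of the two-hop login described above, the sketch below assembles an "ssh" command line that reaches a STAR gateway through an intermediate host using the OpenSSH "-J" (jump host) option, which exists in OpenSSH 7.3 and later. The RCF gateway name used here, rcf-gateway.example, is a placeholder and not a real host, and whether the RCF gateways accept jump connections is an assumption; the manual two-step login remains the documented path.

```python
import shlex

STAR_GATEWAY = "stargw.starp.bnl.gov"  # gateway name taken from the text above
RCF_GATEWAY = "rcf-gateway.example"    # placeholder, NOT a real host name


def build_ssh_command(username, jump_host=RCF_GATEWAY, target=STAR_GATEWAY):
    """Return the argv list for a login that forwards X11 and the SSH agent."""
    return [
        "ssh",
        "-A",                             # forward the SSH agent
        "-X",                             # forward X11
        "-J", f"{username}@{jump_host}",  # connect through the jump host first
        f"{username}@{target}",
    ]


def as_shell_line(argv):
    """Render an argv list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)


print(as_shell_line(build_ssh_command("jdoe")))
```

The command is built as a list rather than a string so that it can be passed to a process launcher without going through a shell.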

 

 

EVO Conference Computer

If you would like to be able to use EVO in the 1006 trailer, there is a conference PC set up for use.  There is a generic account on the computer for everyone to share.

The account credentials are:
Username: rhicstar
Password: (See below)
Log On To: Conference (This computer)

For security reasons, I will not post the password anywhere unencrypted, so please come see me in my office (Building 510, Room 1-179) or send me an e-mail containing your GPG public key.  If you do not have a GPG public key, please bring your laptop (desktop users, call me and I'll come to see you) and I'll help you set one up.  It is quite useful.

Online Linux Pool

This page provides an overview of the Online Linux Pool (OLP).  The OLP is a cluster of computers made available to STAR collaborators with the primary intent of allowing real-time and near real-time run support activities, but with general usage and various computing development and testing projects envisioned as resources permit.

The OLP currently consists of 60 Penguin Altus 1300 rack-mount computers physically located in the DAQ Room, plus two servers that provide home directories (over NFS), user authentication (NIS), and Condor pool management.  The "worker" nodes are named onl01, onl02, ..., onl60.starp.bnl.gov.  These 60 pool nodes have 64-bit Scientific Linux 5.8 (with 32-bit libraries).  Any user with access to the stargw.starp.bnl.gov SSH gateways has access to these 60 nodes.  Users of the RACF will recognise the "rterm" command, which if executed on a stargw host will attempt to connect to one of the nodes with relatively low load. 


Remote filesystems:

All nodes have access to several remote filesystems that may be useful to online computing:

  • /evp/a (read-only access to the DAQ Event Pool)
  • /daq/RTS (read-only access to daqman's /RTS export)
  • /daq/data (read-write(!) access to daqman's /data export)
  • /daq/log (read-only access to daqman's /log export)
  • /onlineweb/www (read-write access to the online web server's space for content to be shared over the web)
  • /afs (the standard AFS tree)

Additionally, onl01-onl06 are configured to access trigger data at:

  • /trg/trgdata (trgscratch's trgdata export)
  • /trg/scalerdata (startrg2's scalerdata export). 


Condor

A Condor pool is set up on these nodes.  Currently onl01-30 are in the pool (modulo a few specialized nodes not accepting jobs), serving as execute hosts.

rterm is available on the SSH gateway hosts (see Accessing The STAR Protected Network) to select the least-loaded system for login.  Only a subset of nodes is tagged as interactive for rterm.  That list is currently onl01-10.
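
To make the "least-loaded node" idea concrete, here is a minimal sketch of such a selection. It is not the rterm implementation, which is not described here; the load figures are invented, and the tie-breaking rule (lowest node name wins) is an assumption chosen only to make the result deterministic.

```python
def interactive_nodes(first=1, last=10, domain="starp.bnl.gov"):
    """Names of the nodes tagged as interactive (onl01-10 per the text)."""
    return [f"onl{n:02d}.{domain}" for n in range(first, last + 1)]


def least_loaded(load_by_node):
    """Pick the node with the smallest load; ties go to the lowest name."""
    if not load_by_node:
        raise ValueError("no candidate nodes")
    return min(load_by_node, key=lambda node: (load_by_node[node], node))


# Hypothetical load averages, for illustration only.
sample_loads = dict(
    zip(interactive_nodes(), [0.8, 2.1, 0.1, 1.5, 0.1, 3.0, 0.9, 0.4, 1.2, 0.7])
)
print(least_loaded(sample_loads))
```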

Cron

Cron jobs are accepted and can run only on onl11, onl12 and onl13. To access the exported Web directories in write mode, you need to be part of the onlweb group. Every year before the run, a list of points of contact is compiled and used to determine who should be granted access (this is not given by default).


General system details (hardware, OS, etc):

The Penguin nodes have 64-bit Scientific Linux 5.8 installations (with 32-bit libraries), with these basic hardware specs:

2 x Dual Core AMD Opteron Processor 265, 1800MHz (4 cores per system, no HT)

8GB RAM (PC3200 DDR 400MHz ECC)

4 SATA disk bays

  • onl01-onl30: 4 x 500GB disks (7200RPM) in a RAID configuration providing a 1.3 TB scratch space (mounted at /scratch)
  • onl31-onl60: 4 x 1TB disks (7200RPM) in a RAID configuration providing a 2.6 TB scratch space

Usage suggestions and miscellaneous notes for users:

To reduce the burden on the network and the home directory NFS file server, it is advisable for heavy users of distributed jobs (i.e. Condor jobs) to avoid unnecessary access to their individual home directories.  As much as possible, please consolidate access to your home directories, and use the local disks as needed for storage.  Small, short-term needs (up to the order of 100MB or so) can use subdirectories under /tmp, while larger demands should use directories under /scratch on each individual node.  We expect at some point in the future to provide a shared file system (other than the home directories) of some significant size, but are not there yet.
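
The /tmp versus /scratch guidance can be expressed as a small helper that a job wrapper might call. This is only a sketch: the text gives "the order of 100MB" as a rough guide, so the exact cut-off below, and the user/job sub-directory layout, are assumptions.

```python
import posixpath

# "Up to the order of 100MB or so" per the text; the exact value is an assumption.
SMALL_LIMIT_BYTES = 100 * 1024 * 1024


def pick_work_root(expected_bytes):
    """Small, short-term data goes to /tmp, larger demands to /scratch."""
    return "/tmp" if expected_bytes <= SMALL_LIMIT_BYTES else "/scratch"


def job_directory(expected_bytes, user, job_name):
    """Hypothetical per-user, per-job directory on the node's local disk."""
    return posixpath.join(pick_work_root(expected_bytes), user, job_name)


print(job_directory(20 * 1024 * 1024, "jdoe", "qa-test"))
print(job_directory(5 * 1024 ** 3, "jdoe", "big-job"))
```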

The OLP nodes only allow access based on SSH keys.  If you have access to the stargw SSH gateways, you will also automatically have access to the OLP.  To make it most convenient, it is suggested that you familiarize yourself with SSH key agents and SSH key forwarding, which can (nearly) eliminate all need for typing passwords/passphrases.

Online computing run preparation plans

This page will list, by year, action items, run plans and open questions. It will serve as a repository for documents used as the basis for drawing up the requirements. To see documents in this tree, you must belong to the Software and Computing OG (the pages are not public).

Run IX

General

This tree will contain information pertaining to run 9.

Run preparation meetings are held at the usual time, i.e. on Friday between 3 and 5 PM (room reserved; we will try to keep to one hour weekly). The following groups are invited to join:

  • The S&C core support as appropriate
  • The online "Run Time System" representatives
    • DAQ - Jeff Landgraf
    • Slow Control - Yury Gorbunov
    • Trigger - Jon (Jack) Engelage
  • All software coordinators as listed on the Organization page

The goal of the meetings is to discuss any issues with the infrastructure, networking, code readiness, resources and associated needs, as well as any other computing-related issues relevant to the smooth running of online operations. The forum and meeting also serve as a vehicle for passing information on time constraints and requirements to the diverse groups in a structured and cohesive manner.

Related documents

None so far.

Related meetings

 

Run VIII

General

This tree will contain information pertaining to run 8.

Run preparation meetings are held on Friday between 3-4 PM (room reserved up to 5 PM). The following groups are invited to join:

  • The S&C core support as appropriate
  • The online "Run Time System" people
    • DAQ - Jeff Landgraf
    • Slow Control - Will Waggoner
    • Trigger - Jon Engelage
  • All software coordinators as listed on the Organization page

The goal of the meetings is to discuss any issues with the infrastructure, networking, code readiness, resources and needs, or any other computing and related issues relevant to the smooth running of online operations. The forum and meeting also serve as a vehicle for passing information on time constraints and requirements to and through the diverse groups in a structured and cohesive manner.

In Run VII, the forum was used to discuss the security plan and several key changes to the online computing structure needed to achieve minimum cyber-security accreditation.

Related documents

Related meetings

 

 

Experts on call

The experts on call for software related run support are:

Role                                  Name                Primary phone   Office phone    Other
------------------------------------  ------------------  --------------  --------------  --------------
Offline QA + FastOffline production   Jerome Lauret       (631) 786-0479  (631) 344-2450
                                      Gene Van Buren      (631) 312-4324  (631) 344-7953  (631) 775-6620
Online QA, PPlots                     Paul Sorensen       (510) 375-5582  (631) 344-2420
                                      David Kettler       (206) 218-3885  (206) 616-8141
Hardware support, online tools        Wayne Betts         (631) 804-6897  (631) 344-3285
Database                              Michael DePhillips  (631) 356-2257  (631) 344-2499  (631) 744-3295


When multiple choices are available, the name in bold indicates the current on-call expert. Please consult this page prior to calling the expert.

Run VII

Background

Facing a new paradigm of introducing DOE cyber-security regulations into our infrastructure, several action items were presented at the 2006 run critique meeting. The presentation is attached below as STAR-Critique-06.pdf. The urgent and immediate items, some of which required deep restructuring, were:

  • We MUST establish an internal controlled perimeter to the unroutable network. This network will be accessible via a gatekeeper model. Vulnerable devices should be isolated to the internal network layer
  • All network and communication layers must be documented
  • Physical access to consoles was described as part of the shift procedure and shift rotation. Access to the online computing infrastructure MUST be controlled
  • All systems MUST be remediated and brought up to the proper level of OS version and safety 
    • should exceptions be needed, the device should have the proper control and monitoring
    • isolation in the private network of nodes we cannot upgrade due to operational need is the other solution
  • OS flavor reduction – We propose to reduce the OS flavors to enhance and optimize support and maintenance
  • Group account access should be regulated via keys (ssh keys) and tied to individuals (no floating password without a clear understanding of who has it)
  • root access shall be restricted
    • A list of users having root access MUST exist at any point in time. In other words, only a few (documented) users should have root access privileges.
    • We must provide best effort to implement a configuration management strategy, i.e. changes to our infrastructure shall follow a procedure and lead to updated documentation.
  • Maintenance of computing equipment will be the responsibility of the S&C, DAQ and Slow Control groups as appropriate under general guidance of the S&C group.

 

The run preparation will be established within the following guidelines

  • General
    • Assess hardware replacement and cost (display, printer, UPS, switches, ...)
    • Assess sub-system needs for resources (disk space, bandwidth, database access, ...)
  • Networking 
    • Understand and reshape the current online Network spaghetti to a two layer model, with a gatekeeper model
    • Isolate vulnerable devices on a private network
    • Provide either a routing or a gatekeeper model; reduce dual or tri-NIC connections
    • Patch all vulnerable machines and bring all equipment to the appropriate level
  • Organizational needs – root access and password 
    • Establish an in-principle layer of responsibility and accountability
    • Determine root access and generic account access and usage
    • Provide infrastructure to manage keys on a per-node basis
    • Document procedures and equipment, establish principles for configuration management
    • Require new equipment to comply with baseline control
      • New equipment shall not be brought in randomly but integrated as part of the online infrastructure documentation
  • Software
    • Deploy a new Web server
    • Revisit all online common tools and needs – RunLog, ShiftLog, Web interfaces ...
    • Introduce technology and paradigm change for HTML-refresh poor-man's job approach
      • technique has spread and creates heavy load
    • Review Pplots needs and coverage
    • Introduce Scaler monitoring tool
    • Revisit Ganglia monitoring with special care on broadcast/multi-cast
  • Establish a first testbed of database consolidation for high-luminosity regime 
    • With help from Slow Control – IRMIS project

Understanding our online Network

The following documents are a first cut at understanding the interconnections between online hardware.

  • ch2connect.xls shows the NFS mounts between machines
  • Network-top level.pdf is a rough first cut of the network schematic

Patching and OS version-ing

  • July 28th 2006 
    • The matrix Old_Linux.pdf displays the list of nodes requiring attention
    • Two Windows machines (Alexei Lebedev's responsibility) require immediate attention.

 

Related meeting

 

New online web server (dean.star.bnl.gov)

New web server notes for content providers and users


There is a new web server (dean.star.bnl.gov) online to replace ch2linux.star.bnl.gov.  The "online.star.bnl.gov" alias was switched to dean.star.bnl.gov at about 2pm on Tuesday, Feb. 29, 2007.  There is perhaps as much as 24 hours of DNS propagation time for the alias change to make it around the world, during which time, there could be confusion about which system (dean or ch2linux) is actually being accessed.

We plan to keep ch2linux online for 1-2 weeks to help in debugging, and as a fallback for broken content until it is fixed.

A gotcha to watch out for is the hard-coding of the "ch2linux" name in any links.  Use of the "online.star.bnl.gov" alias is generally preferable.

For those of you with individual accounts on ch2linux, the accounts have been duplicated on the new server (if you have an account, you can immediately use the key management system ( https://www.star.bnl.gov/starkeyw ) to install openssh public keys if desired on both the current (ch2linux) and new (dean) web servers).

Some hints and suggestions for content maintainers:


Some of the configuration changes between ch2linux and dean (particularly to php) may require modifications to existing content to work properly on the new server.  With php, the change that seems most likely to bite us is "register_globals = Off".  On ch2linux, this is set to On, allowing php automatic access to variables passed in POST or GET requests.  Here is a quick primer on the effect of turning this off, taken from the php.ini file:

;     Global variables are no longer registered for input data (POST, GET, cookies,
;     environment and other server variables).  Instead of using $foo,
;     you can use $_REQUEST["foo"] (includes any variable that arrives through the
;     request, namely, POST, GET and cookie variables), or use one of the specific
;     $_GET["foo"], $_POST["foo"], $_COOKIE["foo"] or $_FILES["foo"], depending
;     on where the input originates.  Also, you can look at the
;     import_request_variables() function.
;     Note that register_globals is going to be depracated (i.e., turned off by
;     default) in the next version of PHP, because it often leads to security bugs.
;     Read http://php.net/manual/en/security.registerglobals.php for further
;     information.

A second php issue is that we'd like to keep the default setting of "display_errors = Off" in php, as a security precaution.  However, since having it turned on is often useful for debugging, we can leave it on for a week or two in the initial stages, then turn it back to off.  A common consequence of these php settings is that you might notice mostly harmless "Notice" messages from php, commonly about uninitialized variables -- we all know to always initialize our variables, right?

If your php code (or perl, or whatever) is encountering file access errors, the problem may be stemming from SELinux.  I have fixed several file contexts and the local SE policy to fix problems with the RICH Scaler plots, the RunLog Browser and tomcat.  Unfortunately, content owners may have a difficult time diagnosing such problems.  One way is to log in to the server, "cause" the error and then look at the output of "dmesg |tail -n 30" (30, 40, whatever it takes) and look for audit messages with "avc:  denied" lines that might be related to your content.  If you see such errors, inform Wayne Betts who can look into it further.  As a quick test, we can temporarily disable SELinux to see if it clears up any problems.
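
As a rough aid for the search described above, the following sketch filters kernel log lines for SELinux denials and optionally narrows them to a keyword such as a file or process name. The sample log lines are invented for illustration and only approximate the shape of real audit messages.

```python
import re

# Matches "avc:  denied" regardless of how many spaces follow the colon.
AVC_DENIED = re.compile(r"avc:\s+denied")


def avc_denials(lines, keyword=None):
    """Return the lines that report an SELinux denial, optionally filtered."""
    hits = [line.rstrip("\n") for line in lines if AVC_DENIED.search(line)]
    if keyword is not None:
        hits = [line for line in hits if keyword in line]
    return hits


# Made-up sample lines, for illustration only.
sample_log = [
    "eth0: link up, 1000Mbps, full-duplex",
    'audit(1172600000.123:42): avc:  denied  { read } for  pid=2301 '
    'comm="httpd" name="scaler.png" tclass=file',
    'audit(1172600001.456:43): avc:  denied  { write } for  pid=2410 '
    'comm="java" name="runlog.tmp" tclass=file',
]

for line in avc_denials(sample_log, keyword="httpd"):
    print(line)
```

On the server itself, the input lines could come from the output of "dmesg", for example captured with the standard subprocess module.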



Another common issue has been database access controls.  Many of our databases have fairly granular access controls, and dean may not be configured for access to everything it needs.  If that is the suspected source of any problems, Mike DePhillips can look into it.

STAR's SSH Public Key Management System

SSH Public Key Management Tool

Overview

The main front-end Web interface is at https://www.star.bnl.gov/starkeyw/ (see the step-by-step instructions in the next section). This SSH public key management system was designed in STAR to address the following requirements:

  • Use of two-factor authentication for remote logins
  • Allow a one-to-many association of remote users: a remote user may associate his/her keys with a local domain user account and with one or more local so-called "group" accounts that are not tied to one individual (for example an "operator" account or even the "root" account)
  • Provide a simple Web front end for users to request, view and manage their own key associations (hence easily managing access to a domain)
  • Allow a set of system administrators to easily manage key associations for a domain (globally disabling users who have left STAR, for example)
  • Using SSH key fingerprints, make it possible to identify which user is logging in to which account (a security requirement)
  • Be able to provide on demand, in one click, a list of who had access to which account on what machine and when (historical records, easy access to access-grant lists)

Such a system was developed for STAR and named the "SSH Key Management system", aka SKM. More information can be found in this publication. A side benefit for users is the reduction in the number of passwords to remember and type.
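
For readers unfamiliar with SSH key fingerprints, the sketch below computes one in the SHA256 form printed by current OpenSSH tools: the base64 field of the public key line is decoded, hashed, and the digest is re-encoded in base64 without padding. How SKM itself stores or compares fingerprints is not described here, so this illustrates only the general idea; the key material is made up and is not a usable key.

```python
import base64
import hashlib


def key_blob(public_key_line):
    """Decode the base64 field of an OpenSSH public key line.

    The expected layout is "<type> <base64 data> [comment]".
    """
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("not an OpenSSH public key line")
    return base64.b64decode(fields[1])


def fingerprint_sha256(public_key_line):
    """SHA256 fingerprint: unpadded base64 of the digest of the key blob."""
    digest = hashlib.sha256(key_blob(public_key_line)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


# Made-up key material, for illustration only (not a valid SSH key).
fake_blob = base64.b64encode(b"illustrative key material").decode("ascii")
sample_line = f"ssh-rsa {fake_blob} jdoe@example"

print(fingerprint_sha256(sample_line))
```

Because the fingerprint depends only on the key data, two accounts that hold the same key show the same fingerprint, which is what lets a login be traced back to a person.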

Notes

  • In purpose, this system is similar to the RCF's key management system (full instructions here), but is more powerful because of its flexibility in the association of hosts (client systems), user accounts on those clients, and self-service key installation requests.
  • The STAR SKM system has been initially used for managing the online computer access and has expanded since to manage all nodes in STAR running a specialized service (offline database, web server and so on), streamlining the security model by making it consistent across nodes.
  • The system was designed to be as secure as possible (central repository of keys, information is only pulled by clients and NEVER pushed, to avoid multiple points of corruption). In other words, each client has a lightweight daemon polling and pulling its own SSH key association information out of a central DB and handling the installation of keys. Clients are not allowed to manage keys (only the Web interface does). The client daemon creates no load.

 

Where do we start? What is a typical use example?

You should use your RCF username and Kerberos password (credentials) to enter this interface.

Here is a typical scenario of the system usage: 

  1. A sysadmin of a machine named FOO creates a user account named "JDOE" and, if not done already, installs the key_services client.
  2. A user account 'JDOE' on host 'FOO' is configured in the Key Management system by a key management administrator*.
  3. John Doe uploads (via the web) his or her public ssh key (in openssh format).
  4. John Doe requests (via the web) that his key be added to JDOE's authorized_keys file on FOO.
  5. A key management administrator approves the request, and the key_services client places the key in ~JDOE/.ssh/authorized_keys.

* Current admins are Wayne Betts and Jerome Lauret.

At this point, John Doe has key-based access to JDOE@FOO.  Simple enough?  But wait, there's more!  Now John Doe realizes that he also needs access to the group account named "operator" on host BAR.  Since his key is already in the key management system he has only to request that his key be added to operator@BAR, and voila (subject to administrator approval), he can now login with his key to both JDOE@FOO and operator@BAR.  And if Mr. Doe should leave STAR, then an administrator simply removes (disables) him from the system and his keys are removed from both hosts.
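
The scenario above can be summarized as a small in-memory model. This is an illustrative sketch only: the class and method names are invented here and do not correspond to the actual SKM code or database schema.

```python
from dataclasses import dataclass, field


@dataclass
class KeyRegistry:
    """Toy model of the upload / request / approve / disable workflow."""
    keys: dict = field(default_factory=dict)      # person -> public key text
    accounts: set = field(default_factory=set)    # (account, host) set up by an admin
    pending: set = field(default_factory=set)     # (person, account, host) requests
    approved: set = field(default_factory=set)    # (person, account, host) associations
    disabled: set = field(default_factory=set)    # people removed from the system

    def add_account(self, account, host):
        self.accounts.add((account, host))

    def upload_key(self, person, public_key):
        self.keys[person] = public_key

    def request(self, person, account, host):
        if person not in self.keys:
            raise ValueError("upload a public key first")
        if (account, host) not in self.accounts:
            raise ValueError("account is not configured for this host")
        self.pending.add((person, account, host))

    def approve(self, person, account, host):
        # Raises KeyError if the association was never requested.
        self.pending.remove((person, account, host))
        self.approved.add((person, account, host))

    def disable(self, person):
        self.disabled.add(person)

    def authorized_keys(self, account, host):
        """Keys that a client should install for account@host."""
        return sorted(
            self.keys[p]
            for (p, a, h) in self.approved
            if a == account and h == host and p not in self.disabled
        )


registry = KeyRegistry()
registry.add_account("JDOE", "FOO")
registry.add_account("operator", "BAR")
registry.upload_key("John Doe", "ssh-rsa EXAMPLEKEY jdoe@laptop")
for account, host in [("JDOE", "FOO"), ("operator", "BAR")]:
    registry.request("John Doe", account, host)
    registry.approve("John Doe", account, host)
print(registry.authorized_keys("operator", "BAR"))
```

Calling registry.disable("John Doe") afterwards makes authorized_keys() return an empty list for both accounts, mirroring the "removed from both hosts" step.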

 

More details

Slightly Deeper...

There are three things to keep track of -- people (and their SSH keys of course), host (client) systems, and user accounts on those hosts:

People want access to specific user accounts at specific hosts.

The system maintains a list of user accounts for each host system, and a list of people associated with each user account at each host. To be clear: while the Web interface allows any user to log in, the system does not have any automatic user account detection mechanism at this time. Each "{user-}account" has to be added by hand by an administrator for that account to be listed as a possible association for node FOO or BAR.

Behind the scenes, each host that is participating in the system has a keyservices client installed that runs as a system service.  The keyservices_client periodically (at five minute intervals by default) polls a central service for its information.  In other words, the back-end database is consulted for the list of approved associations and the appropriate keys are downloaded by the client and added to the related account's authorized_keys files accordingly.
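
The pull-only behaviour can be sketched as a single polling step: the client asks the central service for its approved keys, compares them with what is installed, and rewrites a file only when something changed. This is not the keyservices client code. The three callables are hypothetical stand-ins for the real network and file operations, and treating the whole authorized_keys file as owned by the client is an assumption.

```python
POLL_INTERVAL_SECONDS = 5 * 60  # default polling interval quoted in the text


def render_authorized_keys(approved_keys):
    """One key per line, de-duplicated and sorted for a stable file."""
    unique = sorted({key.strip() for key in approved_keys if key.strip()})
    return "".join(key + "\n" for key in unique)


def poll_once(fetch_approved, read_current, write_new):
    """Run one polling cycle and return the accounts whose file changed.

    fetch_approved() -> {account: [public key lines]} for this host
    read_current(account) -> current authorized_keys text ("" if none)
    write_new(account, text) -> installs the new authorized_keys text
    """
    changed = []
    for account, keys in sorted(fetch_approved().items()):
        wanted = render_authorized_keys(keys)
        if read_current(account) != wanted:
            write_new(account, wanted)
            changed.append(account)
    return changed


# In-memory stand-ins for the central service and the local files.
central = {"JDOE": ["ssh-rsa EXAMPLEKEY jdoe@laptop"], "operator": []}
installed = {}

first = poll_once(lambda: central, lambda a: installed.get(a, ""), installed.__setitem__)
second = poll_once(lambda: central, lambda a: installed.get(a, ""), installed.__setitem__)
print(first, second)
```

A daemon would call poll_once() in a loop, sleeping POLL_INTERVAL_SECONDS between cycles. When nothing has changed, a cycle performs no writes at all.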

In our case, orion.star.bnl.gov hosts all the server services (starkeyw and starkeyd via Apache, and a MySQL database), but they could all be on separate servers if desired.

Deployment Status and Future Plans

Only RHEL and Scientific Linux with i386 and x86_64 kernels have been tested, but there is no reason to believe that the client couldn't be built on other Linux distributions or Solaris. Please contact one of the admins (Wayne Betts, Jerome Lauret) if you'd like to volunteer and add your sub-system node to SKM or if you have any questions.

User access to the Web interface is currently based on the RCF Kerberos authentication. You will hence need a valid BNL/RCF account to access the Web interface and manage key associations for your account.

In 2012, SKM was extended to implement volatile key associations (a lifetime and expiration may be set for each key association). This feature allows granting a given user access to a privileged account on a temporary basis, for a debugging need for example. It is also used for group accounts of an operational nature, whose teams rotate and change at each new run (in such cases, the list of who is associated with the account needs to be re-assessed yearly, and the associations would be set, for example, to expire after one year). This is an optional feature: by default, associations have no expiration.
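
The expiration rule reduces to a simple comparison, sketched below. The tuple layout and the names and dates in the sample data are hypothetical; only the rule itself (no expiration by default, otherwise valid until the expiration time) comes from the text.

```python
from datetime import datetime, timedelta


def is_active(expires_at, now):
    """An association with no expiration (None) never lapses."""
    return expires_at is None or now < expires_at


def active_associations(associations, now):
    """Keep (person, account, host, expires_at) tuples still valid at 'now'."""
    return [assoc for assoc in associations if is_active(assoc[3], now)]


# Hypothetical associations, for illustration only.
granted = datetime(2012, 1, 15)
associations = [
    ("John Doe", "JDOE", "FOO", None),                                # default: never expires
    ("John Doe", "operator", "BAR", granted + timedelta(days=365)),   # yearly re-assessment
    ("Jane Roe", "root", "FOO", granted + timedelta(days=7)),         # temporary debugging access
]

for person, account, host, _ in active_associations(associations, datetime(2012, 2, 1)):
    print(f"{person} -> {account}@{host}")
```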

Run 19

Feedback from software coordinators

Active feedback

Sub-system  Coordinator                     Calibration POC  Online monitoring POC
----------  ------------------------------  ---------------  ---------------------------------
MTD         Rongrong Ma                     - same -         - same -
EMC         Raghav Kunnawalkam Elayavalli,  - same -         Note: L2algo, bemc and bsmdstatus
            Nick Lukow
EPD         Prashanth Shanmuganathan        N/A              - same -
BTOF        Frank Geurts                    - same -         Frank Geurts,
                                                             Zaochen Ye
ETOF        Florian Seck                    - same -         Florian Seck,
                                                             Philipp Weidenkaff
HLT         Hongwei Ke                      - same -         - same -

Other software coordinators

sub-system Coordinator
iTPC (TPC?) Irakli Chakaberia
Trigger Akio Ogawa
DAQ Jeff Landgraf
...  

Run 20

Status of calibration timeline initialization

In RUN: EEMC, EMC, EPD, ETOF, GMT, TPC, MTD, TOF
Test: FST, FCS, STGC (no tables)
Desired init dates were announced to all software coordinators:

- Geometry tag has a timestamp of 20191120
- Simulation timeline [20191115,20191120[
- DB initialization for real data [20191125,...]

     Please initialize your table content appropriately, i.e.
sim flavor initial values are entered at 20191115 up to 20191119
(please exclude the edge), ofl initial values at 20191125
(with the run starting on the 1st of December, even tomorrow's cosmic
and commissioning runs would pick up the proper values).
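
The timeline above can be checked mechanically. The sketch below encodes the Run 20 dates quoted in this section, treating the simulation window as half-open (the end date 20191120 is excluded, as requested). The function name and the decision to return None for dates outside both windows are assumptions made for illustration.

```python
from datetime import date

# Dates quoted in this section for Run 20.
SIM_START = date(2019, 11, 15)   # first day of the simulation timeline
SIM_END = date(2019, 11, 20)     # excluded edge; also the geometry tag timestamp
OFL_START = date(2019, 11, 25)   # DB initialization for real data


def flavor_window(day):
    """Return "sim", "ofl", or None if the day falls in neither window."""
    if SIM_START <= day < SIM_END:
        return "sim"
    if day >= OFL_START:
        return "ofl"
    return None


for day in (date(2019, 11, 19), date(2019, 11, 20), date(2019, 12, 1)):
    print(day.isoformat(), flavor_window(day))
```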

 

 

Status - 2019/12/10

EMC  = ready
ETOF = ready - initialized at 2019-11-25, no sim (confirming)
TPC  = NOT ready [look at year 19 for comparison]
MTD  = ready
TOF  = Partially ready? INL correction, T0, TDC, status and alignment tables initialized
EPD  = gain initialized at 2019-12-15 (!?), status not initialized, no sim

EEMC = ready? (*last init at 2017-12-20)
GMT  = ready (*no db tables)



Status - 2019/12/09

EMC  = ready
ETOF = ready? initialized at 2019-11-25, no sim
TPC  = NOT ready
MTD  = ready
TOF  = NOT ready
EPD  = gain initialized at 2019-12-15 (!?), status not initialized, no sim

EEMC = ready? (*last init at 2017-12-20)
GMT  = ready (*no db tables)

 

 

Software coordinator feedback for Run 20 - Points of Contact

Sub-system         Coordinator                     Calibration POC  Online monitoring POC
-----------------  ------------------------------  ---------------  ---------------------------------
MTD                Rongrong Ma                     - same -         - same -
EMC, EEMC          Raghav Kunnawalkam Elayavalli,  - same -         Note: L2algo, bemc and bsmdstatus
                   Nick Lukow
EPD                [TBC]                           - same -         - same -
BTOF               Frank Geurts                    - same -         Frank Geurts,
                                                                    Zaochen Ye
ETOF               Florian Seck                    - same -         Florian Seck,
                                                                    Philipp Weidenkaff
HLT                Hongwei Ke                      - same -         - same -
TPC                Irakli Chakaberia               - same -         Flemming Videbaek
Trigger detectors  Akio Ogawa                      - same -         - same -
DAQ                Jeff Landgraf                   N/A


---




Run 21

Status of calibration timeline initialization

- Geometry tag has a timestamp of 20201215
- Simulation timeline [20201210, 20201215]
- DB initialization for real data [20201220,...]

Status - 2020/12/10

 

Software coordinator feedback for Run 21 - Points of Contact

Sub-system         Coordinator                     Calibration POC     Online monitoring POC
-----------------  ------------------------------  ------------------  ---------------------------------
MTD                Rongrong Ma                     - same -            - same -
EMC, EEMC          Raghav Kunnawalkam Elayavalli,  - same -            Note: L2algo, bemc and bsmdstatus
                   Nick Lukow
EPD                Prashanth Shanmuganathan (TBC)  Skipper Kagamaster  - same -
BTOF               Zaochen                         - same -            Frank Geurts,
                                                                       Zaochen Ye
ETOF               Philipp Weidenkaff              - same -            Philipp Weidenkaff
HLT                Hongwei Ke                      - same -            - same -
TPC                Yuri Fisyak                     - same -            Flemming Videbaek
Trigger detectors  Akio Ogawa                      - same -            - same -
DAQ                Jeff Landgraf                   N/A
Forward Upgrade    Daniel Brandenburg              - same -            FCS - Akio Ogawa
                                                                       sTGC - Daniel Brandenburg
                                                                       FST - Shenghui Zhang/Zhenyu Ye

---

Run X

Below are the related meetings:

Run XIII

Preparation meeting minutes

Database initialization check list

TPC Software  – Richard Witt          NO
GMT Software  – Richard Witt          NO
EMC2 Software - Alice Ohlson          Yes
FGT Software  - Anselm Vossen         Yes
FMS Software  - Thomas Burton         Yes
TOF Software  - Frank Geurts          Yes
Trigger Detectors  - Akio Ogawa       ??
HFT Software  - Spyridon Margetis     NO (no DB interface, hard-coded values in preview codes)

 

Calibration Points of Contact per sub-system

If a name is missing, the POC role falls onto the coordinator.
                Coordinator           Possible POC
                ------------          ---------------
TPC Software  – Richard Witt          
GMT Software  – Richard Witt          
EMC2 Software - Alice Ohlson          Alice Ohlson  
FGT Software  - Anselm Vossen         
FMS Software  - Thomas Burton         Thomas Burton    
TOF Software  - Frank Geurts          
Trigger Detectors  - Akio Ogawa       
HFT Software  - Spyridon Margetis     Hao Qiu

Online Monitoring POC

The final list from the Spin PWGC can be found at 2013 Run Tasks. The table below merges the Spin PWGC feedback with other feedback.

Directory    POC                                  Notes
-----------  -----------------------------------  -----------------------------------------------

Directories we inferred are being used (as reported in the RTS Hypernews):
scaler       Len Eun and Ernst Sichtermann (LBL)  This directory usage was indirectly reported
SlowControl  James F Ross (Creighton)
HLT          Qi-Ye Shou                           The 2012 directory had a recent timestamp but was owned by mnaglis. Aihong Tang contacted 2013/02/12.
                                                  Answer from Qi-Ye Shou 2013/02/12 - will be POC.
fmsStatus    Yuxi Pan (UCLA)                      This was not requested but the 2011 directory is being overwritten by user=yuxip.
                                                  FMS software coordinator contacted for confirmation 2013/02/12.
                                                  Yuxi Pan confirmed 2013/02/13 as POC for this directory.

Spin PWG monitoring related directories follow:
L0trg        Pibero Djawotho (TAMU)
L2algo       Maxence Vandenbroucke (Temple)
cdev         Kevin Adkins (UKY)
zdc          Len Eun and Ernst Sichtermann (LBL)
bsmdStatus   Keith Landry (UCLA)
emcStatus    Keith Landry (UCLA)
fgtStatus    Xuan Li (Temple)                     This directory is also being written by user=akio, causing access-protection and possible clash problems.
                                                  POC contacted on 2013/02/08, both Akio and POC contacted again 2013/02/12 -> confirmed as OK.
bbc          Prashanth (KSU)



Run XIV


Preparation meetings, links


Notes

  • 2013/11/15
    • Info gathering begins (directories/areas and Point of Contacts)
      Status:
      2013/11/22, directory structure, 2 people provided feedback, Renee coordinated the rest
      2013/11/25, calibration POC, 3 coordinators provided feedback - Closed 2013/12/04
      2013/12/04, geometry for Run 14,
       
    • Basic check: CERT for online is old if coming from the Wireless
      Status: fixed at ITD level, 2013/11/18 - the reverse proxy did not have the proper CERT
  • 2013/11/25

Database initialization check list

The actions suggested by this section have not started yet.

Sub-system  Coordinator         Check done
----------  ------------------  -----------
DAQ         Jeff Landgraf
TPC         Richard Witt
GMT         Richard Witt
EMC2        Mike Skoby,
            Kevin Adkins
FMS         Thomas Burton
TOF         Daniel Brandenburg
MTD         Rongrong Ma
HFT         Spiros Margetis     (not known)
Trigger     Akio Ogawa
FGT         Xuan Li


Calibration Points of Contact per sub-system

"-" indicates no feedback was provided. But if a name is missing, the POC role falls onto the coordinator.

Sub-system  Coordinator         Calibration POC
----------  ------------------  ----------------
DAQ         Jeff Landgraf       -
TPC         Richard Witt        -
GMT         Richard Witt        -
EMC2        Mike Skoby,         -
            Kevin Adkins
FMS         Thomas Burton       -
TOF         Daniel Brandenburg  -
MTD         Rongrong Ma         Bingchu Huan
HFT         Spiros Margetis     Jonathan Bouchet
Trigger     Akio Ogawa          -
FGT         Xuan Li             N/A


Online Monitoring POC


Directory    POC              Notes
-----------  ---------------  ------------------------------------------------
scaler                        Not needed 2013/11/25
SlowControl  Chanaka DeSilva  OKed on second Run preparation meeting
HLT          Zhengquia Zhang  Learned incidentally on 2014/01/28
HFT          Shusu Shi        Learned about it on 2014/02/26
fmsStatus                     Not needed 2013/11/25
L0trg        Zilong Chang,    Informed 2013/11/10 and created 2013/11/15
             Mike Skoby
L2algo       Nihar Sahoo      Informed 2013/11/25
cdev                          Not needed 2013/11/25
zdc                           may not be used (TBC)
bsmdStatus   Janusz Oleniacz  Info will be passed from Keith Landry 2014/01/20
                              Possible backup, Leszek Kosarzewski 2014/03/26
emcStatus    Janusz Oleniacz  Info will be passed from Keith Landry 2014/01/20
                              Possible backup, Leszek Kosarzewski 2014/03/26
fgtStatus                     Not needed 2013/11/25
bbc          Akio Ogawa       Informed 2013/11/15, created same day


Run XV

Run 15 was prepared essentially by discussing with individuals, and a comprehensive page was not maintained.

Run XVI


This page will contain feedback related to the preparation of the online setup.

 

Notes



 

Online Monitoring POC

scaler    
SlowControl    
HLT Zhengqiao Feedback 2015/11/24
HFT Guannan Xie Spiros: Feedback 2015/11/24
fmsStatus   Akio: Possibly not needed (TBC). 2016/01/13 noted this was not used in Run 15 and will probably never be used again.
fmsTrg   Confirmed needed 2016/01/13
fps   Akio: Not needed in Run 16? Perhaps later.
L0trg Zilong Chang Zilong: Feedback 2015/11/24
L2algo Kolja Kauder Kolja: will be POC - 2015/11/24
cdev Chanaka DeSilva  
zdc    
bsmdStatus Kolja Kauder Kolja: will be POC - 2015/11/24
bemcTrgDb Kolja Kauder Kolja: will be POC - 2015/11/24
emcStatus Kolja Kauder Kolja: will be POC - 2015/11/24
fgtStatus   Not needed since Run 14 ... May drop from the list
bbc Akio Ogawa Feedback 2015/11/24, needed
rp    

 

Calibration Points of Contact per sub-system

Sub-system Coordinator Calibration POC
DAQ Jeff Landgraf -
TPC Richard Witt, Yuri Fisyak -
GMT Richard Witt -
EMC2 Kolja Kauder, Ting Lin -
FMS Oleg Eysser -
TOF Daniel Brandenburg -
MTD Rongrong Ma (same confirmed 2015/11/24)
HFT Spiros Margetis Xin Dong
HLT Hongwei Ke (same confirmed 2015/11/24)
Trigger Akio Ogawa -
RP Kin Yip -

 

Database initialization check list



 

Online network documentation

This is to serve as a repository of information about networking in the online environment. 

 

Background as of fall 2009

The network layout at the STAR experiment has grown from a base laid over ten years ago, with a number of people working on it and adding devices over time with little coordination or standardization.  As a result, we have, to put it bluntly, a huge mess of a network, with a mix of hardware vendors and media, cables going all over the place, many of which are unlabelled and now buried to the point of untraceability.  We have SOHO switches all over the place, of various brands, ages and capabilities.  (It was only about one year ago all hubs were at least replaced with switches, or so I think – I haven’t found any hubs since then.)  There are a handful of “managed” switches, but they are generally lower-end switches and we have not taken advantage of even their limited monitoring capabilities.  (In the case of the LinkSys switches purchased one year ago, I found their management web interface poor – slow, buggy and not very helpful.)

In addition to the general messiness, a big (and growing) concern has been that during each of the past several years, there have been a handful of periods of instability in the starp network, typically lasting from a few minutes to hours (or even possibly indefinitely in the most recent cases which were resolved hastily with switch hardware replacements in the middle of RHIC runs).   The cause(s) of these instabilities have never been understood.  The instabilities have typically manifested as slow communications or complete lack of communication with devices on the South Platform (historically, most often VME processors).  Speculation has tended to focus on ITD security scanning.  While this has been shown to be potentially disruptive to some individual devices and services, broad effects on whole segments of the network have never been conclusively demonstrated, nor has there been a testable, plausible explanation for the mechanism of such instability. 

The past year included the two most significant episodes of instability yet on starp, in which LinkSys SLM 2048 switches (after weeks or months of stability) developed problems that appeared to be similar to prior issues, only more severe.  The two had been purchased as a replacement (plus spare) for a Catalyst 1900 on the South Platform.  When the first started showing signs of trouble, it was replaced by the second, which failed spectacularly later in the run, becoming completely unresponsive through its web interface and pings, and was only occasionally transmitting any packets at all, it seemed.   (After all devices were removed, and the switch rebooted, it returned to normal on the lab bench, but has not been put back into service.)

At this point, all devices were removed from the LinkSys switch and sent through a pair of unmanaged SOHO switches, which themselves each link to an old 3Com switch on the first floor.  Since then, no more instabilities have been noted, but it has left a physical cabling mess and a network layout that is quite awkward.  (Adding to the trouble, at least one of the SOHO switches has a history of sensitivity to power fluctuations, every once in a while needing to be power-cycled after power dips or outages.)

In addition, there have been superficially similar episodes of problems on the DAQ/TRG network, which shares no networking hardware with the starp network.  As far as I know, these episodes spontaneously resolved themselves.  (Is this true?)  Speculation has been on “odd” networked devices (such as oscilloscopes) generating unusual traffic, but here too there is no conclusive evidence of the cause.  Having no explanation, it seems likely this behavior will be encountered again.
 

Core components

There are several “core” pieces currently.  Core is defined somewhat vaguely as connecting lots of devices or requiring relatively high performance: 

1.    ITD’s main switch in the DAQ room
2.    DAQ’s event builder switch in the DAQ room
3.    the starp switch on the South Platform
4.    the DAQ/TRG switch on the South Platform
5.    the Force 10 switches for the HPSS network in the DAQ room

It seems likely that any reshape will have to include those same core components, though perhaps some combinations are possible at the hardware level using VLANs or other technologies.  (combining starp and DAQ/TRG on the platform on to a single large switch, for instance)
 

ITD's Catalyst chassis in the DAQ room (subnets 60, 162, wireless and possibly others in 1006)

This switch chassis is in the networking rack in the northwest corner of the DAQ room.  It is managed by ITD.  STAR has no way to interact with this switch at the software/configuration level.  

 

Slot 1:  WS-X4013 (Supervisor II Engine, fiber uplink to 515 and local management port)

Slot 2:  WS-X4548-GB-RJ45 (48 1Gb/s copper ports @8:1 oversubscription)  port 43 is 162 subnet, rest are subnet 60.

Slot 3: WS-X4232-RJ-XX (32 copper 100 Mb/s) plus a WS-U5404-FX-MT daughter card (4 MTRJ fiber ports at 100Mb/s)

Slot 4: WS-4148-RJ (48 copper 100Mb/s) - mix of subnets 60 and 162?

Slot 5: WS-4148-RJ (48 copper 100Mb/s)  - all subnet 60?

Slot 6: WS-X4306-GB (6 GBIC (not mini!) ports, 3 of which have 1000-SX modules with SC connectors)
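
As a rough illustration of what the 8:1 oversubscription quoted for Slot 2 means, here is a minimal sketch of the arithmetic. It assumes the usual reading of the ratio (the sum of the port speeds divided by the bandwidth the blade can actually pass to the rest of the switch); the function names are made up for this example and nothing here was measured on the chassis.

```python
def shared_bandwidth_gbps(ports, port_rate_gbps, oversubscription):
    """Bandwidth the blade can pass onward when every port is busy at once."""
    return ports * port_rate_gbps / oversubscription


def per_port_share_mbps(port_rate_gbps, oversubscription):
    """Worst-case share of one port if all ports transmit simultaneously."""
    return port_rate_gbps * 1000 / oversubscription


# Slot 2 as listed above: 48 copper ports at 1 Gb/s, 8:1 oversubscription.
slot2_total = shared_bandwidth_gbps(48, 1.0, 8)
slot2_per_port = per_port_share_mbps(1.0, 8)

print(f"Slot 2 shared bandwidth: {slot2_total} Gb/s")
print(f"Worst-case per-port share: {slot2_per_port} Mb/s")
```

A non-blocking (line rate) switch corresponds to a ratio of 1 in the same formula.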

 

Images and miscellaneous files

Here we can keep miscellaneous files documenting the state of the network.

First, I have attached an image showing the current (late 2009/early 2010) switch layout and links in the WAH. ("WAH_switches.pdf")

Then there is an "after" picture with a rough idea of the patch panel placement to replace most of the unmanaged switches. ("WAH_patch_panels.pdf")

For the South Platform, a more refined patch panel plan was put together in June 2010 ("Network Plan for South Platforms.doc")

There is an attachment with general guidelines for installing UTP ("Cat5e_Network_cable.ppt")

 

 

Locations needing network access

WAH: (starp and DAQ/TRG devices are scattered throughout these locations.  I am going to use the term “satellite racks” to include all locations within the C-AD PASS system that are NOT on the South Platform.  Also, note that the satellite racks are semi-mobile, and the entire detector platform (North and South) can move into the Assembly Building.):

-    PMD racks: ~3 devices on starp and ~3 on DAQ/TRG

-    FMS/FPD east side:  Handful of devices on DAQ/TRG and on starp

-    Southwest corner work area: rarely more than two systems here, but might want starp, “trailers” and DAQ/TRG networks here for use as needed

-    EEMC racks, west side:  Handful of devices on DAQ/TRG and on starp

-    FPD/FMS west racks:  Handful of devices on DAQ/TRG and on starp

-    PP2PP east and west:  at least one VME processor on DAQ/TRG on each side - these are in the RHIC tunnel, technically not in the WAH.

-    South platform – (IMPORTANT NOTE:  The south platform must remain electrically isolated from the rest of the facility – there can be no conducting cables running from the South Platform to other locations)
o    First floor:  Three rows of 8-9 racks each (volatile, in that subsystems and components are installed or removed each year)
o    Second floor:  Three rows of 8-9 racks each (volatile)

-    North platform:  currently unoccupied, but has had devices in the past and a switch on the starp network is still present there, with a fiber link back to the South Platform (somewhere!)

Control Room:
-    Perimeter (~3 dozen PCs), almost all on starp, but
o    2-3 on DAQ/TRG
o    4-5 on C-AD 108
o    1-2 on C-AD 90 network?
o    Numerous small unmanaged switches in this room currently

DAQ Room:  (Highest performance of the entire facility is needed in rack row DA, including a minimum 56-port switch with non-blocking/line rate 1Gb inter-links on the DAQ/TRG network)

-    three “rows” plus two networking racks:
o    the “old” network rack and the “new network rack” near the northwest corner
o    rack row “DA” on west side (nearest the Control Room)
o    shelf row in the middle with a rack at each end.

  • Northern-most rack is ~20 nodes on “starp” – current rack has at least three unmanaged 8-port switches.
  • Remainder of row is primarily DAQ/TRG with 3-4 starp nodes - both networks go through two unmanaged switches in the rack immediately to the south of the shelves.

o    East row:  ~6 stand-alone starp servers (one of which has a DAQ/TRG connection as well), along with a handful of VME devices on starp.  DAQ or trigger might have a device or two here.  The rack space is primarily occupied by devices on a C-AD network.

GMR:
- 3 PCs – generally stable area.

Clean room:
-    several jacks needed, network use may vary between starp, daq/trg and the 130.199.162 subnet depending on the active use at any time

1006C and 1006D (trailers):
    - typically only subnet 130.199.162 is needed here.


 

 

Meeting notes for week of Oct. 19, 2009

Online network reshape notes from the week of Oct. 18, 2009

During this week, three meetings were held to discuss the STAR online networking reshape plans.

The first meeting included Jeff Landgraf, Wayne Betts, Dan Orsatti (ITD) and Frank Burstein (ITD).  At this meeting the ITD network engineers presented two proposals for core network components based on information previously provided to them by STAR.  The two options were Force-10 based and Cisco-based, with costs of approximately $150,000 and $100,000 respectively.  They included a shared infrastructure for the DAQ/TRG and STARP networks, including a switch redundancy in the DAQ room to handle the two networks and meet DAQ’s relatively high performance needs in the DAQ room.  These ITD options are generally smart, expandable, highly configurable and well-supported by ITD, and meet the initial requirements.

However, in informal discussions since then, Bill Christie suggested that we should consider the possibility of radiation damage and/or errors in any electronic equipment in the WAH.  While this had been mentioned as a possibility in the past, it was not generally taken seriously by those of us in STAR looking after the networks.  Nor is there any way for us to test this to a standard of “beyond reasonable doubt” (or any other standard really).  At Bill’s suggestion, we (Jeff L., Wayne B., Jack E., Yuri G. and Bill C.) met with three members of C-AD’s networking group, who stated they were certain that radiation could impair switches and strongly suggested that ITD’s proposed equipment was inappropriate for a radiation area.  They also provided some feedback from individuals at two other laboratories indicating that networking equipment in radiation areas is subject to upsets.  One of these included an explanation of the effects on metal-oxide semiconductors, which at face value would suggest that newer (thus generally smaller) electronic components would be less susceptible.  However, my intuition is that smaller electronics are denser and more easily upset by a smaller deposited charge, and thus might be more susceptible.

Here are excerpts from the other labs:

From JLab:  "The flash memory loses its ability to hold data, making it
useless. We have worked around the problem by pulling cable or fiber
back to lower radiation areas wherever we can. Because we made these
cabling changes when we were only using cisco fixed-configuration
100Mbit switches ( 29XX models), I have no data for Gigabit switches.
Since our experience is that it's the flash memory that fails, I'd
expect no better performance from any other switches. All of our
switches that use modular supervisor modules are outside of radiation
areas."

From FermiLab:  "The typical devices used employ metal oxide
semiconductors and the lock up happens when ionizing radiation is
trapped in the gate region of the devices. We see this happen at our two
detectors (CDF and DZero) when losses go up and power supplies circuits
latch up. The other thing working in the positive direction is that when
IC feature sizes go down, there is less likelihood for the charge to get
trapped so they are more radiation tolerant. Having said all that I
can't answer your specific question because we don't put switches or
routers in the tunnel at all."

All this said, the general consensus was that we should move as much “intelligence” as far away from the beam line as reasonably possible.  (Until now, the “big” switches on the platform have actually been about as close to the beam line as possible!)  This means putting any switches in rack rows 1C.  Given both the cost and the radiation concern, we (the STAR personnel) agreed to investigate less expensive switches than ITD’s suggestion, while trying to provide some level of intelligence for monitoring.  We also have a consensus that the DAQ/TRG and STARP networks should try to use common hardware whenever possible, and that we should work to remove as many SOHO-type unmanaged switches as possible as time permits (replacing them with well-documented and labelled patch panels feeding back to core switches).  The C-AD personnel also recommended Cisco’s 2950, 2960 and 3750 switches and Garrett products in general.  One more miscellaneous tidbit from Jack: we should avoid LanCast media convertors.

The final meeting of the week included Jerome, Wayne and Matt Ahrenstein, in which Jerome was briefed on the two prior meetings and he generally agreed with the direction we are taking.  At this meeting, we selected an additional area to try to clean-up before the run, specifically the racks on the west side, where there are at least four 8-port unmanaged switches (3 on DAQ/TRG and one on STARP).  He also suggested we consult with Shigeki from the RACF about the whole affair, and is trying to arrange such a meeting as soon as possible.

In addition to this, Jeff has also stated that while either ITD solution would meet DAQ’s needs for several years, he believes he can obtain adequate performance for far less money with lower end equipment.  Here is Jeff's latest on the DAQ needs for the network:

 

"My target is 20Gb/sec network capability across switches.   In likely 
scenarios, the network capability would be significantly higher than 
this because hi bandwidth nodes would all be on the same switch 
(ironically, the cheaper switches mostly seem to be line-speed switches 
internally, unlike the big cisco switches...)    However, in the current 
year, I'll have a hard limit of 12 gigabit ethernet cards incoming on 
EVBs for a hard max of 12Gb/sec.    The projected desired data, 
according to the trigger board is around 6Gb/sec (600MB/sec).   I don't 
expect much more than a factor of two through the EVBs above this 
600MB/sec in the lifetime of STAR (meaning current TPC + HFT + FGT), 
although there are big uncertainties particularly for the HFT.     The 
one lump in the planning involves potential L3 farms - and I don't know 
how this will play out.   There are many scenarios some of which would 
not impact the network (ie... specialized hardware plugged into the TPX 
machines...),  but my current approach is that the network needs will 
have to be incorporated in the L3 farm design plan..." 
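
The figures in Jeff's note can be lined up with a few lines of arithmetic. The sketch below only restates numbers from the quote (12 gigabit cards on the EVBs, about 6Gb/sec projected, a factor of two of growth, a 20Gb/sec target); the variable names are invented for this example. Note that the quote equates 600MB/sec with 6Gb/sec, which is taken here as given rather than recomputed.

```python
# Numbers taken from the quote above; names are illustrative only.
GBE_CARDS_ON_EVBS = 12        # gigabit ethernet cards incoming on EVBs
CARD_RATE_GBPS = 1.0          # one gigabit per card
PROJECTED_GBPS = 6.0          # projected desired data rate ("600MB/sec")
GROWTH_FACTOR = 2.0           # "not much more than a factor of two"
TARGET_GBPS = 20.0            # target capability across switches

evb_hard_max_gbps = GBE_CARDS_ON_EVBS * CARD_RATE_GBPS
lifetime_peak_gbps = PROJECTED_GBPS * GROWTH_FACTOR
headroom_now_gbps = evb_hard_max_gbps - PROJECTED_GBPS
target_over_peak = TARGET_GBPS / lifetime_peak_gbps

print(f"EVB hard max this year:   {evb_hard_max_gbps} Gb/s")
print(f"Projected lifetime peak:  {lifetime_peak_gbps} Gb/s")
print(f"Headroom at current rate: {headroom_now_gbps} Gb/s")
print(f"Target / lifetime peak:   {target_over_peak:.2f}")
```

On these numbers the projected lifetime peak just matches this year's EVB limit, and the 20Gb/sec target leaves a margin above it, before any L3 farm traffic is considered.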


 

Where does this leave us?  We need to quickly evaluate options for the “big” switches for the DAQ room and the South Platform.  The DAQ and Trigger groups have 3(?) similar managed switches that might be adequate for the South platform (including a spare), and we should look into the Cisco models suggested by C-AD.  We also should let ITD make another round of suggestions based on our discussions to date, and especially focus with them on what to do with the large ITD switch in the DAQ room that currently has the link to the rest of the campus “public” network.  And we need to do this rather hastily.

 

 

 

Open Questions

Do we support multiple networks on single switches with VLANs, switch port segmentation or other means?  For instance, at remote spots, like PMD’s racks, can we put in a single switch and have it handle both starp and DAQ/TRG?  Daniel Orsatti's most recent advice was leaning towards having a few large switches in four or five core places with VLANs and installing patch panels at or near the various locations needing network connections.
 
Is there a single brand/line of switch equipment that meets most or all of our goals?  Can we get a line of switch products that includes a range from small (~8 port) switches up to the large switches required for DAQ’s event builders or ITD’s main switch, such that they can interoperate and be part of shared monitoring?  (If we go with a patch-panels-to-big-switches approach, then the small switches would not be necessary.)

What kind of monitoring can we expect and how much effort will it take for it to be useful?  SNMP-based?  Nagios?  Etc…

Can we setup a shared but “private” monitoring network for the managed switches, such that starp and DAQ/TRG monitoring share the same infrastructure?  (Most likely, yes.)

Can fiber connectors be easily changed/replaced/repaired?  STAR apparently does not have the tools to terminate fibers at this point.  Do we want to acquire the tools and know-how to do this, or continue to rely on ITD and/or folks like Frank Naase (C-AD) who have done most of our fiber termination to date? 

Overview of the reshape started in 2009

The goal of the online networking reshape is to provide a stable and well-understood networking environment with the possibility of future expansion to meet STAR’s foreseeable needs over time.  The physical layout needs to be well understood, with elements of redundancy and/or easily swapped parts on hand as much as possible.  The devices on the network should be known, including their location, what other systems they are expected to interact with and traffic volumes.  Significant networking errors should be detected at the switch level and allow for troubleshooting without significant disruption to large parts of the network. 

 

Along the way, it will be very useful to increase the availability of knowledge and sources of assistance related to the network.  Naturally this calls for a well-documented network in any case.  Consolidating networking hardware into a common brand or line for the multiple online networks (which are currently a hodgepodge) may reduce the number of errors encountered, improve the ability of STAR's personnel to understand more facets of the networking environment and allow for better monitoring of the network performance.  Our network should mesh well with existing ITD infrastructure so that their expertise can be brought to bear as needed.  However, ITD expertise cannot be the sole source of support for the online networks – at least two individuals in STAR (but not many more than that) should have broad access to realtime network data and configuration.  STAR’s 24-hour on-call experts (DAQ and online computing in particular) need to be able to respond quickly to incidents and gather clues and information from all sources.

Plan of action / critical path items

I think we need to start from the core and work outwards.  This will allow us to finish as much as possible before the run starts and start to see the most benefits as early as possible.  The two big pieces at the core (in order of importance) are:

1. DAQ’s event builder switch, which calls for 56 (let’s say 64) non-blocking/line speed 1Gb/s ports.  No matter what, this piece needs to be put in place before the run starts.  We can probably limp by with everything else as it exists now if we have to, but this has to be a new piece of hardware in place before December 1 (is this a reasonable deadline?).

2. Whatever ITD wants to replace the current Catalyst 4000-series chassis and blades in the DAQ room.

After this, the next items for consideration/replacement are the starp and DAQ/TRG switches on the South Platform.

Then it is on to the satellite racks in the WAH with their relatively small number of devices.

Then the DAQ room, cleaning up the handful of unmanaged switches that exist for both starp and DAQ/TRG.

Control Room clean-up.  The available wall jacks in the Control Room are insufficient for the number of devices, and many of the jacks are inaccessible behind the west side console, but at least this area is always accessible and has had few problems, so it isn’t a high priority.

 

Remote power cycling network switches in the WAH

This documents the Network Power Switch plugs used to remotely power cycle STAR's network switches in the Wide Angle Hall.

Updated February 8, 2019  (Ideally, STAR's RackTables would be the definitive source for this information, but it is far from complete.) 

*ID Location Switch IP name NPS IP name NPS plug NPS access method NPS type
             
SW22 east racks east-trg-sw.trg.bnl.local pxl-nps.starp.bnl.gov  8  telnet, http (ssh and https available, but not enabled)  APC AP7901 (August 2015)
SW56 east racks east-s60.starp.bnl.gov eastracks-nps.trg.bnl.local  8  ssh (slow to respond to initial connection)  APC AP7901 (August 2012)
SW59 SP 1C4 splat-s60.starp.bnl.gov netpower1.starp.bnl.gov  3  telnet, http  APC
SW2 SP 1C4 splat-trg2.trg.bnl.local netpower1.starp.bnl.gov  1  telnet, http  APC
SW27 SP 1C4 switch1.trg.bnl.local netpower1.starp.bnl.gov  2  telnet, http  APC
SW60 SP 1C4 splat-s60-2.starp.bnl.gov netpower2.starp.bnl.gov  A1  ssh (has key for wbetts)  WTI NPS-8
SW28 SP 1C4 switchplat.scaler.bnl.local netpower2.starp.bnl.gov  A2  ssh (has key for wbetts)  WTI NPS-8
SW55 west racks west-s60.starp.bnl.gov westracks-nps.trg.bnl.local  1  ssh, http  APC
SW30 west racks switch2.trg.bnl.local eemc-pwrs1.starp.bnl.gov  A4  telnet  old WTI
SW51 NP 1st floor nplat-s60.starp.bnl.gov north-nps1.starp.bnl.gov  1  telnet, ssh, http  APC AP7900B (January 2019)
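
Before power cycling anything it helps to be sure which plug on which NPS feeds the switch in question. The following is a minimal sketch that holds the table above as data and looks entries up by switch ID or switch IP name. The class and function names are invented for this example, and it deliberately does not talk to any NPS; it only answers "which plug, on which unit, by which access method".

```python
from typing import NamedTuple, Optional


class NpsPlug(NamedTuple):
    switch_id: str
    location: str
    switch_name: str
    nps_name: str
    plug: str
    access: tuple


# Rows copied from the table above (as of February 8, 2019).
NPS_PLUGS = [
    NpsPlug("SW22", "east racks", "east-trg-sw.trg.bnl.local", "pxl-nps.starp.bnl.gov", "8", ("telnet", "http")),
    NpsPlug("SW56", "east racks", "east-s60.starp.bnl.gov", "eastracks-nps.trg.bnl.local", "8", ("ssh",)),
    NpsPlug("SW59", "SP 1C4", "splat-s60.starp.bnl.gov", "netpower1.starp.bnl.gov", "3", ("telnet", "http")),
    NpsPlug("SW2", "SP 1C4", "splat-trg2.trg.bnl.local", "netpower1.starp.bnl.gov", "1", ("telnet", "http")),
    NpsPlug("SW27", "SP 1C4", "switch1.trg.bnl.local", "netpower1.starp.bnl.gov", "2", ("telnet", "http")),
    NpsPlug("SW60", "SP 1C4", "splat-s60-2.starp.bnl.gov", "netpower2.starp.bnl.gov", "A1", ("ssh",)),
    NpsPlug("SW28", "SP 1C4", "switchplat.scaler.bnl.local", "netpower2.starp.bnl.gov", "A2", ("ssh",)),
    NpsPlug("SW55", "west racks", "west-s60.starp.bnl.gov", "westracks-nps.trg.bnl.local", "1", ("ssh", "http")),
    NpsPlug("SW30", "west racks", "switch2.trg.bnl.local", "eemc-pwrs1.starp.bnl.gov", "A4", ("telnet",)),
    NpsPlug("SW51", "NP 1st floor", "nplat-s60.starp.bnl.gov", "north-nps1.starp.bnl.gov", "1", ("telnet", "ssh", "http")),
]


def find_plug(key: str) -> Optional[NpsPlug]:
    """Return the entry for a switch ID (e.g. 'SW60') or a switch IP name."""
    for entry in NPS_PLUGS:
        if key in (entry.switch_id, entry.switch_name):
            return entry
    return None


def plugs_on(nps_name: str) -> list:
    """List every switch fed by one NPS, to double-check before touching it."""
    return [entry for entry in NPS_PLUGS if entry.nps_name == nps_name]


entry = find_plug("SW60")
print(f"{entry.switch_name}: plug {entry.plug} on {entry.nps_name} via {', '.join(entry.access)}")
```

Listing all switches on the same NPS (for example `plugs_on("netpower1.starp.bnl.gov")`) is a cheap guard against cycling a neighbouring plug by mistake.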

Reshape design goals

A.  Only use managed switches and have each networked device plug directly into a managed switch port.
   
-    Eliminate all “dumb” consumer/SOHO/desktop switches – they are not robust,  add to confusion when troubleshooting and prevent isolation of individual devices
-    allow the blocking of any single device at any time through its nearest  switch’s management interface
-    block the addition of any new, unknown nodes and/or be informed of anything showing up unexpectedly
-    ability to monitor individual ports for traffic volumes, link settings, errors, major links going down, preferably with some history/logging.
-    allow real-time monitoring and alerts for unusual events (capabilities will be hardware/vendor dependent and subject to available time to develop monitoring tools and become familiar with capabilities)
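
The "be informed of anything showing up unexpectedly" goal above comes down to comparing what the switches see against a list of known devices. The sketch below shows only that comparison. How the observed addresses would be collected (SNMP, a switch's management interface, etc.) is left open, and the MAC addresses and function names are hypothetical placeholders.

```python
def normalize(mac: str) -> str:
    """Compare MAC addresses case-insensitively and ignore '-' vs ':'."""
    return mac.strip().lower().replace("-", ":")


def unexpected_nodes(known, observed):
    """Addresses seen on the network that are not in the inventory."""
    known_set = {normalize(m) for m in known}
    return sorted({normalize(m) for m in observed} - known_set)


def missing_nodes(known, observed):
    """Inventory entries that were not seen (powered off, moved, or dead)."""
    observed_set = {normalize(m) for m in observed}
    return sorted({normalize(m) for m in known} - observed_set)


# Hypothetical example data, not real STAR devices.
inventory = ["00:11:22:33:44:55", "00:11:22:33:44:66"]
seen_on_switch = ["00-11-22-33-44-55", "00:AA:BB:CC:DD:EE"]

print("unexpected:", unexpected_nodes(inventory, seen_on_switch))
print("missing:   ", missing_nodes(inventory, seen_on_switch))
```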


B.  All devices should be within 10-15 feet of a “core” patch panel or network switch.
-    Individuals working on detector subsystems should not have to install network cables that cross rack rows, go from one floor (or room) to another, etc.
-    Piecemeal additions of network segments by subsystems should not be done – that is to say, no one should be adding switches to the network other than core personnel using “approved” devices consistent with the rest of the network components.
-    This calls for cabled and labeled patch panels and/or switches liberally placed throughout the WAH, the Control Room and the DAQ Room.   



C.    Some degree of “commonality” between the infrastructures of the starp and DAQ/TRG networks.  Same line of hardware, media convertors (when needed), switches, monitoring tools, possibly even shared switches with VLANs.  This is a big question – are VLANs a viable way to share switch hardware between starp and DAQ/TRG?  A shared “private” management network for the switches is likely a good idea. 

D.    An easily extensible network, such that new locations can be added easily, and existing locations can have additional capacity added and subtracted in accord with the other goals.

E.    Redundant links (fibers or copper, as appropriate) available between all linked core components (preferably with automatic failover).

F.    Spares on hand for just about everything – a good reason to use as few models of hardware as possible.  If we develop a plan with 10 small 8-port switches in various locations, ideally all 10 will be identical and we will have one or two spares on the shelf at all times.

G.    All network components should be on UPS power so that short and/or localized power outages do not bring down portions of the network.  This is not terribly important, but should be kept in mind and allowed for when feasible.

H.  (Added after the initial items above)  Move IC-based devices (switches) away from the beam line and attempt to reduce radiation load.  Our working hypothesis, based on anecdotal evidence, is that at least some of the networking problems last year were due to radiation-induced errors.  The two "big" switches on the South Platform have historically always been in just about the WORST place for radiation load, so these need to be moved away from the beam line.

 

Rules to Live By in Online Networking at STAR

Document everything!

All hardware with an IP address should be labelled.

All installed cables should have a label on each end that is adequate to quickly locate the other end.

All patch panel ports with cables connected should be labelled appropriately to identify the other end.

All network equipment (switches, patch panels, cable runs, etc.) needs to be documented, preferably in appropriate documents in Drupal.

 

Copper connections:

Use Cat5e or higher graded cables.

Use yellow cables for devices connected to the STARP network (130.199.60-61.x IP addresses).

Use green cables for devices connected to the DAQ/TRG network (172.16.x.x IP addresses).

Use colors other than yellow and green for any other network connections.

Use T568A termination when adding connectors to bare cable.

 

Fiber Connections:

Use 50 micron multi-mode fiber.

Use 1000Base-SX fiber transceivers where possible.

 

STAR networks in 1006 and their nicknames

“starp”:  130.199.60.0/23

“DAQ/TRG”: 172.16.0.0/16  (non-routed)

“HPSS”:  RCF network for DAQ → HPSS transfers

“Alexei”: Alexei’s video camera and laser network (currently consists of a switch on the South Platform and a switch in the DAQ room connected by a fiber pair?).  This includes 3-4 PCs including obsolete Windows OSes (e.g. Win 98).  No devices on this network are dual-homed, so it is very isolated from everything else and is mentioned here for completeness.

“trailers”:  130.199.162.  - includes wired connections for printers, visitors’ laptops and workstations not directly involved in operations and may exist outside of the trailers, such as the Control Room for visitors’ laptops while on shift.

“Wireless”: Not really relevant conceptually, but there are also three ITD wireless access points in the area.

“C-AD 108” and “C-AD 90”:  C-AD has at least two networks operating in the DAQ and Control Rooms, which are left well enough alone in their hands, but are mentioned here for the sake of completeness.
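
As a convenience, the nicknames above and the cable color rules from "Rules to Live By" can be combined into a small lookup. This is a minimal sketch using Python's standard `ipaddress` module. The starp and DAQ/TRG ranges are the ones given above; treating "trailers" as the whole 130.199.162.0/24 block is an assumption made for this example, since the text only gives "130.199.162.", and the function name is invented.

```python
import ipaddress

# (nickname, network, cable color) - colors per the copper connection rules.
NETWORKS = [
    ("starp", ipaddress.ip_network("130.199.60.0/23"), "yellow"),
    ("DAQ/TRG", ipaddress.ip_network("172.16.0.0/16"), "green"),
    # ASSUMPTION: /24 prefix length; the document only lists "130.199.162."
    ("trailers", ipaddress.ip_network("130.199.162.0/24"), None),
]

OTHER_COLOR = "any color other than yellow or green"


def classify(address: str):
    """Return (nickname, cable color) for an IP address, or (None, ...) if unknown."""
    ip = ipaddress.ip_address(address)
    for nickname, network, color in NETWORKS:
        if ip in network:
            return nickname, color or OTHER_COLOR
    return None, OTHER_COLOR


for addr in ("130.199.61.20", "172.16.5.1", "130.199.162.7"):
    name, color = classify(addr)
    print(f"{addr}: network={name}, cable={color}")
```

The /23 for starp covers both the 130.199.60.x and 130.199.61.x addresses mentioned in the cable rules.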
 

Shift Accounting

This page will now hold the shift accounting pages. They complement the Shift Sign-up process by documenting it.

Admin interface access

Run 16 shift dues


Dues

Requests to serve additional shifts should be made to the shift committee (D. Keane and D. Smirnov) PRIOR to the final calculation of the dues. Please refer to the Important dates section of this document for the deadline for such requests.

Past shortfalls, shift coverage by institution

The table below shows the percentage of missed shifts over the past 4 years. This information can be used to exclude authors from the author list in Run 16.

Note: If your institution is in this table, fails again to fulfill its dues, and its 4-year average is below the threshold defined by our author exclusion policy (see STAR Note 0545), its authors would be excluded.

Institution Missed percentage, historical
Frankfurt Institute for Advanced Studies (FIAS) 71%
Institute of Modern Physics, Lanzhou 41%
University of Rajasthan 31%
Pusan National University 25%

Shift sign-up - Run 16

Important dates

  • 2015/12/04 - initial shift dues calculated - council feedback requested.
  • 2015/12/04 - Shift sign-up opens for TESTING purposes only - you may exercise the interface by emulating a sign-up
  • 2015/12/13 - The shift sign-up committee needs all council feedback by 12/11. Final shift dues will then be re-computed and provided (they should not change much and will account for all reported changes by that date)
  • 2015/12/16 - The testing interface will be turned OFF that day and all test records flushed/removed. The countdown will begin.
  • 2015/12/17 - Opening will occur at 10 AM BNL time - please remember to log in beforehand and wait for the countdown to open the sign-up

Shift Layout, Period Coordinators and special arrangements

Shift layout

STAR shifts begin January 12, 2016 with cosmic data taking shifts.

Period coordinators

As usual, period coordinators are pre-assigned / pre-signed as selected by the Spokesperson office.

Special arrangements and requests

  • UTA has requested for Lanny Ray (QA coordinator) to have the first QA shift.
    Status: the shift sign-up coordinators have had flexibility for such arrangements.
  • FIAS requested 10 shifts to catch-up for unfilled dues in past years.
  • 2) We've agreed to pre-assign the following QA shifts under the new family-related policy:
    Sevil Salur (LBNL)      FEB 16
    Richard Witt (Yale)     FEB 23
    Juan Romero (UC Davis)  MAY 31
    
    3) Bob Tribble (TAMU) is pre-assigned to a shift during  APR 12-19.
    
    4) To correct an unusual rounding anomaly, we've agreed to subtract one week from Valparaiso U dues.  

  • Dec 19, 2015. Run 16 will be shorter by two weeks than originally planned: 20 weeks total instead of 22
    Dear STAR Collaborators:
    
    We have just received the guidance from DOE (to BNL) that there will be
    20 cryo-week of RHIC run instead of the originally planned 22 weeks.
    
    Our shift sign-up was designed for 22 weeks. For those who have already
    signed up for the last two weeks, please try to un-sign and help to fill other
    open slots. By now we have 8 open slots and need to un-sign 24.
    
    For those who are not able to re-sign to other spots, we will credit your dues,
    but may ask for help if slots open due to unexpected events (visa etc.).
    
    I am looking forward to a successful run 16 and exciting physics from it.
    
    
    Happy Holidays!
    
    Zhangbu
    

    Below is a screen shot of the last two weeks of original shift schedule Run 16 as of Jan 5, 2016.
    Only those who signed up for shift before Dec 19, 2015 will be eligible for a credit:

    Anju Bhasin, University of Jammu
    Evan Finch, Brookhaven National Laboratory
    Abhinav Sharma, University of Jammu
    Yuri Panebratsev, Joint Institute for Nuclear Research
    Madan Aggarwal, Panjab University
    Isaac Upsal, Ohio State University
    Yang Wu, Kent State University
    Grazyna Odyniec, Lawrence Berkeley National Laboratory
    Kunsu Oh, Pusan National University
    Liwen Wen, University of California - Los Angeles
    Saskia Mioduszewski, Texas A&M University
    Maowu Nie, Shanghai Institute of Applied Physics
    Abhinav Sharma, University of Jammu
    Renee Fatemi, University of Kentucky
    Madan Aggarwal, Panjab University
    Subhash Singha, Kent State University
    Liang He, Purdue University
    Declan Keane, Kent State University
    Sonya Kabana, Kent State University (offline QA)

    The following shifters signed up after the announcement:

    Devika Gunarathne, Temple University
    Amani Kraishan, Temple University

    Before:


    After:

Run 17 shift dues


Dues

Requests to serve additional shifts should be made to the shift committee (D. Keane and D. Smirnov) PRIOR to the final calculation of the dues. Please refer to the important dates section in this document for the deadline for such requests.

Past shortfalls, shift coverage by institution

The table below shows the percentage of missed shifts over the past 4 years. This information can be used to exclude authors from the author list in Run 17.

Note: If your institution is in this table and fails again to fulfill its dues, and its 4-year average is below the threshold defined by our author exclusion policy (see STAR Note 0545), its authors would be excluded.

   

Shift sign-up - Run 17

Important dates

  • 2016/11/29 - initial shift dues calculated - council feedback requested.
  • 2016/11/29 - Shift sign-up opens for TESTING purposes only - you may exercise the interface by emulating a sign-up
  • 2016/12/13 - The shift sign-up committee needs all council feedback by 12/11. Final shift dues will then be re-computed and provided (they should not change much and will account for all reported changes by that date)
  • 2016/12/19 - The testing interface will be turned OFF that day and all test records flushed/removed. The countdown will begin.
  • 2016/12/20 - Opening will occur at 10 AM BNL time - please remember to log in beforehand and wait for the countdown to open the sign-up

Shift Layout, Period Coordinators and special arrangements

Shift layout

STAR shifts begin January XX, 2017 with cosmic data taking shifts.

Period coordinators

As usual, period coordinators are pre-assigned / pre-signed as selected by the Spokesperson office.
 

Feb. 7 - March 7: Oleg Eyser (BNL)
March 7 - April 4: Sal Fazio (BNL)
April 4 - April 28: Shuai Yang (BNL)
April 28 - May 23: Xiaofeng Luo (CCNU)
May 23 - June 20: Jinlong Zhang (LBL)
June 20 - July 11: Nihar Sahoo (TAMU)

 

Special arrangements and requests


0) Bob Tribble: SL, evening, beginning Mar 21
1) Pavla + Pavol: 5 shifts as below.
2) Juan Romero wants QA for 1 week, beginning May 02.
3) Sevil Salur wants QA for 1 week, beginning Mar 07.
4) Richard Witt wants QA for 1 week, beginning Mar 21.
5) Lanny Ray, as always, is pre-assigned the first QA shift.
6) Jan Rusnak wants QA for 1 week, beginning Apr 04.
7) FIAS wants pre-assigned shifts like last year:
Day, beginning Apr 4: Belousov, 2 weeks of shift crew;
Evening, beginning Apr 4: Pugash, 2 weeks of shift crew;
Day, beginning Apr 4: Vassiliev, 1 week DO trainee + 1 week DO;
Evening, beginning Apr 4: Zyzak, 1 week DO trainee + 1 week DO

 

Run 18 shift dues


Run 18 Shift Dues & Notes


Period coordinators

As usual, period coordinators are pre-assigned, as arranged by the Spokespersons.

Special arrangements and requests

  1. Under the family-related policy, the following 6 weeks of offline QA shifts were pre-assigned:
    MAR 27 Kevin Adkins (Kentucky)
    APR 03 Kevin Adkins
    APR 10 Sevil Salur (Rutgers)
    APR 17 Richard Witt (USNA/Yale)
    MAY 22 Juan Romero (UC Davis)
    JUN 12 Terry Tarnowsky (Michigan State)
     
  2. Lanny Ray (UT Austin), as QA coordinator, is always pre-assigned the first QA week.
     
  3. FIAS remains in “catch-up mode” and is taking extra shifts above their dues. Pre-assigned shifts can be requested in this scenario. FIAS has been pre-assigned 4 Detector Op shifts.
     
  4. Bob Tribble (TAMU) requests the evening Shift leader slot during Apr 10-17.

Run 19 special requests

The following pre-assigned slot requests were made.
    9 WEEKS PRE-ASSIGNED QA AS FOLLOWS
    ==================================
    Lanny Ray (UT Austin) QA Mar 5
    Richard Witt (USNA/Yale) QA Mar 19
    Sevil Salur (Rutgers) QA Apr 16
    Wei Li (Rice) QA Apr 23
    Kevin Adkins (Kentucky) QA May 14
    Juan Romero (UC Davis) QA May 21
    Jana Bielcikova (NPI, Czech Acad of Sci) QA May 28  
    Yanfang Liu (TAMU) QA June 25 
    Yanfang Liu (TAMU) QA July 02
    
    8 WEEKS PRE-ASSIGNED REGULAR SHIFTS AS FOLLOWS
    ==================================
    Bob Tribble (BNL) Feb 05 SL evening 
    Daniel Kincses (Eotvos) Mar 12  DO Trainee Day
    Daniel Kincses (Eotvos) Mar 19  DO Day
    Mate Csanad (Eotvos) Mar 12 SC Day
    Ronald Pinter (Eotvos) Mar 19 SC Day
    Carl Gagliardi (TAMU)  May 14  SL day
    Carl Gagliardi (TAMU)  May 21 SL day 
    Grazyna Odyniec (LBNL) July 02 SL evening
    
    

Shift Dues and Special Requests Run 20

For the calculation of shift dues, there are two considerations.
1) The length of time of the various shift configurations (2 person, 4 person no trainees, 4 person with trainees, plus period coordinators/QA shifts)
2) The percent occupancy of the training shifts

For many years, 2) has hovered around 45%, which is what we used to calculate the dues.  Since STAR gives credit for training shifts (as we should), this needs to be factored in or we would not have enough shifts.

The sum total of shifts needed is then divided by the total number of authors minus authors from Russian institutions who cannot come to BNL.

date           weeks   crew   training   PC   OFFLINE
11/26-12/10    2       2      0          0    0
12/10-12/24    2       4      2          1    0
12/24-6/30     27      4      2          1    1
7/02-7/16      2       4      0          1    1

Adding these together (3x a shift for crew, 3x45% for training, plus pc plus offline) gives a total of 522 shifts.
The total number of shifters is 303 - 30 Russian collaborators = 273 people
Giving a total due of 1.9 per author.

For a given institution, the load is calculated as (# of authors - # of expert credits) x due, then set to an integer value, as cutting collaborators into pieces is non-collegial behavior.

However, this year, this should have been:
date           weeks   crew   training   PC   OFFLINE
11/26-12/10    2       2      0          0    0
12/10-12/24    2       4      2          1    0
12/24-6/02     23      4      2          1    1
6/02-6/16      2       4      0          1    1

Adding these together (3x a shift for crew, 3x45% for training, plus pc plus offline) gives a total of 456 shifts for a total due of 1.7 per author.
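
The arithmetic behind both totals can be reproduced with a short script. This is only a sketch under stated assumptions: it takes "3x" to mean three shift slots per crew or trainee position per week, and it works in integer hundredths of a shift so that the 45% training occupancy does not cause rounding surprises. The function and variable names are illustrative, not part of any STAR tool.

```shell
#!/usr/bin/env bash
# Sketch of the Run 20 shift-dues arithmetic described above.
# Assumption: "3x" = three shift slots per position per week.

SLOTS=3            # shift slots per position per week (assumed meaning of "3x")
TRAINING_PCT=45    # historical occupancy of training slots, in percent

# period_shifts WEEKS CREW TRAINING PC OFFLINE
# Prints the shifts needed for one table row, in hundredths of a shift.
period_shifts() {
    local weeks=$1 crew=$2 training=$3 pc=$4 offline=$5
    echo $(( weeks * (SLOTS * crew * 100 + SLOTS * training * TRAINING_PCT + (pc + offline) * 100) ))
}

# total_shifts: reads table rows on stdin, prints the total rounded to whole shifts.
total_shifts() {
    local sum=0 weeks crew training pc offline
    while read -r weeks crew training pc offline; do
        sum=$(( sum + $(period_shifts "$weeks" "$crew" "$training" "$pc" "$offline") ))
    done
    echo $(( (sum + 50) / 100 ))
}

# due_per_author TOTAL AUTHORS: prints the due rounded to one decimal place.
due_per_author() {
    local tenths=$(( ($1 * 10 + $2 / 2) / $2 ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

AUTHORS=$(( 303 - 30 ))   # shifters minus collaborators who cannot come to BNL

# Rows: weeks crew training PC offline (taken from the two tables above)
planned=$(printf '%s\n' "2 2 0 0 0" "2 4 2 1 0" "27 4 2 1 1" "2 4 0 1 1" | total_shifts)
revised=$(printf '%s\n' "2 2 0 0 0" "2 4 2 1 0" "23 4 2 1 1" "2 4 0 1 1" | total_shifts)

echo "planned: $planned shifts, $(due_per_author "$planned" "$AUTHORS") per author"
echo "revised: $revised shifts, $(due_per_author "$revised" "$AUTHORS") per author"
```

With the table rows as given, this reproduces the 522 and 456 shift totals and the 1.9 and 1.7 dues quoted above.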

We allowed some people to pre-sign up, for a few different reasons.

Family reasons (hence offline QA):
James Kevin Adkins
Jana Bielčíková
Sevil Salur
Md. Nasim
Yanfang Liu

Additionally, Lanny Ray is given the first QA shift of the year as our experienced QA shifter.

This year, to add an incentive to train for shift leader, we allowed people who were doing shift leader training to sign up for both their training shift and their "real" shift early:
Justin Ewigleben
Hanna Zbroszczyk
Jan Vanek
Maria Zurek
Mathew Kelsey
Kun Jiang
Yue-Hang Leung

Both Bob Tribble and Grazyna Odyniec signed up early for a shift leader position in recognition of their schedules and contributions.

This year because of the date of Quark Matter and the STAR pre-QM meeting, several people were traveling on Tuesday during the sign up.  These people I signed up early as I did not want to punish some of our most active colleagues for the QM timing:
James Daniel  Brandenburg
Sooraj Radhakrishnan

3 other cases that were allowed to pre-sign up:
Panjab University had a single person who had the visa to enter the US, and had to take all of their shifts prior to the end of their contract in March.  So that the shifter could have some spaces in his shifts for sanity, I signed up:
Jagbir Singh
Eotvos Lorand University stated that travel is complicated for their group, and so it would be good if they could ensure that they were all on shift at the same time.  Given that they are coming from Europe I signed up:
Mate Csanad
Daniel Kincses
Roland Pinter
Srikanta Tripathy
Frankfurt Institute for Advanced Studies (FIAS) wanted to be able to bring Masters students to do shift, but given the training requirements and timing with school and travel for Europe, this leaves little availability for shift.  So I signed up:
Iouri Vassiliev
Artemiy Belousov
Grigory Kozlov

Tools

This is to serve as a repository of information about various STAR tools used in experimental operations.

EVO

This section contains information about using EVO for STAR meetings.

If you would like to be able to use EVO in the 1006 trailer, there is a conference PC set up for use.  There is a generic account on the computer for everyone to share.

The account credentials are:
Username: rhicstar
Password: (See below)
Log On To: Conference (This computer)

For security purposes, I will not post the password anywhere that is not encrypted, so please come see me in my office (Building 510 Room 1-179) or send me an e-mail containing your GPG public key.  If you do not have a GPG public key, please bring your laptop (desktop users: call me, and I'll come to see you) and I'll help you set it up.  It is quite useful.

FUSE & SSHFS - Overview and example in STAR online environment

FUSE (Filesystem in Userspace)


FUSE is a kernel module that acts as a bridge between the kernel’s built-in filesystem functions and user-space code that “understands” the (arbitrary) structure of the mounted content.  It allows non-root users to add filesystems to a running system.

Typically, FUSE-mounted filesystems are (nearly) indistinguishable from any other mounted filesystem to the user.

Some examples of FUSE in action:

  • WikipediaFS - viewing and editing Wikipedia articles as if they are local files.
  • Archive access - accessing and in some cases manipulating files in tarballs, zip archives, cpio archives, etc.
  • Encrypted filesystems
  • Union of filesystems (as is done in many live Linux boot disks and Linux installation routines to merge the read-only CD-rom filesystem with read-write space on disk)
  • Event Triggering - FUSE implementations can have triggered events.  Some possible uses might be:
    • automatically restarting a service if its configuration file is altered
    • automatically re-compiling code whenever a source file is changed
    • making a back-up after a file is changed
  • Arbitrary hardware interface
  • ... and the one we will focus on here:  SSHFS

The Fuse project FileSystems page has a more complete list and links to individual software projects that use FUSE.

 
SSHFS (Secure Shell Filesystem)


SSHFS allows a user (not necessarily root) on host A (the "client") to mount a directory on host B (the "server") using the (almost) ubiquitous SSH client-server communication protocols.  Generally, no configuration changes or software installations are required on host B.

The directory on host B then looks like a local directory on host A, at a location in host A's directory structure chosen by the user (a location where the user on host A has adequate privileges, of course).

Unlike NFS, the user on host A must authenticate as a known user on host B, and the operations performed on the mounted filesystem are performed as that known user on host B.  This avoids the "classic" NFS problem of UID/GID clashes between the client and server.

Here is a sample session with some explanatory comments:

In this example, host A is "stargw1" and host B is "staruser01".  The user name is wbetts on both hosts, but the user on host B could be any account that the user can access via SSH.
 
First, create a directory that will serve as the mountpoint:

[wbetts@stargw1 ~]$ mkdir /tmp/wbssh
[wbetts@stargw1 ~]$ ls -ld /tmp/wbssh
drwxrwxr-x  2 wbetts wbetts 4096 Oct 13 10:52 /tmp/wbssh

Second, mount the remote directory using the sshfs command:

[wbetts@stargw1 ~]$ sshfs staruser01.star.bnl.gov: /tmp/wbssh


In this example, no remote username or directory is specified, so the remote username is assumed to match the local username and the user’s home directory is selected by default.  So the command above is equivalent to:

% sshfs wbetts@staruser01.star.bnl.gov:/home/wbetts /tmp/wbssh

That’s it!  (No password or passphrase is required in this case, because wbetts uses SSH key agent forwarding) 

Now use the remote files just like local files:

[wbetts@stargw1 ~]$ ls -l /tmp/wbssh |head -n 3
total 16000
-rw-rw-r--  1 1003 1003    6412 Oct 19  2005 2005_Performance_Self_Appraisal.sxw
-rw-rw-r--  1 1003 1003   10880 Oct 19  2005 60_subnet_PLUS_SUBSYS.sxc
[wbetts@stargw1 ~]$ ls -ld /tmp/wbssh
drwx------  1 1003 1003 4096 Oct 11 15:56 /tmp/wbssh


The permissions on our mount point have been altered -- now the remote UID is shown (a source of possible confusion) and the permissions have morphed to the permissions on the remote side, but this is potentially misleading too…

[root@stargw1 ~]# ls /tmp/wbssh
ls: /tmp/wbssh: Permission denied

Even root on the local host can’t access this mount point, though root can see it in the list of mounts.
 
In addition to the ACL confusion, there can be some quirks in behaviour, where sshfs doesn't translate perfectly:

[wbetts@stargw1 ~]$ df /tmp/wbssh
Filesystem                                       1K-blocks       Used     Available        Use%     Mounted on
sshfs#staruser01.star.bnl.gov:    1048576000         0     1048576000       0%     /tmp/wbssh


Ideally the user unmounts it once finished; otherwise it sits there indefinitely.  (It is probably subject to the same timeouts (TCP, firewall conduit, SSH config, etc.) as an ordinary ssh connection, but in limited testing so far, the connection has been long-lived.)  Here is the unmount command:

[wbetts@stargw1 ~]$ fusermount -u /tmp/wbssh/
[wbetts@stargw1 ~]$ ls /tmp/wbssh
[wbetts@stargw1 ~]$
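
The mount, use and unmount steps above can be wrapped so that the unmount is never forgotten. The following is a minimal sketch, not an established STAR tool: the function name with_sshfs, the run helper and the DRY_RUN variable are hypothetical, and only the sshfs and fusermount -u invocations already shown above are used. With DRY_RUN=1 (the default here) the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# with_sshfs REMOTE MOUNTPOINT COMMAND [ARGS...]
# Mount REMOTE on MOUNTPOINT, run COMMAND, then always unmount.
# (Hypothetical helper; DRY_RUN=1 only prints what would be executed.)

DRY_RUN=${DRY_RUN:-1}

# run: execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

with_sshfs() {
    local remote=$1 mountpoint=$2 rc
    shift 2
    run mkdir -p "$mountpoint"        || return 1
    run sshfs "$remote" "$mountpoint" || return 1
    run "$@"
    rc=$?
    run fusermount -u "$mountpoint"   # clean up even if COMMAND failed
    return "$rc"
}

# Same hosts and mount point as in the sample session above.
with_sshfs staruser01.star.bnl.gov: /tmp/wbssh ls -l /tmp/wbssh
```

Setting DRY_RUN=0 would run the commands for real; the exit status of COMMAND is preserved so the wrapper can be used in scripts.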

Some additional details:

By default, users other than the user who initiated the mount are not permitted access to the local mountpoint (not even root), but that can be changed by the user, IF it is permitted by the FUSE configuration (as decided by the admin of the client node).  The options though are not very granular.  The three possible options are:

  1. access for the user who mounted it (and no one else)
  2. the mounter plus root
  3. everybody

In any case, whoever accesses the mount point will act as (and have the permissions of) the user on host B specified by the mounter.  This requires careful evaluation of the options permitted and user education on the possibilities of allowing inappropriate or unnecessary access to other users.
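
In sshfs terms, the three levels correspond to the default behaviour, -o allow_root, and -o allow_other; a non-root user may use the latter two only if the client's admin has enabled user_allow_other in /etc/fuse.conf. The helper below is a hypothetical sketch (the function name and the level keywords owner, root and all are invented here) that maps each level to its option and prints the resulting command instead of running it.

```shell
#!/usr/bin/env bash
# sshfs_access_option LEVEL
# LEVEL is one of: owner (default, mounter only), root (mounter plus root),
# all (everybody). The keywords are illustrative, not sshfs terminology.
sshfs_access_option() {
    case "$1" in
        owner) echo "" ;;
        root)  echo "-o allow_root" ;;
        all)   echo "-o allow_other" ;;
        *)     echo "unknown access level: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the mount command for each level.
for level in owner root all; do
    # Intentionally unquoted: the option must split into "-o" and its value.
    echo sshfs $(sshfs_access_option "$level") staruser01.star.bnl.gov: /tmp/wbssh
done
```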

The mount is not tied to the specific shell it is started in.  It seems to last indefinitely: the user can log out of host A, kill remote agents, etc. and the mount remains accessible on future logins.  (Interpretation: an agent of some sort is maintained on the client (host A) on the user’s behalf.  If multiple users have access to the user account on A, this could be worrisome, in the same manner as allowing others to access the mount point, as mentioned above.)

 

Here are some potential advantages and benefits of using SSHFS, some of which are mentioned above:

  • User-initiated
  • Encrypted communications over the network
  • Authenticated (at first order) – somewhat better user tracing than NFS
  • SSH keys/forwarding can make it relatively painless (no pass{words,phrases} required for mounting)
  • Networking/firewalling is simple – if ssh works between the two nodes, then so will sshfs (unlike NFS, where port configuration and firewalls are a pain)
  • “Passthrough” mounting works -- an sshfs mount point can be mounted from another node (if host B mounts a directory on C, then A can mount B's mountpoint and have access to C's filesystem.  In this case, B acts as both a client (to C) and a server (to A).)
  • No server-side configuration is needed.
  • These mounts can be automounted by the user somewhat like autofs using afuse ( http://afuse.sourceforge.net/ ), though this is primarily for interactive use based on SSH agents.

 

And some drawbacks:

  • User initiated (they are unlikely to clean up after themselves)
  • Access controls are either very strict (by default), or very lax in the hands of users (-o allow_other or -o allow_root) -- nothing else
  • Cross-system UID overlap and ACLs can be confusing
  • Availability of FUSE for RHEL/SL 3 and other clients?
  • Use of SSHFS in scripts could entice users to create SSH keys without passphrases -- a real no-no!

And some final details about the configuration of the online gatekeepers that presumably are prime candidates for the use of SSHFS:

The standard installation of FUSE for Scientific Linux 4 seems to not be quite complete.  A little help is required to make it work:

In /etc/rc.d/rc.local:

/etc/init.d/fuse start
/bin/chown root.fuse /dev/fuse
/bin/chmod 660 /dev/fuse


A “fuse” group was created; each user who will use SSHFS needs to be a member of this group (this must be kept in mind if we use NIS or LDAP for user management on the gateways).

 

Server Logging

The default openssh packages  from Scientific Linux 3, 4 and 5 (~openssh 3.6, 3.9 and 4.3 respectively) do not support sftp-subsystem logging.  Later versions of openssh do (starting at version ~4.4).  This provides the ability to log file accesses and trace them to individual (authenticated) users. 

I grabbed the latest openssh source (version 5.1) and built it on an SL4 machine with no trouble:

% ./configure --prefix=/opt/openssh5.1p1 --without-zlib-version-check --with-tcp-wrappers
% make
% make install

 

Then in the sshd_config file, append "-f AUTHPRIV -l INFO" to the sftp Subsystem line.  This sets the logging level (INFO) and causes the logs to be sent to /var/log/secure.  (To be tried: VERBOSE log level).
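
Concretely, with the build prefix used above, the resulting line would be expected to read "Subsystem sftp /opt/openssh5.1p1/libexec/sftp-server -f AUTHPRIV -l INFO"; the libexec location is the usual default for that prefix and is an assumption about this installation. The sketch below appends the flags to an argument-less Subsystem line and writes the result to standard output, so the live sshd_config is never modified in place. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# add_sftp_logging < sshd_config > sshd_config.new
# Appends "-f AUTHPRIV -l INFO" to a "Subsystem sftp <path>" line that has no
# arguments yet; every other line is passed through unchanged.
add_sftp_logging() {
    sed -E 's|^([[:space:]]*Subsystem[[:space:]]+sftp[[:space:]]+[^[:space:]]+)[[:space:]]*$|\1 -f AUTHPRIV -l INFO|'
}

# Example on a two-line fragment (the path is an assumed install location):
printf '%s\n' \
    'PermitRootLogin no' \
    'Subsystem sftp /opt/openssh5.1p1/libexec/sftp-server' | add_sftp_logging
```

Because a Subsystem line that already carries arguments is left alone, running the filter twice does not duplicate the flags. sshd has to be restarted for the change to take effect.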

Even at the INFO level, the logs are fairly detailed.  Shown below is a sample session: each client command is followed, indented, by the resulting log entries from the server (carradine, using port 2222 for testing).  For brevity, the time stamps from the log have been removed after the first entry.

 

SFTP LOGGING at the INFO level
CLIENT COMMANDS, each followed by its SERVER LOG entries (/var/log/secure)

sshfs -p 2222 wbetts@carradine.star.bnl.gov:/home/wbetts/ carradine_home
    Nov 20 14:30:29 carradine sshd[29120]: Accepted publickey for wbetts from 130.199.60.84 port 41746 ssh2
    carradine sshd[29122]: subsystem request for sftp
    carradine sftp-server[29123]: session opened for local user wbetts from [130.199.60.84]
ls carradine_home
    carradine sftp-server[29123]: opendir "/home/wbetts/."
    carradine sftp-server[29123]: closedir "/home/wbetts/."
touch carradine_home/test.txt
    carradine sftp-server[29123]: sent status No such file
    carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE,CREATE,EXCL mode 0100664
    carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 0
    carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE mode 00
    carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 0
    carradine sftp-server[29123]: set "/home/wbetts/test.txt" modtime 20081120-14:36:36
cat /etc/DOE_banner >> carradine_home/test.txt
    carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE mode 00
    carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 1119
rm carradine_home/test.txt
    carradine sftp-server[29123]: remove name "/home/wbetts/test.txt"
fusermount -u carradine_home/
    carradine sftp-server[29123]: session closed for local user wbetts from [130.199.60.84]

 

From these logs, we would appear to have a good record of the who/what/when of sshfs usage.  But the need to build our own openssh packages puts a burden on us to track and install updated openssh versions in a timely fashion, rather than relying on the distribution maintainer and the OS's native update manager(s).  The log files on a heavily utilised server may also become unwieldy and cause a performance degradation, but I've not made any estimates or tests of these issues.
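
As an illustration of how the who/what/when could be pulled out of such a log, the sketch below reads log lines in the format of the sample session and prints one "user action path" line per file open or removal. It relies on the process ID in "sftp-server[PID]" to tie each action back to the user named in the matching "session opened" entry. The function name sftp_audit is hypothetical, and the parsing assumes the INFO-level message wording shown in the sample; other openssh versions may word these entries differently.

```shell
#!/usr/bin/env bash
# sftp_audit: read sshd/sftp-server log lines on stdin and print
# "user action path" for every file open and remove.
sftp_audit() {
    awk '
    # Extract the PID from "sftp-server[PID]".
    function pid_of(line,    p) {
        p = line
        sub(/.*sftp-server\[/, "", p)
        sub(/\].*/, "", p)
        return p
    }
    # Remember which user owns each sftp-server process.
    /sftp-server\[[0-9]+\]: session opened for local user / {
        u = $0
        sub(/.*session opened for local user /, "", u)
        sub(/ from .*/, "", u)
        user[pid_of($0)] = u
        next
    }
    # Report opens and removals, attributed via the PID.
    /sftp-server\[[0-9]+\]: (open|remove name) "/ {
        pid = pid_of($0)
        action = ($0 ~ /\]: open "/) ? "open" : "remove"
        path = $0
        sub(/^[^"]*"/, "", path)
        sub(/".*/, "", path)
        who = (pid in user) ? user[pid] : "unknown"
        print who, action, path
    }'
}

# Example using a few entries from the sample session:
sftp_audit <<'EOF'
Nov 20 14:30:29 carradine sshd[29120]: Accepted publickey for wbetts from 130.199.60.84 port 41746 ssh2
carradine sftp-server[29123]: session opened for local user wbetts from [130.199.60.84]
carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE,CREATE,EXCL mode 0100664
carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 0
carradine sftp-server[29123]: remove name "/home/wbetts/test.txt"
EOF
```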

 



Here are the specific relevant packages installed on the client test nodes (stargw1 and stargw2):


fuse-2.7.3-1.SL
fuse-libs-2.7.3-1.SL
fuse-devel-2.7.3-1.SL
fuse-sshfs-2.1-1.SL
kernel-module-fuse-2.6.9-78.0.1.ELsmp-2.7.3-1.SL

(Exact versions should not be terribly important, but it appears that fuse-2.5.3 included up to SL4.6 requires more tweaking after installation than fuse 2.7.3 included in SL4.7).

 

 

Implementing SSL (https) in Tomcat using CA generated certificates

The reason for using a certificate from a CA as opposed to a self-signed certificate is that, with a self-signed certificate, the browser shows a warning screen and asks you to accept the certificate. Because the browser already ships with a list of trusted CAs, this step is not needed for a CA-issued certificate.
 
The following list of certificates and a key are needed:

/etc/pki/tls/certs/wildcard.star.bnl.gov.Nov.2012.cert – host cert.
/etc/pki/tls/private/wildcard.star.bnl.gov.Nov.2012.key – host key (don’t give this one out)
/etc/pki/tls/certs/GlobalSignIntermediate.crt – intermediate cert.
/etc/pki/tls/certs/GlobalSignRootCA_ExtendedSSL.crt –root cert.
/etc/pki/tls/certs/ca-bundle.crt – a large bundle of many certs.

Concatenate the following certs into one file; in this example I call it Global_plus_Intermediate.crt:
cat /etc/pki/tls/certs/GlobalSignIntermediate.crt > Global_plus_Intermediate.crt
cat /etc/pki/tls/certs/GlobalSignRootCA_ExtendedSSL.crt >> Global_plus_Intermediate.crt
cat /etc/pki/tls/certs/ca-bundle.crt >> Global_plus_Intermediate.crt

Run this command. Note that "-name tomcat" and "-caname root" should not be changed to any other value: the command will still work, but the resulting file will fail under Tomcat. If it works you will be asked for a password; that password should be set to "changeit".

 openssl pkcs12 -export -in wildcard.star.bnl.gov.Nov.2012.cert -inkey wildcard.star.bnl.gov.Nov.2012.key -out mycert.p12 -name tomcat -CAfile Global_plus_Intermediate.crt -caname root -chain

Test the new p12 output file with this command:

keytool -list -v -storetype pkcs12 -keystore mycert.p12

Note it should say: "Certificate chain length: 3"


In Tomcat’s server.xml file, add a connector that looks like this:
 

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="150" scheme="https" secure="true"
           keystoreFile="/home/lbhajdu/certs/mycert.p12" keystorePass="changeit"
           keystoreType="PKCS12" clientAuth="false" sslProtocol="TLS"/>


Note the path should be set to the correct path of the certificate.  And the p12 file should only be readable by the Tomcat account because it holds the host key. 
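
One way to apply the permission advice is sketched below. The function name lock_down_keystore is hypothetical, and "tomcat" as the name of the account Tomcat runs under is an assumption; substitute whatever account is used on the server in question. Changing ownership normally requires root, so the owner argument is optional.

```shell
#!/usr/bin/env bash
# lock_down_keystore FILE [OWNER]
# Make FILE readable by its owner only. If OWNER is given (e.g. the account
# Tomcat runs as; "tomcat" is an assumed name), also change ownership, which
# normally requires root.
lock_down_keystore() {
    local file=$1 owner=${2:-}
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    if [ -n "$owner" ]; then
        chown "$owner" "$file" || return 1
    fi
    chmod 400 "$file"
}

# Example (as root), using the keystore path from the connector above:
#   lock_down_keystore /home/lbhajdu/certs/mycert.p12 tomcat
```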

Online Linux pool

March 15, 2012:

THIS PAGE IS OBSOLETE!  It was written as a guide in 2008 for documenting improvements in the online Linux pool, but has not been updated to reflect additional changes to the state of the pool, so not all details are up to date. 

One particular detail to be aware of:  the name of the pool nodes is now onlNN.starp.bnl.gov, where 01<=NN<=14.  The "onllinuxN" names were retired several years ago.

 

Historical page (circa 2008/9):

Online Linux pool for general experiment support needs

 

GOAL: 

Provide a Linux environment for general computing needs in support of the experimental operations.

HISTORY (as of approximately June 2008):

A pool of 14 nodes, consisting of four different hardware classes (all circa 2001), has been in existence for several years.  For the last three (or more?) years, they have run Scientific Linux 3.x with support for the STAR software environment, along with access to various DAQ and Trigger data sources.  The number of significant users has probably been less than 20, with the heaviest usage related to L2.  User authentication was originally based on an antique NIS server, to which we had imported the RCF accounts and passwords.  Though the server is still alive, we have not maintained this NIS information.  Over time, local accounts on each node became the norm, though of course this is rather tedious.  Home directories come in three categories:  AFS, NFS on onllinux5, and local home directories on individual nodes.  Again, this gets rather tedious to maintain.

There are several "special" nodes to be aware of:

  1. Three of the nodes (onllinux1, 2 and 3) are in the Control Room for direct console login as needed.  (The rest are in the DAQ room.)
  2. onllinux5 has the NFS shared home directories (in /online/users).  (NB.  /online/users is being backed up by the ITD Networker backup system.)
  3. onllinux6 is (was?) used for many online database maintenance scripts (check with Mike DePhillips about this -- we had planned to move these scripts to onldb).
  4. onllinux1 was configured as an NIS slave server, in case the NIS master (starnis01) fails.

 

PLAN:

For the run starting in 2008 (2009?), we are replacing all of these nodes with newer hardware.

The basic hardware specs for the replacement nodes are:

Dual 2.4 GHz Intel Xeon processors

1GB RAM

2 x 120 GB IDE disks

 

These nodes should be configured with Scientific Linux 4.5 (or 4.6 if we can ensure compatibility with STAR software) and support the STAR software environment.

They should have access to various DAQ and Trigger NFS shares.  Here is a starter list of mounts:

 

Shared DAQ and Trigger resources

SERVER            DIRECTORY on SERVER                     LOCAL MOUNT POINT   MOUNT OPTIONS
evp.starp         /a                                      /evp/a              ro
evb01.starp       /a                                      /evb01/a            ro
evb01             /b                                      /evb01/b            ro
evb01             /c                                      /evb01/c            ro
evb01             /d                                      /evb01/d            ro
evb02.starp       /a                                      /evb02/a            ro
evb02             /b                                      /evb02/b            ro
evb02             /c                                      /evb02/c            ro
evb02             /d                                      /evb02/d            ro
daqman.starp      /RTS                                    /daq/RTS            ro
daqman            /data                                   /daq/data           rw
daqman            /log                                    /daq/log            ro
trgscratch.starp  /data/trgdata                           /trg/trgdata        ro
trgscratch.starp  /data/scalerdata                        /trg/scalerdata     ro
startrg2.starp    /home/startrg/trg/monitor/run9/scalers  /trg/scalermonitor  ro
online.star       /export                                 /onlineweb/www      rw

 

 

WISHLIST Items with good progress:

  • <Uniform and easy to maintain user authentication system to replace the current NIS and local account mess.  Either a local LDAP, or a glom onto RCF LDAP seems most feasible> -- An ldap server (onlldap.starp.bnl.gov) has been set-up and the 15 onllinux nodes are authenticating to it *BUT* it is using NIS!
  • <Shared home directories across the nodes with backups> -- onlldap is also hosting the home directories and sharing them via NFS.  EMC Networker is backing up the home directories and Matt A. is receiving the email notifications.
  • <Integration into SSH key management system (mechanism depends upon user authentication method(s) selected).> --  The ldap server has been added to the STAR SSH key management system, and users are able to login to the new onlXX nodes with keys now.
  • <Common configuration management system> -- Webmin is in use.
  • <Ganglia monitoring of the nodes> -- I think this is done...
  • <Osiris monitoring of the nodes> -- I think this is done - Matt A. and Wayne B. are receiving the notices...

WISHLIST Items still needing significant work:

  • None?

 

SSH Key Management

Overview 

An SSH public key management system has been developed for STAR (see D. Arkhipkin et al 2008 J. Phys.: Conf. Ser. 119 072005), with two primary goals stemming from the heightened cyber-security scrutiny at BNL:

  • Use of two-factor authentication for remote logins
  • Identification and management of remote users accessing our nodes (in particular, the users of "group" accounts which are not tied to one individual), to achieve accountability

A benefit for users also can be seen in the reduction in the number of passwords to remember and type.

 

In purpose, this system is similar to the RCF's key management system, but is somewhat more powerful because of its flexibility in the association of hosts (client systems), user accounts on those clients, and self-service key installation requests.

Here is a typical scenario of the system usage: 

  1. A sysadmin of a machine named FOO creates a user account named "JDOE" and, if not done already, installs the key_services client.
  2. A user account 'JDOE' on host 'FOO' is configured in the Key Management system by a key management administrator.
  3. John Doe uploads (via the web) his or her public ssh key (in openssh format).
  4. John Doe requests (via the web) that his key be added to JDOE's authorized_keys file on FOO.
  5. A key management administrator approves the request, and the key_services client places the key in ~JDOE/.ssh/authorized_keys.

At this point, John Doe has key-based access to JDOE@FOO.  Simple enough?  But wait, there's more!  Now John Doe realizes that he also needs access to the group account named "operator" on host BAR.  Since his key is already in the key management system he has only to request that his key be added to operator@BAR, and voila (subject to administrator approval), he can now login with his key to both JDOE@FOO and operator@BAR.  And if Mr. Doe should leave STAR, then an administrator simply removes him from the system and his keys are removed from both hosts.

Slightly Deeper...

There are three things to keep track of here -- people (and their SSH keys of course), host (client) systems, and user accounts on those hosts:

People want access to specific user accounts at specific hosts.

So the system maintains a list of user accounts for each host system, and a list of people associated with each user account at each host.
(To be clear -- the system does not have any automatic user account detection mechanism at this time -- each desired "user account@host" association has to be added "by hand" by an administrator.)

This Key Management system, as seen by the users (and admins), consists simply of users' web browsers (with https for encryption) and some PHP code on a web server (which we'll call "starkeyw") which inserts uploaded keys and user requests (and administrators' commands) into a backend database (which could be on a different node from the web server if desired). 

Behind the scenes, each host that is participating in the system has a keyservices client installed that runs as a system service.  The keyservices_client periodically (at five minute intervals by default) interacts with a different web server (serving different PHP code that we'll call starkeyd).  The backend database is consulted for the list of approved associations and the appropriate keys are downloaded by the client and added to the authorized_keys files accordingly.
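
To make the last step concrete, here is a minimal sketch of what "adding approved keys to authorized_keys" can look like. It is NOT the actual keyservices client: the function name install_keys, the file layout and the placeholder key text are all hypothetical, and it only covers adding keys, whereas the real system also removes keys when an association is revoked. The point it illustrates is that the update should be idempotent, since the client runs every few minutes.

```shell
#!/usr/bin/env bash
# install_keys APPROVED_KEYS AUTHORIZED_KEYS
# Append each key from APPROVED_KEYS (one public key per line) to
# AUTHORIZED_KEYS unless that exact line is already present.
# (Illustrative only; not the real keyservices client.)
install_keys() {
    local approved=$1 target=$2 key dir
    dir=$(dirname "$target")
    mkdir -p "$dir" && chmod 700 "$dir"       || return 1
    touch "$target" && chmod 600 "$target"    || return 1
    while IFS= read -r key; do
        [ -n "$key" ] || continue
        grep -qxF -- "$key" "$target" || printf '%s\n' "$key" >> "$target"
    done < "$approved"
}

# Self-contained demonstration in a scratch directory (placeholder key text).
demo=$(mktemp -d)
printf '%s\n' 'ssh-rsa AAAAexample1 jdoe@laptop' \
              'ssh-rsa AAAAexample2 jdoe@desktop' > "$demo/approved"
install_keys "$demo/approved" "$demo/home/JDOE/.ssh/authorized_keys"
install_keys "$demo/approved" "$demo/home/JDOE/.ssh/authorized_keys"   # second pass adds nothing
cat "$demo/home/JDOE/.ssh/authorized_keys"
```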

In our case, our primary web server at www.star.bnl.gov hosts all the STAR Key Manager (SKM) services (starkeyw and starkeyd via Apache, and a MySQL database), but they could each be on separate servers if desired.

Perhaps a picture will help.  See below for a link to an image labelled "SKMS in pictures".

Deployment Status and Future Plans

We have begun using the Key Management system with several nodes and are seeking to add more (currently on a voluntary basis).  Only RHEL 3/4/5 and Scientific Linux 3/4/5 with i386 and x86_64 kernels have been tested, but there is no reason to believe that the client couldn't be built on other Linux distributions or even Solaris.  We do not anticipate "forcing" this tool onto any detector sub-systems during the 2007 RHIC run, but we do expect it (or something similar) to become mandatory before any future runs.  Please contact one of the admins (Wayne Betts, Jerome Lauret or Mike Dephillips) if you'd like to volunteer or have any questions.

User access is currently based on RCF Kerberos authentication, but may be extended to additional authentication methods (e.g., BNL LDAP) if the need arises.

Client RPMs (for some configurations) and SRPMs are available, and some installation details are available here:

http://www.star.bnl.gov/~dmitry/skd_setup/

An additional related project is the possible implementation of a STAR ssh gateway system (while disallowing direct login to any of our nodes online) - in effect acting much like the current ssh gateway system's role in the SDCC.  Though we have an intended gateway node online (stargw1.starp.bnl.gov, with a spare on hand as well), its use is not currently required.

 

Anxious to get started? 

Here you go: https://www.star.bnl.gov/starkeyw/ 

You can use your RCF username and Kerberos password to enter.

When uploading keys, use your SSH public keys - they need to be in OpenSSH format. If not, please consult SSH Keys and login to the SDCC.
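
Before uploading, a quick local check can tell whether a file looks like an OpenSSH-format public key (a single line of the form "type base64-data comment"). The sketch below is only a format check under that assumption; the function name is hypothetical and it does not validate the key itself.

```shell
# Hypothetical helper: succeeds if the first line of the file looks like an
# OpenSSH public key (key type, a space, base64 data, optional comment).
looks_like_openssh_pubkey() (
    keyfile="$1"
    head -n 1 "$keyfile" |
        grep -Eq '^(ssh-(rsa|dss|ed25519)|ecdsa-sha2-nistp(256|384|521)) [A-Za-z0-9+/]+=*( .*)?$'
)

# Example use:
#   looks_like_openssh_pubkey ~/.ssh/id_rsa.pub && echo "OpenSSH format"
```

If the check fails because the key was exported in RFC 4716 format (it starts with "---- BEGIN SSH2 PUBLIC KEY ----"), `ssh-keygen -i -f keyfile` prints the same key in OpenSSH format.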

 
 

STAR Electronic Shiftlog (ESL) Administrator Manual

STAR Electronic Shiftlog (ESL) Administration guide

The STAR Electronic Shiftlog (ESL) is written in JSP (JavaServer Pages) and requires a web server that can render JSP content. Unlike PHP, JSP is compiled into Java classes using a method called “Just in Time”: the page is compiled the first time it is accessed, and then it does not have to be compiled again for the life of the page or until the page is modified. The forerunner of JSP is the servlet; servlets are also used in the shiftlog, mostly to stream images. The technology differs in that servlets need to be compiled in advance of being deployed.

 

Our JSP server is Apache Tomcat. Documentation and newer versions can be downloaded from http://tomcat.apache.org/. Although Tomcat is a fully functional web server in itself, we prefer to let the Apache web server serve the HTML content and only require Tomcat to serve the JSP pages that Apache cannot. This is accomplished by way of the mod_jk Apache Tomcat Connector using the ajp13 protocol. Tomcat listens on port 8080. This port is blocked from the outside but can be reached from a browser started on the online web server itself.

 

The Tomcat server hosting the shiftlog is deployed on the online web server online.star.bnl.gov and runs under the tomcat account. In order to log on to the online web server to administer Tomcat and the ESL you will need keys mapped to the tomcat user account. Please see Wayne Betts or Jérôme Lauret about getting your keys mapped. There are multiple versions of Tomcat residing in /opt.

 

Conventions relating to install of newer versions of Tomcat on the online web server

All versions of Tomcat are placed in the /opt folder, in a sub folder clearly denoting the version number. (When you unzip Tomcat this is usually how it comes.) Examples are:

/opt/apache-tomcat-5.5.20/
/opt/apache-tomcat-6.0.18/


The currently used version of Tomcat is linked to /opt/tomcat/. Below is an ls of the tomcat folder:

-bash-3.00$ ls -l /opt/tomcat
lrwxrwxrwx 1 root root 22 Nov 17 11:11 /opt/tomcat -> ./apache-tomcat-6.0.18

Note that this folder is the tomcat user's home directory. It contains the .ssh folder which holds your keys, so relinking it may lock you out if you do not transfer the .ssh folder in advance.
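
One way to avoid that lock-out is to copy the .ssh folder into the new version's directory before moving the link. The helper below is a minimal sketch under that assumption; its name and arguments are hypothetical, and on the real server it would have to be run by someone with write access to /opt.

```shell
# Hypothetical helper: point the "current Tomcat" link at a new version,
# copying the .ssh folder over first so key-based logins keep working.
switch_tomcat_link() (
    link="$1"       # e.g. /opt/tomcat
    new_dir="$2"    # e.g. /opt/apache-tomcat-6.0.18 (must already exist)
    [ -d "$new_dir" ] || { echo "no such directory: $new_dir" >&2; exit 1; }
    if [ -d "$link/.ssh" ] && [ ! -e "$new_dir/.ssh" ]; then
        cp -a "$link/.ssh" "$new_dir/.ssh" || exit 1    # -a keeps modes and ownership
    fi
    ln -sfn "$new_dir" "$link"    # -n replaces the link itself instead of descending into it
)

# Example use (paths taken from this page):
#   switch_tomcat_link /opt/tomcat /opt/apache-tomcat-6.0.18
```

The copy is skipped when the new directory already has a .ssh folder, so an existing set of keys there is never overwritten.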

Configuring Tomcat & The Tomcat Directory Structure

After you install a new version of Tomcat you will want to configure it.

There are some environment variables whose existence you will want to verify. If they are not set, you will want to set them, preferably in a start-up script so they will survive a server restart.

$CATALINA_HOME: /opt/tomcat
$JAVA_HOME: /usr/java/default
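
A start-up script could set these defaults along the following lines. This is a minimal sketch; which script it belongs in (for example the tomcat user's shell profile) is an assumption and not something this page specifies.

```shell
# Use the values from this page unless the variables are already set.
: "${CATALINA_HOME:=/opt/tomcat}"
: "${JAVA_HOME:=/usr/java/default}"
export CATALINA_HOME JAVA_HOME
```

The `:=` form only assigns when the variable is unset or empty, so a value exported earlier is left alone.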

Inside the Tomcat folder you will find these directories (and some others):

$CATALINA_HOME/bin/
$CATALINA_HOME/logs/
$CATALINA_HOME/webapps/
$CATALINA_HOME/conf/

$CATALINA_HOME/bin/ holds the executables (for Linux and Windows).

To startup the Tomcat server use:

% $CATALINA_HOME/bin/startup.sh

To shut it down use:

% $CATALINA_HOME/bin/shutdown.sh

You will want to modify $CATALINA_HOME/bin/catalina.sh. This is a script called by startup.sh; its function is to invoke the java process, which is the Tomcat server.

Directly under the header these lines are added:

# added by Levente Hajdu ##################################### "
export JAVA_OPTS=$JAVA_OPTS" -Xmx512M -Djava.library.path=/usr/lib64 -Djava.awt.headless=true"
############################################################# 

A description of the options used follows:

  • -Xmx512M sets the maximum heap size of the Java VM which runs the server to 512 MB; this should be sufficient for our needs. If the server needs more heap than this, the JVM raises java.lang.OutOfMemoryError, and Tomcat will typically stop working properly until it is restarted.

  • -Djava.library.path sets the library path for an optional set of native (non-Java) libraries which Tomcat can utilize for improved performance. If this is not present you will see suggestions to set it in the Tomcat log.

  • -Djava.awt.headless=true prevents a particular type of crash. This server also hosts the SUMS statistics pages. These use a library (JFreeChart) to render images for display, and that library depends on X server libraries. Without this option, if Tomcat is started by a user that has X forwarding enabled but no X server running, Tomcat would crash as it tries to execute the JSP.

You will be spending a lot of time in $CATALINA_HOME/conf/. The file that controls the Tomcat context paths is $CATALINA_HOME/conf/server.xml. This file requires editing whenever software is deployed at a new context path. Before you edit this file, always make a backup. Each year of the shiftlog resides on a different context path. Here is the list:

http://online.star.bnl.gov/apps/shiftLog2003/
http://online.star.bnl.gov/apps/shiftLog2004/
http://online.star.bnl.gov/apps/shiftLog2005/
http://online.star.bnl.gov/apps/shiftLog2006/
http://online.star.bnl.gov/apps/shiftLog2007/
http://online.star.bnl.gov/apps/shiftLog2008/
http://online.star.bnl.gov/apps/shiftLog2009/


The current year is always at:

http://online.star.bnl.gov/apps/shiftLog/


If we look inside the $CATALINA_HOME/conf/server.xml file we will see an entry for each one of these paths:

<!--Shiftlog 2007-->
<Context className="org.apache.catalina.core.StandardContext" cachingAllowed="true" 
 charsetMapperClass="org.apache.catalina.util.CharsetMapper" cookies="true" crossContext="false" debug="0" 
 docBase="/var/tomcat/webapps/shiftLog2007.war" mapperClass="org.apache.catalina.core.StandardContextMapper" 
 path="/apps/shiftLog2007" privileged="false" reloadable="true" swallowOutput="false" useNaming="true" 
 wrapperClass="org.apache.catalina.core.StandardWrapper">
<Environment description="" name="year" override="false" type="java.lang.Integer" value="2007"/>
<Environment description="" name="isEditable" override="false" type="java.lang.Boolean" value="false"/>
<Environment description="" name="runLogLink" override="false" type="java.lang.String" 
 value="http://online.star.bnl.gov/RunLog/Summary.php?run="/>
<Environment description="" name="runNumber" override="false" type="java.lang.Integer" value="7"/>
</Context>

This is the block of XML for the shiftlog for 2007. With different versions of Tomcat the syntax of this file can change; however, it usually doesn’t change too much. Let's go over the important properties in this block:

docBase – Tomcat supports web archive files (.war). This is basically a zip file with a special internal structure. The explanation of the preparation of one of these files would take a whole Drupal page to itself.

path – This is the context path at which the site will appear when you look at it in your web browser. It is the part of the URL after the server name.

Environment – The Environment sub-tag makes information available to the program. The format is fairly simple. However, you have to be careful to set override="false", or else the .war file's WEB-INF/web.xml will overwrite these values with its own values.

The environment properties for the shiftlog are:

year – this is the shiftlog year. Example: “2007”

isEditable – this is a boolean value. After the run has completed, access to the editor is turned off by setting this to false.

runLogLink – this is the URL for the run log. The shiftlog uses this to build links to the run log.

runNumber – this is almost the same as the year; it is just the run number. Examples:

run 8 = 2008

run 9 = 2009

run 10 = 2010
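
When a context for a new year has to be added, the entry can be generated from a template rather than typed by hand. The function below is a hypothetical sketch modeled on the 2007 block: it keeps only the attributes discussed above (docBase, path, reloadable and the four Environment entries). Whether the other attributes of the 2007 block are needed depends on the Tomcat version and is not verified here, so review the output before pasting it into server.xml.

```shell
# Hypothetical generator for a shiftlog <Context> entry, modeled on the
# 2007 block. Prints to standard output; it does not edit server.xml.
print_shiftlog_context() (
    year="$1"        # e.g. 2010
    run="$2"         # e.g. 10
    editable="$3"    # true during the run, false once it has completed
    cat <<EOF
<!--Shiftlog ${year}-->
<Context docBase="/var/tomcat/webapps/shiftLog${year}.war" path="/apps/shiftLog${year}" reloadable="true">
<Environment description="" name="year" override="false" type="java.lang.Integer" value="${year}"/>
<Environment description="" name="isEditable" override="false" type="java.lang.Boolean" value="${editable}"/>
<Environment description="" name="runLogLink" override="false" type="java.lang.String"
 value="http://online.star.bnl.gov/RunLog/Summary.php?run="/>
<Environment description="" name="runNumber" override="false" type="java.lang.Integer" value="${run}"/>
</Context>
EOF
)

# Example use:
#   print_shiftlog_context 2010 10 true
```

Every Environment entry is written with override="false", matching the caution above about WEB-INF/web.xml.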

The $CATALINA_HOME/webapps/ web apps folder holds the default pages that come pre-packaged with the Tomcat server. This is also the location where Tomcat unpacks the war files. The folder naming conventions can change from Tomcat version to Tomcat version.

The $CATALINA_HOME/logs/ directory, as you may have guessed, holds log files. You will want to look over all files in here even if Tomcat seems to be functioning correctly. The logs can point out errors you may not be aware of. The file $CATALINA_HOME/logs/catalina.out holds the standard output stream of your JSPs (not to be confused with the HTML output stream) along with Tomcat's own standard output stream, making this a handy file for debugging.
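
A quick way to look over all the log files at once is to search them for the SEVERE level and for exception names. The function below is a minimal sketch; the search pattern is an assumption about what is worth flagging, and it is not a substitute for actually reading the logs.

```shell
# Hypothetical helper: list lines in the Tomcat logs that mention SEVERE
# or an exception, with file names and line numbers.
scan_tomcat_logs() (
    log_dir="$1"    # e.g. $CATALINA_HOME/logs
    # -r recurse into the directory, -n show line numbers, -E extended regex
    grep -rnE 'SEVERE|Exception' "$log_dir"
)

# Example use:
#   scan_tomcat_logs "$CATALINA_HOME/logs" | less
```

The function returns a non-zero status when nothing matches, which can be used as a simple "logs look clean" test in a script.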

Deploying new war files

To deploy a war file the procedure is as follows:

  1. Stop Tomcat:

    $CATALINA_HOME/bin/shutdown.sh

    NOTE: If you have deployed the Tomcat administrative web interface, shutting down the whole server is not strictly required because you could just shut down the context path, but I prefer to shut down the whole server as a matter of habit; the time required is so short that no one really notices.
     

  2. If this is an upgrade of an existing .war file (else move to step 3), back up the old .war file. All .war files are located in /var/tomcat/webapps/. Here is the listing of the directory; note the convention for the naming of the web archive files:

    -bash-3.00$ ls -1 /var/tomcat/webapps/shiftLog*.war
    /var/tomcat/webapps/shiftLog2003.war
    /var/tomcat/webapps/shiftLog2004.war
    /var/tomcat/webapps/shiftLog2005.war
    /var/tomcat/webapps/shiftLog2006.war
    /var/tomcat/webapps/shiftLog2007.war
    /var/tomcat/webapps/shiftLog2008t.war
    /var/tomcat/webapps/shiftLog2008.war
    /var/tomcat/webapps/shiftLog2009.war

    When removing one of these files I move it to the /var/tomcat/webapps/old/ directory and rename it following the convention here:

    shiftLog2007.Apr03.965628000.war
    shiftLog2007.Apr04.288184000.war
    shiftLog2007.Apr09.200079000.war
    shiftLog2007.Dec03.805483000.war
    shiftLog2007.Feb07.785336000.war
    ...
    shiftLog2007.Mar27.875569000.war
    shiftLog2007.Nov09.134343000.war
    shiftLog2007.Nov28.320967000.war
    shiftLog2007.Nov28.657299000.war

    It is important to retain the backup in case there is something wrong with the new .war file; keeping the old one will allow you to roll back while the problem is being corrected.

  3. Next copy over the new .war file from the node on which it resides. Scp is the method I use for this. The syntax is:

    % scp [username]@[nodeName]:[Path&File] /var/tomcat/webapps/shiftLog[year].war
  4. If this is a new deployment and not an upgrade of an existing .war file, you will have to configure a context path in $CATALINA_HOME/conf/server.xml and then move to step 6. (For an upgrade, skip this step and continue with step 5.)
     

  5. If this is an upgrade, you will have to dump (delete) the expanded .war file in $CATALINA_HOME/webapps/. It should be a directory with a name similar to that of the .war file. You do not have to back this up because you already have the .war file backed up.

  6. Startup Tomcat

    % $CATALINA_HOME/bin/startup.sh
  7. Open up a web browser and check that the page displays correctly

  8. Run the shift log Java web start application to confirm that the developer has signed his or her jar files within the .war file, if not you will need to have the .war file rebuilt.
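
The file handling in steps 2, 3 and 5 for an upgrade can be sketched as a single function. This is a hypothetical illustration, not the deploy script mentioned in the Tips: the function name and arguments are made up, the new .war file is assumed to be on this node already, and the backup suffix produced by `date +%b%d.%N` (GNU date) is only a guess at how names like Apr03.965628000 were generated. Tomcat should be stopped before the call and started again afterwards.

```shell
# Hypothetical sketch of steps 2, 3 and 5 for an upgrade.
upgrade_shiftlog_war() (
    new_war="$1"         # new .war file, already copied to this node
    year="$2"            # e.g. 2009
    war_dir="$3"         # e.g. /var/tomcat/webapps
    expanded_dir="$4"    # expanded copy under $CATALINA_HOME/webapps (name varies by version)
    target="$war_dir/shiftLog${year}.war"
    mkdir -p "$war_dir/old" || exit 1
    if [ -f "$target" ]; then
        # Step 2: keep the old archive so a rollback stays possible.
        mv "$target" "$war_dir/old/shiftLog${year}.$(date +%b%d.%N).war" || exit 1
    fi
    cp "$new_war" "$target" || exit 1    # step 3
    # Step 5: remove the stale expanded copy; the -n test guards against
    # an empty argument turning into a dangerous rm.
    if [ -n "$expanded_dir" ] && [ -d "$expanded_dir" ]; then
        rm -rf -- "$expanded_dir"
    fi
)

# Example use (the expanded directory name is an assumption):
#   $CATALINA_HOME/bin/shutdown.sh
#   upgrade_shiftlog_war /tmp/shiftLog2009.war 2009 /var/tomcat/webapps "$CATALINA_HOME/webapps/shiftLog2009"
#   $CATALINA_HOME/bin/startup.sh
```

Rolling back then amounts to stopping Tomcat and calling the same function with the most recent file from the old/ directory as the "new" .war file.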

Tips

Because upgrades are done fairly frequently, mostly for requests for new features and some bug fixes, I keep a script to do the upgrade process listed above. However, the script requires modification before running it. The name of the script is $CATALINA_HOME/bin/deploy_year .

If you have done the upgrade but do not notice any change:

  1. check that you dumped the expanded directory in $CATALINA_HOME/webapps/ (step 5)

  2. also dump your web browser's cache

If you get the “page unavailable” message, check that the tomcat process is running. Use the command:

ps -ef | grep tomcat | grep java

Even if it is running, shut it down and try to restart it. Like an old car, Tomcat may not start the first time you try to crank it over.


 

Adding a user account to the ShiftLog Expert Online Remote Editor

STAR experts deemed absolutely essential may request an account to edit the ShiftLog directly via the web interface. Each user gets a unique account and Tomcat manages the session. The negotiation of the password should be done as securely as possible (phone or encrypted e-mail).  

 

To create an account, log in to the online web server as the user tomcat from an internal computer. You need rights to do this.

   ssh tomcat@online.star.bnl.gov

Edit the file  $CATALINA_HOME/conf/tomcat-users.xml

Note that $CATALINA_HOME may not be defined. It refers to wherever Tomcat is installed; in our case this is /opt/tomcat.

The file looks like this:

<tomcat-users>
  <role rolename="manager"/>
  <role rolename="logEditor"/>
  <user username="jfaustus" password="24y&amp;damn'd" roles="logEditor"/>
  <user username="mephistophilis" password="tee*sMovClkStrk" roles="logEditor"/>
</tomcat-users>

Add a new user with the username and password set as agreed with the user and the roles set to "logEditor".

Then restart the server:

$CATALINA_HOME/bin/shutdown.sh
$CATALINA_HOME/bin/startup.sh

Check that it works and you’re done.
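
The edit can also be scripted. The sketch below is hypothetical (both function names are made up): it keeps a .bak copy of the file, escapes the characters that are special inside an XML attribute value, and inserts the new user line just before the closing tomcat-users tag. Note that a password passed as a command-line argument may be visible to other users of the machine while the command runs.

```shell
# Escape the characters that are special inside an XML attribute value.
xml_escape() {
    printf '%s' "$1" |
        sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Hypothetical helper: add a user with the logEditor role to a
# tomcat-users.xml file, keeping the previous version as <file>.bak.
add_log_editor() (
    users_file="$1"; user="$2"; password="$3"
    cp -p "$users_file" "$users_file.bak" || exit 1
    NEW_LINE="  <user username=\"$(xml_escape "$user")\" password=\"$(xml_escape "$password")\" roles=\"logEditor\"/>"
    export NEW_LINE
    # ENVIRON avoids awk's backslash processing of -v assignments.
    awk '/<\/tomcat-users>/ { print ENVIRON["NEW_LINE"] } { print }' \
        "$users_file.bak" > "$users_file"
)

# Example use (user name and password are placeholders):
#   add_log_editor /opt/tomcat/conf/tomcat-users.xml wagner 'agreed-password'
```

The server still has to be restarted afterwards, as described above.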


UPS list

Uninterruptible Power Supplies at the experiment:

 

| RackTables OBJECT NAME | LOCATION | MODEL | BATTERY TYPE | LAST BATTERY CHANGE | DEVICES POWERED | NOTES |

|  | Control Room, Slow Controls Terminals, floor | APC DLA1500 (Smart-UPS 1500) | RBC7 | 12/01/2014; 12/16/2020 | sc2.starp.bnl.gov; sc3.starp.bnl.gov; 2 LCDs for sc2 & sc3 | Serial #: AS0736230443; Manuf. date: Sept. 2007; black |
| UPS7 | Control Room, Slow Controls Terminals, floor near south west corner | APC SMT1500NC | RBC7 | 6/2017 (original battery) | sc5.starp.bnl.gov; 2 LCDs for sc5; speakers for sc5 | Serial #: AS1711333192; Manuf. date: March 2017; black tower; has an AP9631 network interface with an environmental monitor probe (not configured yet) |
|  | Control Room, north of Slow Controls Terminals, floor | APC BR1000G | RBC123 | 12/2017 | sc.starp.bnl.gov + LCD; speakers? | Serial #: 3B1204X18919; Manuf. date: January 2012; bought summer 2012; black "tower"; no self-test from front panel |
|  | Control Room, TPC Terminals, console shelf | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | 11/2014 (orig. battery put into service); 10/25/2019 | chaplin + 2 LCDs; sirius + LCD | Serial #: AS1431232892; Manuf. date: July 2014; rack-mount |
|  | Control Room, TPC terminals | APC DLA1500 (Smart-UPS 1500) | RBC7 | 11/08/2018 | gmt-ops + LCD | Serial #: AS0736230401; Manuf. date: Sept. 2007; black |
|  | Control Room, trigger systems, countertop | APC SUA1500 (Smart-UPS 1500) | RBC7 | 3/22/2015; 10/16/2019 | startrg | Serial #: AS0628221767; Manuf. date: July 2006; black |
|  | Control Room, magnet terminals, behind the LCD for the Windows PC running magnet monitoring | APC BR1500LCD (Back-UPS RS 1500) | RBC109 | 1/24/2020 | rosas + LCD | Serial #: 3B0935X21952; Manuf. date: August 2009; gray/black; nominally belongs to CAD, possible contacts are John Pomaro or anyone in Collider-Accelerator Support |
|  | Control Room, under Shift Leader desk | APC BR1000G | RBC123 | 11/30/2017 (though battery was bought in November 2014) | shift-leader + 2 LCD | Serial #: 3B1204X18994; Manuf. date: January, 2012; black; PowerChute Personal edition installed on shift-leader system (cannot be used with PowerChute Business Edition); no self-test from front panel |

| UPS1 | DAQ Room, L4 and server rack (center row, north end) | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | March 2018 (though battery was bought in November 2015) | ovirt2; onldb5 (twice, redundant PS); onlldap; onlam3 | Serial #: AS1231125008; Manuf. date: July 2012; black, rack mount; bought December 2012 |
|  | DAQ Room, on the floor between DB1 and DB2 (the legacy DAQ and trigger racks - southern end of the middle row) | APC SMT1500 (Smart-UPS 1500) | RBC7 | Original battery (3/2011); February 2015 | evp3 (bottom PS); trgscratch (top PS); sclrscratch (top PS); daqlocalmain network switch; daq-sw2 network switch; trgscratch 12 disk external storage array (bottom PS) | Serial #: AS1050221151; Manuf. date: Dec. 2010; black; bought March 2011 |
|  | DAQ Room, east floor | APC SMT1000 (Smart-UPS 1000) | RBC6 | ~8/1/2011 | stargw3.starp.bnl.gov | Serial #: AS1114210533; Manuf. date: April 2011; black |
| UPS2 | DAQ Room, rack DB2 (legacy DAQ rack) | APC SMT2200RM2U (Smart-UPS 2200) | RBC43 | November 2014 (original battery); November 26, 2019 | evp; trgscratch (bottom PS); sclrscratch (bottom PS); trgscratch 12 disk external storage array (top PS) | Serial #: AS1431243644; Manuf. date: July 29, 2014; rack-mount |
|  | DAQ Room floor north of shelves in center row | APC SUA2200 | RBC55 | Jan. 28, 2013; July 17, 2017 | onldb4 (right PS); onldb3 (right PS); satabeast1 (right PS) | Serial #: XS05420002612; Manuf. date: Oct. 2005; black |
| UPS3 | DAQ Room DC3 | APC SUA1500RM2U (Smart-UPS 1500) | RBC24 | Oct. 11, 2016; Dec. 6, 2019 | barbados2; softioc4; daq-sw1 | Serial #: AS0847123095; Manuf. date: Nov. 2008; black, rack-mount |
|  | DAQ Room DC4 | APC SMX1500RM2U with APC SMX48RMBP2U (external battery) | RBC115; 2x RBC115 | ? | various SGIS interlock equipment | Serial #: AS1039230480; C-AD equipment |
| UPS4 | DAQ Room, rack DB1, bottom | APC SMT2200RM2U | RBC43 | July 2019 | l2ana01 (bottom PS); l2ana02; PCI extension (for l2ana01) | Serial #: AS1336140512; Manuf. date: Sept. 2013 |
| UPS5 | DAQ Room, northern Online Linux Pool rack | APC SMT2200RM2U | RBC43 | November 2014 (original battery); January 23, 2018; June 7, 2019 | onl30; onldb (x2); mongodev04; mongodev05; mongodev06 | Serial #: AS1430241567; Manuf. date: July 22, 2014; <DAQ Room Power Panel> |
| UPS6 | DAQ Room, L4 and server rack (middle row, north end) | APC DLA1500RM2U (Smart-UPS 1500) | RBC24 | May 27, 2016; July 9, 2019 | L4 network switch; dbbak (both PS); dashboard1 (both PS) | Serial #: AS0340212578; Manuf. date: Sept. 2003; black, rack-mount; Was behaving oddly at one point (estimated zero runtime and wouldn't recalibrate), but after removing the load and recalibrating, seems to be working fine, and calibration routine works |
| UPS9 | DAQ Room middle row shelves | APC SMT2200RM2UTW (Smart-UPS 2200) | RBC43 (Note that the unit itself says it uses an RBC55, if one navigates through the onboard menu.  This appears to be an error on the part of APC.) | 05/2017 (original battery); 12/2019 | satabeast3 (left PS); onldb3.starp (left PS); onldb4.starp (right PS) | Serial #: AS1645262798; Manuf. date: November 2016 |
| UPS10 | DAQ Room, middle row shelves (middle shelf) | APC SMC1500-2U | RBC132 | 12/2015 | stardns1.starp.bnl.gov; 24-disk SAS enclosure for trgscratch and sclrscratch; onldb2.starp (left PS) | Serial #: AS1539124741; bought December 2015; to initiate self-test: push + hold Mute, then press Display for 2 seconds |
|  | DAQ Room, bottom of rack "DB9" (center row, north end) | APC SMT2200RM2U | RBC43 | 5/2017 (original battery, installed at factory 11/2016) | cephmon01 in rack DB8 (right PS); cephmon02 in rack DB8 (right PS); onldb6 in rack DB8; onlpool-s60-01 and onlpool-s60-02 (via a shared extension cord) | Serial #: AS1645260493; Manuf. date: November 3, 2016 (bought May 2017); 2U rack-mount |
|  | DAQ Room, NW corner networking rack | APC Smart-UPS RT 6000 | RBC43 | second half of 2013? | Various networking equipment | Serial #: ?; rack-mount; *belongs to ITD* |
| UPS11 | DAQ Room, table near Control Room window (available spare - but see note in far right column) | APC SMT2200RM2U | RBC43 | 12/2014 (original battery, installed at factory 8/2014) | nothing | Serial #: AS1435142781; Manuf. date: July 28, 2014 (bought December 2014); rack mount; Has overheated and shut down while in service in the DAQ Room during AC failures (with ambient room temperatures above 90 F, reaching 100 at times).  So while it seems to be an otherwise reliable unit, it should not be used in an environment where temperatures may get that high, nor in the immediate vicinity of other especially warm equipment. |

|  | WAH 1A9 | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | 11/2018 | NPSlaser.starp (Remote power switch for TPC laser PC, though the PC is NOT plugged into it, only a "picomotor multi-axis driver") | Serial #: AS1146110083; Manuf. date: Nov. 2011; rack-mount, black |
|  | WAH 1B1 | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | 10/2016 (original battery) | tofcontrol; TOF USB hub | Serial #: AS1617143314; Manuf. date: April 2016, bought September 2016; rack-mount, black |
|  | WAH 1C4 | APC SMT1500RM2U | RBC133 | 11/2018 | netpower1; netpower2 (thus all networking equipment in 1C4 - at least 5 switches and 1 media converter) | Serial #: AS1243245039; black, 2U rack-mount; Manuf. date: October 2012; bought in January 2013 |
|  | WAH 2A3 | APC SMX1500RM2U (Smart-UPS 1500) with external battery pack | RBC115? | 9/2011 | gas leak detection systems in 2A2 and possibly C-AD interlock equipment in 2A1 | Serial #: AS1039230484; Manuf. date: Sept. 2010; rack-mount, black; battery pack Serial #: QS1002251184; *C-AD equipment?* |
|  | WAH 2A9 | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | original battery (07/2014); 12/2020 | grant (Wiener/VME) | Serial #: AS143611346; Manuf. date: Sept. 2014; black, rack mount; bought March 2015 |
|  | WAH 2A9 | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | April 2018 | TPC interlock distribution panel; surge suppressor with: cooling water flow meters, scserv, 2x interlocks equipment in 2A8 | Serial #: AS1243245306; Manuf. date: October 2012; black, rack mount; bought ~Dec. 2012 |
|  | (in Bldg. 510 now, previously was in the WAH on the floor under the east stairs to RHIC tunnel) | APC BE750G (Back-UPS ES750) | RBC17 | original battery from ~fall 2010 | nothing when last seen in the WAH (checked 11/20/2015) | Serial #: 5B1039T74854; Manuf. date: Sept. 2010; black; no self-test option |
|  | WAH North Platform, 1st floor west | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | 01/2019 | north-nps1 (and thus all networking equipment on the north platform) | Serial #: AS1144220012; Manuf. date: October 2011; rack-mount, black |
|  | AB, near the GMR | PowerWare |  | fall 2012? (Batteries likely have been replaced since then under a service contract, handled by STSG) | gas system equipment | This is a large UPS for circuits in the Gas Mixing Room, under the care of the STSG group; IP: gmr-ups.starp.bnl.gov; BNL property tag 145850; bought in fall 2012 |
|  | AB, mezzanine top floor (northeast corner) | Mitsubishi 7011A |  | November 20, 2019 | unknown | CAD equipment, definitely not STAR's responsibility; Contacts are John Mingoia and Anh Pham |

 

This list is maintained as information is made available and is sporadically checked for correctness.  The maintainer of this list is often not informed when STSG adds, removes or replaces UPSes and batteries.  Furthermore, anyone may remove or add equipment to UPSes without informing the maintainer of this list.

 

Spare batteries on hand: 

In a cabinet in the DAQ Room (APC RBC numbers):

4: June 2011(?)
7: December 2020
8: October 2011
24: December 2019
32: December 2014?
43: December 2019
109: March 2020
123: December 2017
133: December 2020

 
(STSG / electronics techs may have additional spares in the Building 510 labs)
 

Windows XP EOL overview

Microsoft support for Windows XP will end on April 14, 2014.  Lab and DOE cybersecurity policy (as well as general best practice) prohibit the use of unsupported operating systems.  This page will serve as an overview on STAR's migration away from Windows XP; specific details per machine (or subsystem) will generally be kept in the associated RT tickets. 

9/25/2013 note:  I have acquired 5 used Dell desktop machines with Vista license keys as potential replacements for some of the machines listed below.  All have 4GB or more of RAM and single 160GB SATA disks.  From the list below, deneb2.starp and conference.star in particular are good candidates for replacement with these machines; possibly videopc as well if the video capture card can be put into one of them.  Others TBD.


XP systems in the SDAS enclave:

HOSTNAME SUBSYSTEM PRIMARY CONTACT RT TICKET (if any) NOTES and EXPECTED RESOLUTION PATH
autueil.starp S&C Wayne Betts 2690 Replace with a Windows 7 machine currently named madison in 1006C
shift-leader.starp ops   2689 Dell says this model (Optiplex 745) has been successfully "Tested for Basic Windows 7 Functionality" and the Windows 7 upgrade advisor tool from MS indicates no significant problems. 

Nonetheless, the plan is to replace this system with a Dell Optiplex 990 (BNL barcode 151457) currently in 510/1-179 (Windows 7). 
tpcgas.starp and its backup machine TPC Jim Thomas 2626 Two new computers are online now as tpcgas1 and tpcgas2.  Peter Kravtsov completed one; the other needs additional configuration, for which Peter provided instructions, but they cannot be completed without swapping hardware, so the backup machine is not a "perfect" backup yet.
chaplin-run09, astaire-run09, sirius-run09 TPC Jim Thomas   - Moving to Linux has been discussed numerous times and is still a possibility; the primary hold-up is the TPC Alarm Handler, which is currently a Windows application.  Without a replacement for it within Linux, the assumption has been that at least one Windows machine will need to be available, but in discussing with Alexei, it seems this TPC Alarm Handler is redundant with Slow Controls's STAR Alarm Handler, so may not be necessary after all.  (resolution TBD)
- One more note, discussing this with Alexei and Jim, we all generally seem to agree that they don't need 3 computers (that was a luxury afforded to them in the early days when the Control Room wasn't so crowded) - 2 would suffice.
- Nov. 21 update (WB): It turns out these computers were bought with Vista licenses.  Upgrading in place is a *painfully* lengthy process, but I am attempting it on astaire (with a fallback disk with the XP installation just in case).
- Nov 25 update (WB):  Alexei and Jim have definitively approved a Linux trial.  The astaire PC will have replacement disks installed and a fresh Linux installation (SL 6.4).  Testing of TPC usage is expected to be quick - once approved, will proceed with Linux installation on chaplin.  They request to keep sirius while they try to migrate the TPC alarm handler to Linux (seeking source code from Peter Kravtsov) - if successful, will eliminate sirius, otherwise will proceed with attempted upgrade to Vista.
- Jan 10 update (WB): astaire had Linux installed 3-4 weeks ago and the TPC MEDM screens were made to work nicely after some font adjustments.  Approval to proceed with chaplin (keeping the original disks on stand-by).  Also, the TPC alarm handler (currently "assigned" to sirius) was demonstrated to run fine using Wine on a Sc.Linux 6 machine, so that no longer seems to be a hold-up - simply copy over the Alarms folder, make some fairly obvious path adjustments and firewall openings and it works.

Final disposition:  chaplin and astaire have Sc.Linux installations on them.  sirius still has Windows XP, but is only on a small private network for use with the WAH video and TPC laser systems.
tofgas.starp TOF   2627 Was replaced during Peter Kravtsov's visit in December, 2013.
deneb2.starp general use on South Platform   2680 Replaced with one of the recovered Vista machines. 
Does not need much; does not play a direct role in STAR data-taking; just used during maintenance days as a terminal and web browser.
fmsled FMS Steve Trentalange   a laptop in the Wide Angle Hall - not sure if there is a compelling reason for it to be a laptop going forward, but if so desired, we have available a Sony VAIO with a Vista key (barcode 136278); in any case, it does not need much computing power.  FMS is not expected to be present in the 2014 run, so this is a relatively low priority.  Steve expressed a preference for Windows 7 over Windows Vista, but I doubt it will make any difference, other than possibly giving a longer potential lifetime to the replacement.
MP 11/22: The Sony VAIO machine has a Windows Vista installation on it. All necessary BNL configurations have been made.
1/10/2014 (WB): Unfortunately, the original fmsled laptop has a serious hardware problem and will not boot at all.  Hopefully the disk can be recovered, though that is complicated somewhat by having PGP WDE.


Final disposition:  System is removed from the WAH and the network.  Steve T. says there is nothing critical to recover from it.
hoosier BEMC Steve Trentalange/ Oleg Tsai 2770 WB: 10/15 - Win 7 upgrade advisor says ok for both 32-bit and 64-bit Win 7 installations.
JL: 11/22, assigned to MP
MP: A Dell precision desktop has been allocated for use to replace the old hoosier machine. The machine has been brought up to date and is ready for use. Steve needs to test an HV device on the old machine to ensure that it works. Once he gives the go ahead we will switch over to the new machine. The switch over will hopefully take place during the week of 1/13/14.
MP: 1/27/14 - The replacement machine has been put in place. A LabVIEW 2013 evaluation version has been installed for the time being, and Steve's VI worked under it on the new Windows 7 machine. We now just need to get licenses for a legitimate version of LabVIEW and the machine should be finished.
MP: 4/17/14 - LabVIEW 2013 has been purchased and installed on the machine. The Windows XP machine has been disconnected and is no longer in use.
emcsc / backup emcsc BEMC Steve Trentalange/ Oleg Tsai   WB: 10/17 - Win 7 upgrade advisor says it needs more RAM (currently only 512MB; 1GB min for 32-bit Win 7), and does not know about the compatibility of the National Instruments RS-485 adapter card.  Meanwhile, there is a newer computer (unfortunately also with Win XP) available that was configured 1-2 years ago as a backup for emcsc (including LabVIEW 6.1 and an RS-485 adapter) but it has been sitting unused since then.  Steve has suggested we try putting Windows 7 on the backup machine as a test, and if it works, put it into production.
WB: 1/10:  tested the old PCI 232/485 card in a Windows 7 machine, and was able to download drivers from National Instruments that allow the ports to be recognized, so this might not be a show stopper.  Also, found NI's LabVIEW version compatibility chart and it indicates that LabVIEW 2013 should be able to open VIs saved in version 6.1, so this too is looking positive.  We need to get a version (possibly a trial version?) of the latest LabVIEW to try this out.
MP: 4/17/14: A Windows 7 machine was delegated for replacement of the emcsc machine. The trial version of LabVIEW 2013 was installed along with the old PCI 232/485 card. The problem was that the LabVIEW 6 code was too old to run on LabVIEW 2013. The .vi would not run properly. I had a LabVIEW technical rep come out to the lab multiple times in order to troubleshoot the issue and the conclusion was that the old code would need to be revamped in order to run under LabVIEW 2013. Fortunately, in order for the emcsc machine to operate, it does not need a network connection (only the NI COM card). The XP machine has been deregistered and disconnected from the network, and will continue to be used until time allows for the LabVIEW code to be updated.
videopc  ops  Alexei Lebedev   Have to evaluate the compatibility of the video capture card (and its software) with Windows Vista/7
WB: 1/10 - having looked into this, I thought it would be impossible, but Alexei informed me today that Andrei Brandin will be at BNL for the collaboration meeting in February, and he thinks he can make the current system work under Windows 7. But if not, we will move this machine to a small private network shared with the TPC Laser system control PCs.

 
Final disposition:  Andrei B. made no progress (or even any effort?) on his visit.  The system still has Windows XP, but is only on a private network now.
 pp2pp-slow  PP2PP     Originally overlooked because it is not on a "star" subnet (it is 130.199.90.72), and the PP2PP subsystem has been inactive for some time.  This is a 9.5-year-old Dell Pentium 4 system, so not likely a good candidate for Windows 7 or Vista, though it meets the minimum requirements.
MP 2/25: After speaking with Wlodek Guryn and Kin Yip, this machine will not be used for Run14. The machine has been removed from the Control Room by one of Wlodek's guys and will be worked on off the network. A PP2PP machine will be needed for next year, so a replacement machine will need to be purchased and set up down the road.


STAR XP systems outside of subnet 60 (starp/SDAS):

SYSTEM NAME   CONTACT/PRIMARY USER   LOCATION   RT TICKET (if any)   RESOLUTION PLAN/SUMMARY
JML.STAR.BNL.GOV Jeff Landgraf 510/1-184 2677 Have discussed with Jeff - a new PC was ordered (expected to arrive by end of November).
MP: The new PC has arrived and is all set up for BNL use. Jeff's profile has been set up.
Bugrhoff (DHCP client) Wayne Betts 510/1-179   old laptop - phased out in favor of newer one already in use
DBEAVISDT.STAR.BNL.GOV Dana Beavis 510/1-169   JL: 09/27 - Ambiguity on group
WB: computer has been moved to a C-AD building.  MAC reg., IP address and domain group are no longer associated with STAR

 
BCHRISTIE.STAR.BNL.GOV Bill Christie 510/1-180 2691 JL: 09/27 - Update OK in the coming months if possible, suggest 7 (need to check)
MP: 10/4 I ran the Win 7 Upgrade Advisor. The machine's hardware and software are compatible with Win 7 (currently has Win XP 32-bit)
KEATON2.STAR.BNL.GOV Victor Perevoztchikov 510/1-165 2720 JL: 09/27 - Machine could be replaced by a Linux node (preferred)
JL: 11/22, assigned to MP (new node needs to be purchased)
MP: 12/5, A Dell Precision T3610 has been ordered. The machine supports RHEL and will be set up accordingly.
MP: 2/28, The machine has been replaced with the T3610, set up with Scientific Linux 6. The old machine will be retired.
MONROE2.STAR.BNL.GOV Lidia Didenko 510/1-173 2695 possible to upgrade to Vista? (a license key is on the case)
JL: 09/27 - Update OK, is Win 7 possible? Worried of CERT being messed up (saved in IE)
MP: 10/4 I ran the Win 7 Upgrade Advisor. The machine's hardware and software are compatible with Win 7 (currently has Win XP 64-bit)
MP: 11/20 The machine has been upgraded to Windows 7. Refer to ticket # 2695
BANCROFT.STAR.BNL.GOV nobody 1006C   WB: 10/18 - old machine has been pulled from service (it existed solely to operate an old SCSI scanner, which has also been retired)
CONFERENCE.STAR.BNL.GOV   1006C 2687 WB: 10/17 - Vista has been installed on a machine from the Equipment Pool, and the original conference PC has been shut down. 
GRANT.STAR.BNL.GOV John Hammond 901   This is a file server for the electronics support group.  It is largely up to John to move the shared content to a different server to retire this one.
JL: 11/22, assigned to MP
MP: 12/5, I spoke with John, he stated that he has a Windows 7 machine and will be moving the file server to that node himself. I will be in contact with him to record when the XP machine has been taken off the network.
MP: 2/28, I spoke with John this week, he stated that the GRANT machine is still on the network but he will be taking it off at the end of this month. He has a Windows 7 machine to replace the XP machine, just needs to do the switch over.
WB: 4/18/2014:  John was copying the final directories to the replacement today and expects to turn grant off on Monday, 4/21.
PADRAZO1.STAR.BNL.GOV John Hammond 901   WB: 10/17 - John purchased and installed Windows 7 for this system on a new disk and Athena T. will start using it.  The original disk has been put aside in case any files from Ken Asselta turn out to be needed.
PKUCZEWSKIDT.PHY.BNL.GOV Phil Kuczewski 901   MP: 2/25, I sent an email to Phil on 1/2/14 regarding his Windows XP machines. I never received a reply.
PKLAPTOP1.STAR.BNL.GOV Phil Kuczewski 901   (laptop)
MP: 2/25, I sent an email to Phil on 1/2/14 regarding his Windows XP machines. I never received a reply.
DAGOSTINOC.STAR.BNL.GOV John Hammond / Alex Tkatchev 901   WB: 10/17 - This is about 4 years old and has a Windows Vista product sticker, but the current plan is to make a fresh Linux installation and let Alex Tkatchev use the system for trigger-related development work.  The original disk has been removed and a new one installed for the Linux installation.
PO-143966.STAR.BNL.GOV Alex Tkatchev 901   WB: 10/15 - This is about 4 years old and has a Win 7 product sticker on it. 
WB: 2/28/14: If a fresh Win 7 install is made, I suggest adding a second disk (if it doesn't already have two) and making a RAID 1 array if possible.



Others of possible concern:

 SYSTEM NAME    LOCATION    NOTE
STAR-UTILITIES.STAR.BNL.GOV (on a C-AD network)   STAR Control Room   runs software provided by C-AD.  Used for STAR WAH video camera system control and monitoring.  We should move the components related to the video system to a starp machine in any case - there's no reason to be crossing subnets and firewalls for this.  11/15/2013 (WB):  no longer required for use with STAR video system.
ROSAS.STAR.BNL.GOV (on a C-AD network)   STAR Control Room   runs software provided by C-AD.


The prohibition on unsupported operating systems is typically only enforced for computers connected to the campus-wide LAN, though variances are possible.  Stand-alone systems and those on local networks do not typically come under scrutiny (in part because they are hard to detect and in part because they pose much less overall risk).

 


Software Infrastructure

On the menu today ...

 

General Information

SOFI stands for SOFtware infrastructure and Infrastructure. It includes any topics related to code standards, tools for compiling your code, problems with base code and Infrastructure. SOFI also addresses (or tries to address) your needs in terms of monitoring or easily managing activities and resources in the STAR environment.

 

Infrastructure & Software

Reporting problems

  • General RCF problems should be reported using the Computing Facility Issue reporting system (RT).
    You should NOT use this system to report STAR-specific problems. Instead, use the STAR Request Tracking system described below.
  • To report STAR-specific problems, use the Request Tracking (bug and issues tracker) system.
    Submitting a problem (bug), help request or issue to the Request Tracking system using Email
    You can always submit a report to the bug tracking system by sending an Email directly. There is no need for a personalized account and using the Web Interface is not mandatory. For each BugTracking category, an equivalent @www.star.bnl.gov mailing list exists.
    The currently available queues are
    bugs-high problem with ANY STAR Software with a need to be fixed without delay
    bugs-medium problem with ANY STAR Software and must be fixed for the next release
    bugs-low problem with ANY STAR Software. Should be fixed for the next release
    comp-support General computing operation support (user, hardware and middleware provisioning)
    issues-infrstruct Any Infrastructure issues (General software and libraries, tools, network)
    issues-scheduler Issues related to the SUMS project (STAR Unified Meta-Scheduler)
    issues-xrootd Issues related to the (X)rootd distributed data usage
    issues-simu Issues related to Simulation
    grid-general STAR VO general Grid support : job submission, infrastructure, components, testing problem etc ...
    grid-bnl STAR VO, BNL Grid Operation support
    grid-lbl STAR VO, LBNL Grid Operation support
    wishlist Use it for suggesting what you would wish to see soon, what would be nice to have etc ...

    You may use the guest account for accessing the Web interface. The magic word is here.
    • To create a ticket, select the queue (drop down menu after the create-ticket button). Queues are currently sorted by problem priority. Select the appropriate level. A wishlist queue has been created for your comments and suggestions. After the queue is selected, click on the create-ticket button and fill in the form. Please do not forget the usual information, i.e. the result of STAR_LEVELS and of uname -a AND a description of how to reproduce the problem.
    • If you want to request a private account instead of using the guest account, send a message to the wishlist queue. There are 2 main reasons for requesting a personalized account :
      1. If you are planning to be an administrator or a watcher of the bug tracking system (that is, receive tickets automatically, take responsibility for solving them etc ...) you MUST have a private account.
      2. If you prefer to see the summary and progress of your own submitted tickets at login instead of seeing all tickets submitted under the guest account, you should also ask for a private account.
    • At login, the left side panels show the tickets you have requested and the tickets you own. The right panel shows the status of all queues. Having a private account set up does NOT mean that you cannot browse other users' tickets. It only affects the left panels' summary.
    • To find a particular bug, click on search and follow the instructions.
    • Finally, if you would like to have a new queue created for a particular purpose (sub-system specific problems), feel free to request that such a queue be set up.
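As an illustration of the Email submission route described above, here is a minimal Python sketch that only composes (does not send) a ticket message. The address form `<queue>@www.star.bnl.gov` is an assumption drawn from the sentence about equivalent mailing lists; the function name `build_ticket` and its parameters are hypothetical.

```python
from email.message import EmailMessage

# Domain quoted in the text above; the "<queue>@domain" address form is an assumption.
QUEUE_DOMAIN = "www.star.bnl.gov"

# Queue names as listed above.
QUEUES = {
    "bugs-high", "bugs-medium", "bugs-low", "comp-support",
    "issues-infrstruct", "issues-scheduler", "issues-xrootd", "issues-simu",
    "grid-general", "grid-bnl", "grid-lbl", "wishlist",
}


def build_ticket(queue, sender, subject, star_levels, uname, how_to_reproduce):
    """Compose a ticket Email carrying the 'usual information' (hypothetical helper)."""
    if queue not in QUEUES:
        raise ValueError(f"unknown queue: {queue!r}")
    msg = EmailMessage()
    msg["To"] = f"{queue}@{QUEUE_DOMAIN}"
    msg["From"] = sender
    msg["Subject"] = subject
    # Body: output of STAR_LEVELS, output of uname -a, and the steps to reproduce.
    msg.set_content(
        "STAR_LEVELS output:\n" + star_levels + "\n\n"
        "uname -a output:\n" + uname + "\n\n"
        "How to reproduce:\n" + how_to_reproduce + "\n"
    )
    return msg
```

The returned message could then be handed to whatever mail client or SMTP relay you normally use; that step is deliberately left out of the sketch.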

 

General Tools 

Data location tools

Several tools exist to locate data both on disk and in HPSS. Some tools are available from the production page and we will list here only the tools we are developing for the future.

Resource Monitoring tools

Browsers

Web Sanity, Software & documentation tools

Web based access and tools

Web Sanity

Software & documentation auto-generation

  • Our STAR Software CVS Repositories browser
    Allows browsing the full offline and online CVS repositories with listings showing days since last modification, modifier, and log message of last commit, display and download (checkout) access to code, access to all file versions and tags, and diff'ing between consecutive or arbitrary versions, direct file-level access to the cross-referenced presentation of a file, ... You can also sort by
    1. user
    2. recent history
  • Doxygen Code documentation (what is already doxygenized )
    and the User documentation (a quick startup ...) Our current Code documentation is generated using the doxygen generator. Two utilities exist to help you with this new documentation scheme :
    1. doxygenize is a utility which takes as argument a list of header files and modifies them to include a "startup" doxygen tag documentation. It tries to guess the comment block, the author and the class name based on content. The current version also documents struct and enum lists. You are INVITED TO CHECK the result before committing anything. I have tested on several class headers but there is always the exception where the parsing fails ...
    2. An interface to doxygen named doxycron.pl was created and installed on our Linux machines to satisfy the need of users to generate the documentation by themselves for checking purposes. That same generator interface is used to produce our Code documentation every day, so a simple convention has been chosen to accomplish both tasks. But why doxycron.pl instead of directly using doxygen? If you are a doxygen expert, the answer is 'indeed, why ?'. If not, I hope you will appreciate that doxycron.pl not only takes care of everything for you (like creating the directory structure, a default actually-functional configuration file, safely creating a new documentation set etc ...) but also takes on a few more tasks you would normally have to do yourself when using the base doxygen tools (index creation, sorting of run-time errors etc ...). This being said, let me describe this tool now ...

      The syntax for doxycron.pl is
      % doxycron.pl [-i] PathWhereDocWillBeGenerated Path(s)ToScanForCode Project(s)Name SubDir(s)Tag

      The arguments are:
      • -i is used here to disable the doxytag execution, a useless pass if you only want to test your documentation.
      • PathWhereDocWillBeGenerated is the path where the documentation tree will be or TARGETD
      • Path(s)ToScanForCode is the path where the sources are or INDEXD (default is the comma separated list /afs/rhic.bnl.gov/star/packages/dev/include,/afs/rhic.bnl.gov/star/packages/dev/StRoot)
      • Project(s)Name is a project name (list) or PROJECT (default is the comma separated include,StRoot)
      • SubDir(s)Tag is an optional tag (list) for an extra tree level or SUBDIR. The default is the comma separated list include, . Note that the last element is null i.e. "". When encountered, the null portion of a SUBDIR list will tell doxycron.pl to generate a searchable index based on all previous non-null SUBDIR in the list.

      Note that if one uses lists instead of single values, then, ALL arguments MUST be a list and the first 3 are mandatory.
      To pass an empty argument in a list, you must use quotations as in the following example

      % doxycron.pl /star/u/jeromel/work/doxygen /star/u/jeromel/work/STAR/.$STAR_HOST_SYS/DEV/include,/star/u/jeromel/work/STAR/DEV/StRoot include,StRoot 'include, '

      In order to make it clear what the conventions are, let's describe a step by step example as follows:

      Example 1 (simple / brief explanation):
      % doxycron.pl `pwd` `pwd`/dev/StRoot StRoot
      would create a directory dox/ in `pwd` containing the code documentation generated from the relative tree dev/StRoot for the project named StRoot. Likely, this (or similar) will generate the documentation you need.

      Example 2 (fully explained):
      % doxycron.pl /star/u/jeromel/work/doxygen /star/u/jeromel/work/STAR/DEV/StRoot Test
      In this example, I scan any source code found in my local cvs checked-out area /star/u/jeromel/work/STAR/DEV starting from StRoot. The output tree structure (where the documentation will end up) is requested to be in TARGETD=/star/u/jeromel/work/doxygen. In order to accomplish this, doxycron.pl will check and do the following:
      • Check that the doxygen program is installed
      • Create (if it does not exist) the $TARGETD/dox directory where everything will be stored and the tree will start
      • Search for a $TARGETD/dox/$PROJECT.cfg file. If it does not exist, a default configuration file will be created. In our example, the name of the configuration file defaults to /star/u/jeromel/work/doxygen/dox/Test.cfg. You can play with several configuration files by changing the project name. However, changing the project name would not lead to placing the documents in a different directory tree. You have to play with the $SUBDIR value for that.
      • The $SUBDIR variable is not used in our example. If I had chosen it to be, let's say, /bof, the documentation would have been created in $TARGETD/dox/bof instead but the template is still expected to be $TARGETD/dox/$PROJECT.cfg.

      The configuration file should be considered as a template file, not a real configuration file. Any item appearing with a value like Auto-> or Fixed-> will be replaced on the fly by the appropriate value before doxygen is run. This ensures keeping the conventions tidy and clean. You actually do not have to think about it either, it works :) ... If it does not, please let me know. Note that the temporary configuration file will be created in /tmp on the local machine and left there after running.

      What else does one need to know : the way doxycron.pl works is the safest I could think of. Each new documentation set is re-generated from scratch, that is, using temporary directories, renaming old ones and deleting very old ones. After doxycron.pl has completed its tasks, you will end up with the directories $TARGETD/dox$SUBDIR/html and $TARGETD/dox$SUBDIR/latex. The result of the preceding execution of doxycron.pl will be in directories named html.old and latex.old.
      One thing will not work for users though : the indexing. The installation of indexing mechanism in doxygen is currently not terribly flexible and fixed values were chosen so that clicking on the Search index link will go to the cgi searching the entire main documentation pages.

      As a last note, doxygen understands ABSOLUTE path names only and therefore, doxycron.pl will die if you try to use relative paths as the arguments. Just as a reminder, /titi/toto is an absolute path while things like ./ or ./tata are relative paths.
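To make the argument conventions above concrete, the following Python sketch assembles a doxycron.pl command line and rejects relative paths before anything is run. It is only an illustration: the helper name `doxycron_args` is hypothetical, and the sketch assumes nothing about doxycron.pl beyond the syntax quoted above (optional -i, three mandatory arguments, comma-separated lists).

```python
import posixpath


def doxycron_args(target, sources, projects, subdirs=None, skip_doxytag=False):
    """Build the argv list for doxycron.pl (hypothetical helper, not part of the tool).

    target   -- PathWhereDocWillBeGenerated (TARGETD)
    sources  -- list of Path(s)ToScanForCode (INDEXD)
    projects -- list of Project(s)Name (PROJECT)
    subdirs  -- optional list of SubDir(s)Tag (SUBDIR)
    """
    # doxygen understands absolute path names only, so fail early on relative ones.
    for path in [target] + list(sources):
        if not posixpath.isabs(path):
            raise ValueError(f"absolute path required, got {path!r}")
    argv = ["doxycron.pl"]
    if skip_doxytag:
        argv.append("-i")  # disables the doxytag pass
    # Lists are passed as comma-separated values.
    argv += [target, ",".join(sources), ",".join(projects)]
    if subdirs is not None:
        argv.append(",".join(subdirs))
    return argv
```

For instance, `doxycron_args("/star/u/jeromel/work/doxygen", ["/star/u/jeromel/work/STAR/DEV/StRoot"], ["Test"])` reproduces the arguments of Example 2, while passing `./tata` as a source raises an error instead of letting the script die later.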

HPSS tools & services

  • How to retrieve files from HPSS. Please, use the Data Carousel and ONLY the DataCarousel.
    Note: DO NOT use hsi to retrieve files from HPSS - this access mode locks tape drives for exclusive use (only you, not shared with any other user) and has dire impacts on STAR's operations, from production to data restores. If you are caught using it, you will be banned from accessing HPSS (your privilege to access HPSS resources will be revoked).
    Again - Please, use the Data Carousel.
  • Archiving into HPSS
    Several utilities exist. You can find the reference on the RCF HPSS Service page. Those utilities will bring you directly in the Archive class of service. Note that the DataCarousel can retrieve files from ANY class of service. The preferred mode for archiving is the use of htar.
    NOTE: You should NOT abuse those services to retrieve massive amounts of files from HPSS (your operation would otherwise clash with other operations, including stalling or slowing down data production). Use the DataCarousel instead for massive file retrieval. Abuse may lead to suppression of access to the archival service.
    • For rftp, history is in a Hypernews post, Using rftp. If you save individual files and have lots of files in a directory, please avoid causing a meta-data lookup. A meta-data lookup happens when you 'ls -l'. As a reminder, please keep in mind that HPSS is made neither for small files nor for large numbers of files in directories, but for storage of massive large files (on 2007/10/10 for example, a user crashed HPSS with a single 'ls -l' lookup of a 3000-file directory). In that regard, rftp is most useful if you first create an archive of your files yourself (tar, zip,...) and push the archive into HPSS afterward. If this is not your mode of operation, the preferred method is the use of htar, which provides a command-line direct HPSS archive creation interface.
    • htar is the recommended mode for archiving into HPSS. This utility provides a tar-like interface allowing for bundling together several files or an entire directory tree. Note the syntax of htar and especially the extract below from this thread:
      If you want the file to be created in /home/<username>/<subdir1> and <subdir1> does not exist yet, use
      % htar -Pcf /home/<username>/<subdir1>/<filename> <source>
      
      If you want the file to be created into /home/<username>/<subdir2> and <subdir2> already exists, use
      % htar -cf /home/<username>/<subdir2>/<filename> <source>
      
      Please consult the help on the web for more information about htar.
    • File size is limited to be <55GB, and if exceeded you will get Error -22. In this case consider using split-tar. A simple example/syntax on how to use split-tar is:
      % split-tar -s 55G -c blabla.tar blabla/
      This will at least create blabla-000.tar but also the next sequences (001, 002, ...), each of 55 GBytes, until all files from directory blabla/ are packed. The magic 55 G suggested herein and in many posts works for any generation of drive for the past decade. But a limit of 100-150 GB should also work on most media at BNL as of 2016. See this post for a summary of past pointers.
    • You may make a split-tar archive cross-compatible with htar by creating the htar indexes afterward. To do this, use a command such as
      % htar -X -E -f blabla-000.tar
      This will create blabla-000.tar.idx, which you will need to save in HPSS along with the archive.
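The choice between `-Pcf` and `-cf`, and the naming of split-tar pieces, can be sketched in Python as below. This is only an illustration under stated assumptions: the helper names are hypothetical, "55G" is taken here to mean 55 x 10^9 bytes (split-tar may interpret the suffix differently), and the chunk count is a lower bound since split-tar packs whole files.

```python
import math


def htar_create_command(hpss_archive, source, parent_dir_exists):
    """Return the htar argv; -P is added when the HPSS sub-directory does not exist yet."""
    flags = "-cf" if parent_dir_exists else "-Pcf"
    return ["htar", flags, hpss_archive, source]


def min_split_tar_chunks(total_bytes, chunk_bytes=55 * 10**9):
    """Lower bound on the number of archives split-tar would produce (assumed units)."""
    return max(1, math.ceil(total_bytes / chunk_bytes))


def split_tar_names(prefix, count):
    """Archive names following the blabla-000.tar, blabla-001.tar, ... sequence."""
    return [f"{prefix}-{i:03d}.tar" for i in range(count)]
```

As a usage example, a 120 GB directory would need at least `min_split_tar_chunks(120 * 10**9)` archives, and `split_tar_names("blabla", 3)` lists the names to expect (and to index with htar afterward).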

       

Batch system, resource management system

SUMS (aka, STAR Scheduler)


SUMS, the product of the STAR Scheduler project, stands for Star Unified Meta-Scheduler. This tool is currently documented on its own pages. SUMS provides a uniform user interface to submitting jobs on "a" farm that is, regardless of the batch system used, the language it provides (in XML) is identical. The scheduling is controlled by policies handling all the details on fitting your jobs in the proper queue, requesting proper resource allocation and so on. In other words, it isolates users from the infrastructure details.

You would benefit from starting with the following documents:

LSF

LSF was dropped from BNL facility support in July 2008 due to licensing cost. Please refer to the historical revision for information about it. If a link brought you here, please update it or send a note to the page owner. The information has been kept unpublished.

Condor

Quick start ...

Condor Pools at BNL

The condor pools are segmented into four pools extracted from this RACF page:

production jobs   +Experiment = "star"   +Job_Type = "crs"   high priority CRS jobs, no time limit, may use all the slots on CRS nodes and up to 1/2 available job slots per system on CAS ; the CRS portion is not available to normal users and using this Job_Type for user will fail
users normal jobs   +Experiment = "star"   +Job_Type = "cas"   short jobs, 3 to 5 hour soft limit (when resources are requested by others), 40 hour hard limit - this has higher priority than the "long" Job_Type.
user long jobs   +Experiment = "star"   +Job_Type = "long"   long running jobs, 5 day soft limit (when resources are requested by others), 10 day hard limit, may use 1 job slot per system on a subset of machines
general queue   +Experiment = "general" or +Experiment = "star"   (none)   General queue shared by multiple experiments, 2 hours guaranteed time minimum (can be evicted afterward by any experiment's specific jobs claiming the slot)

 

The Condor configuration does not create a simple notion of queues but rather a notion of pools. Pools are groups of resources spanning all STAR machines (RCAS and RCRS nodes) and even other experiments' nodes. The first column tends to suggest four such pools, although we will see below that life is more complicated than that.

First, it is important to understand that the +Experiment attribute is only used for accounting purposes and what makes the difference between a user job or a production job or a general job is really the other attributes. 

Selection of how your jobs will run is the role of +Job_Type attribute. When it is unspecified, the general queue (spanning all RHIC machines at the facility) is assumed but your job may not have the same time limit. We will discuss the restriction later. The 4th column of the table above shows the CPU time limits and additional constraints such as the number of slots within a given category one may claim. Note that the +Job_type="crs" is reserved and its access will be enforced by Condor (only starreco may access this type).

In addition to using +Job_type, which as we have seen controls what comes as close as possible to a queue in Condor, one may need to restrict jobs to run on a subset of machines by using the CPU_Type attribute in the Requirements tag (if you are not completely lost by now, you are good ;-0 ).  An example to illustrate this:

+Experiment = "star"
+Job_type = "cas"
Requirements = (CPU_type != "crs") && (CPU_Experiment == "star")

In this example, a cas job (interpret this as "a normal user analysis job") is being run on behalf of the experiment star. The CPU / nodes requested are the CPU belonging to the star experiment and the nodes are not RCRS nodes. By specifying those two requirements, the user is trying to make sure that jobs will be running on RCAS nodes only (or != "crs") AND, regardless of a possible switch to +Experiment="general", the jobs will still be running on the nodes belonging to STAR only.

In this second example

+Experiment = "star"
+Job_type = "cas"
Requirements = (CPU_Experiment == "star")

we have pretty much the same request as before but the jobs may also run on RCRS nodes. However, if data production runs (+Job_type="crs", which only starreco may start), the user's job will likely be evicted (as production jobs have higher priority); a user who does not want to risk that should specify the first Requirements tag.
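The two examples above differ only in their Requirements line. A small Python sketch can generate either variant; the function name and its parameters are hypothetical, and only the attribute names and values shown in the examples are used.

```python
def star_job_attributes(job_type="cas", star_nodes_only=True, avoid_crs_nodes=True):
    """Return the pool-selection lines of a Condor job description (illustrative only)."""
    if job_type == "crs":
        # The text states this type is reserved (only starreco may use it).
        raise ValueError('+Job_type = "crs" is reserved for production')
    requirements = []
    if avoid_crs_nodes:
        requirements.append('(CPU_type != "crs")')        # stay off RCRS nodes
    if star_nodes_only:
        requirements.append('(CPU_Experiment == "star")')  # stay on STAR-owned nodes
    lines = ['+Experiment = "star"', f'+Job_type = "{job_type}"']
    if requirements:
        lines.append("Requirements = " + " && ".join(requirements))
    return "\n".join(lines)
```

Calling `star_job_attributes()` yields the first example, and `star_job_attributes(avoid_crs_nodes=False)` yields the second, which accepts the eviction risk on RCRS nodes.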

Pool rules

A few rules apply (summarized below):

  • Production jobs cannot be evicted from their claimed slots ... since they have higher priority than user jobs even on CAS nodes, this means that as soon as production jobs start, their pool of slots will slowly but surely be taken - users' jobs may use those slots during lulls in utilization.
  • User jobs can be evicted. Eviction happens after 3 hours of runtime from the time they start but only if the slot they are running in is claimed by other jobs. For example, if a production job wants a node being used by a user job that has been running for two hours then that user job has one hour left before it gets kicked out ...
  • This time limit comes into effect when a higher priority job wants the slot (i.e. production vs. user or production)
  • general queue jobs are evicted after two hours of guaranteed time when the slot is wanted by ANY STAR job (production, user)
  • general queue jobs will be evicted however if they consume more than 750 MB of memory
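The time-based part of the eviction rules above can be written as a tiny Python function. This is a sketch of the rule as described in the text, not of the actual Condor policy expression; the function name is hypothetical, and the 3-hour (user jobs) and 2-hour (general queue jobs) protected periods are the values quoted above.

```python
def seconds_until_evictable(runtime_s, slot_claimed, protected_s=3 * 3600):
    """Seconds left before a job may be evicted, or None if nobody claims its slot.

    protected_s defaults to the 3 hours quoted for user jobs; pass 2 * 3600
    for general queue jobs.
    """
    if not slot_claimed:
        return None  # no competing claim: the time limit does not come into effect
    return max(0, protected_s - runtime_s)
```

With a competing claim, `seconds_until_evictable(2 * 3600, True)` corresponds to the "two hours running, one hour left" example in the list above.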

This provides the general structure of the Condor policy in place for STAR. The other policy options in place are as follows:

  1. The following option applies to all machines: the 1 min load has to be less than 1.4 on a two CPU node for a job to start
  2. General queue jobs will not start on any node unless 1 min < 1.4, swap > 200M, memory > 100M.
  3. User fairshare is in place.

In the land of confusion ...

Also, users are often confused about the meaning of their job priority. Condor will consider a user's job priority and submit jobs in priority order (the larger the number, the more likely the job will start) but those priorities have NO meaning across two distinct users. In other words, it is not because user A sets job priorities an order of magnitude larger than user B's that user A's jobs will start first. Job priority only provides a mechanism for a user to specify which of their idle jobs in the queue are most important. Jobs with higher numerical priority should run before those with lower priority, although because jobs can be submitted from multiple machines, this is not always the case. Job priorities are listed by the condor_q command in the PRIO column.

The effective user priority is dynamic, on the other hand, and changes as a user has been given access to resources over a period of time. A lower numerical effective user priority (EUP) indicates a higher priority. Condor's fairshare mechanism is implemented via EUP. The condor_userprio command hence provides an indication of your fair-share standing.

You should be able to use condor_qedit to manually modify the "Priority" parameter, if desired. If a job does not run for weeks, there is likely a problem with its submit file or one of its inputs, and in particular its Requirements line. You can use condor_q -analyze JOBID, or condor_q -better-analyze JOBID to determine why it cannot be scheduled.

 

What you need to know about Condor

First of all, we recommend you use SUMS to submit to Condor as we would take care of adding codes, tricks, tweaks to make sure your jobs run smoothly. But if you really don't want to, here are a few issues you may encounter:

  • Unless you use the GetEnv=true datacard directive in your condor job description, Condor jobs will start with a blank set of environment variables, unlike a shell startup. In particular, none of
    SHELL, HOME, LOGNAME, PATH, TERM and MAIL
    will be defined. The absence of $HOME has the side effect that, whenever a job starts, your .cshrc and .login will not be seen; hence, your STAR environment will not be loaded. You must take this into account and execute the STAR login by hand (within your job file).
    Note that using GetEnv=true has its own side effects, which include a full copy of the environment variables as defined on the submitter node. This will NOT be suitable for distributed computing jobs. The use of the getenv() C primitive in your code is especially questionable (it is unlikely to return a valid value). Some alternatives:
    • STAR users may look at this post for more information on how to use ROOT function calls for defining some of the above.
    • You may also use the getent shell command (if it exists) to get the value of your home directory
    • A combination of getpwuid() and getpwnam() would allow you to define $USER and $HOME
       
  • Condor follows a multi-submitter node model with no centralized repository for all jobs. As a consequence, whenever you use a command such as condor_rm, you would kill the jobs you have submitted from that node only. To kill jobs submitted from other submitter nodes (any interactive node at BNL is a potential submitter node), you need to loop over the possibilities and use the -name command line option.
     
  • Condor will keep your jobs indefinitely in the Pool unless you either remove the jobs or specify a condition allowing for jobs to be automatically removed upon status and expiration time. A few examples below could be used for the PeriodicRemove Condor datacard
    • To automatically remove jobs which have been in the queue for more than 2 days but marked as status 5 (held for a reason or another and not moving) use
      (((CurrentTime - EnteredCurrentStatus) > (2*24*3600)) && JobStatus == 5)
    • To automatically remove jobs that have been running in the queue for more than 15 hours (54000 seconds) but using less than 10% of the CPU (probably looping or inefficient jobs blocking a job slot), use
      (JobStatus == 2 && (CurrentTime - JobCurrentStartDate > (54000)) && 
                          ((RemoteUserCpu+RemoteSysCpu)/(CurrentTime-JobCurrentStartDate) < 0.10))
    The full current condition SUMS adds to each job is
    PeriodicRemove  = (JobStatus == 2 && (CurrentTime - JobCurrentStartDate > (54000)) && 
                       ((RemoteUserCpu+RemoteSysCpu)/(CurrentTime-JobCurrentStartDate) < 0.10)) || 
                      (((CurrentTime - EnteredCurrentStatus) > (2*24*3600)) && JobStatus == 5)
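To see how the combined PeriodicRemove condition behaves, here is a Python transcription of the expression above. It is an illustrative sketch rather than anything Condor evaluates: the job is represented as a plain dictionary keyed by the same attribute names, and `now` stands in for CurrentTime. Note that 54000 seconds is 15 hours and 2*24*3600 seconds is 2 days.

```python
RUNNING, HELD = 2, 5  # JobStatus codes used in the expression above


def periodic_remove(job, now):
    """Return True when the PeriodicRemove condition shown above would hold."""
    elapsed = now - job["JobCurrentStartDate"]
    # Running for more than 54000 s while using less than 10% of the CPU.
    # The "and" chain short-circuits, so the division is never reached with elapsed == 0.
    inefficient = (
        job["JobStatus"] == RUNNING
        and elapsed > 54000
        and (job["RemoteUserCpu"] + job["RemoteSysCpu"]) / elapsed < 0.10
    )
    # Held (status 5) for more than two days.
    stuck_held = (
        now - job["EnteredCurrentStatus"] > 2 * 24 * 3600
        and job["JobStatus"] == HELD
    )
    return inefficient or stuck_held
```

A job running for 60000 seconds with 1000 seconds of accumulated CPU would be removed, while the same job with 30000 seconds of CPU would be kept.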

Some condor commands

This is not meant to be an exhaustive set of commands nor a tutorial. You are invited to read the manpages for condor_submit, condor_rm, condor_q, condor_status. Those will be most of what you will need to use on a daily basis. Help for version 6.9 is available online.

  • Query and information
    • condor_q -submitter $USER
      List jobs of specific submitter $USER from all the queues in the pool
    • condor_q -submitter $USER -format "%s\n" ClusterID
      Shows the JobID for all jobs of $USER. This command may succeed even when an unconstrained condor_q would fail because of a large number of jobs
    • condor_q -analyze $JOBID
      Perform an approximate analysis to determine how many resources are available to run the requested jobs.
    • condor_status -submitters
      shows the numbers of running/idle/held jobs for each user on all machines
    • condor_status -claimed
      Summarize jobs by servers as claimed
    • condor_status -avail
      Summarize resources which are available
       
  • Removing jobs, controlling them
    • condor_rm $USER
      removes all of your jobs submitted from this machine
    • condor_rm -name $node $USER
      removes all jobs for $USER submitted from machine $node
    • condor_rm -forcex $JOBID
      Forces the immediate local removal of jobs in an undefined state (only affects jobs already being removed). This is needed if condor_q -submitter shows your job but condor_q -analyze $JOBID does not (indicating out-of-sync information at the Condor level).
    • condor_release $USER
      releases all of your held jobs back into the pending pool for $USER
    • condor_vacate -fast
      may be used to remove all jobs from the submitter node job queue. This is a fast mode command (no checks) and applies to running jobs (not pending ones)
       
  • More advanced
    • condor_status -constraint "RemoteUser == \"$USER@bnl.gov\""
      lists the machines on which your jobs are currently running
    • condor_q -submitter username -format "%d" ClusterId -format "  %d" JobStatus -format "   %s\n" Cmd
      shows the job ID, status, and command for all of your jobs.  For the status, 1==Idle and 2==Running.  I use something like this because the default output of condor_q truncates the command at 80 characters and prevents you from seeing the actual scheduler job ID associated with the Condor job.  I'll work on improving this command, but this is what I've got for now.
    • To see why job 26875.0 is held, when it was submitted from a node advertised as rcas6007, use the following command to get the reason in a human-readable format
      condor_q -pool condor02.rcf.bnl.gov:9664 -name rcas6007 -format "%s\n" HoldReason 26875.0
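
The numeric JobStatus values printed by the condor_q -format command above can be made more readable with a small filter. This is only a sketch: the function names status_name and label_jobs are hypothetical, and the filter assumes input lines of the form "ClusterId JobStatus Cmd" as produced by that command. The code-to-name mapping (1 Idle, 2 Running, 3 Removed, 4 Completed, 5 Held) is the standard Condor one.

```shell
#!/bin/sh
# Hypothetical helper: translate a numeric Condor JobStatus into a name.
status_name() {
    case "$1" in
        1) echo Idle ;;
        2) echo Running ;;
        3) echo Removed ;;
        4) echo Completed ;;
        5) echo Held ;;
        *) echo "Unknown($1)" ;;
    esac
}

# Hypothetical filter: reads "ClusterId JobStatus Cmd" lines on stdin
# and prints them with the status spelled out.
label_jobs() {
    while read -r cluster status cmd; do
        printf '%s %s %s\n' "$cluster" "$(status_name "$status")" "$cmd"
    done
}

# Usage (on a submit node):
# condor_q -submitter "$USER" -format "%d" ClusterId \
#          -format "  %d" JobStatus -format "   %s\n" Cmd | label_jobs
```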
       

 

CVS->Git

 

Computing Environment


The pages below will give you a rapid overview of the computing environment at BNL, including information for visitors and employees, accessible printers, best practices, recommended tools for managing Windows.

FAQs and Tips

Software Site Licenses

Do we have a site license for software package XYZ?

The answer is (almost) always: No!
Neither STAR nor BNL has site licenses for any Microsoft product, Hummingbird Exceed, WinZIP, ssh.com's software or much of anything intended to run on individual users' desktops. Furthermore, for most purposes BNL-owned computers do not qualify for academic software licenses, though exceptions do exist.

FAQ: PDF creation

How can I create a file in pdf format?

Without Adobe Acrobat (an expensive bit of software), this can be a daunting question. I am researching answers, some of which are available in my Windows software tips. Here is the gist of it in a nutshell as I write this -- there are online conversion services and OpenOffice is capable of exporting PDF documents.

FAQ: X Servers

What X server software should I use in Windows?

I recommend trying the X Server that is available freely with Cygwin, for which I have created some documentation here: Cygwin Tips. If you can't make that work for you, then I next recommend a commercial product called Xmanager, available from http://www.netsarang.com. Last time I checked, you could still download a fully functional version for a time-limited evaluation period.

TIP: Windows Hibernation trick

Hibernate or Standby -- There is a difference which you might find handy: 
  • "Standby" puts the machine in a low power state from which it can be woken up nearly instantly with some stimulus, such as a keystroke or mouse movement (much like a screensaver) but the state requires a continuous power source.  The power required is quite small compared to normal running, but it can eventually deplete the battery (or crash hard if the power is lost in the case of a desktop).
  • "Hibernate" actually dumps everything in memory to disk and turns off the computer, then upon restarting it reloads the saved memory and basically is back to where it was.  While hibernating, no power source is required.  It can't wake up quickly (it takes about as long as a normal bootup), but when it does wake up, (almost) everything is just the way you left it.  One caveat about networking is in order here:  Stateful connections (eg. ssh logins) are not likely to survive a hibernation mode (though you may be able to enable such a feature if you control both the client and server configurations), but most web browsing activity and email clients, which don't maintain an active connection, can happily resume where they left off.

Imagine:  the lightning is starting, and you've got 50 windows open on your desktop that would take an hour to restore from scratch.  You want to hibernate now!  Here's how to enable hibernating if it isn't showing up in the shutdown box: 
Open the Control Panels and open "Power Options".  Go to the "Hibernate" tab and make sure the box to enable Hibernation is checked.  When you hit "Turn Off Computer" in the Start menu, if you still only see a Standby button, then try holding down a Shift key -- the Standby button should change to a Hibernate button.  Obvious, huh?

For the curious:
There are actually six (or seven depending on what you call "official") ACPI power states, but most motherboards/BIOSes only support a subset of these.  To learn more, try Googling "acpi power state", or you can start here as long as this link works.  (Note there is an error in the main post -- the S5 state is actually "Shutdown" in Microsoft's terminology). 
From the command line, you can play around with these things with such straightforward commands as:

%windir%\System32\rundll32.exe powrprof.dll,SetSuspendState 1 

Even more obvious, right?  If you like that, then try this on for size.

TIP: My new computer is broken!:

It's almost certainly true - your new computer is faulty and the manufacturer knows it!  Unfortunately, that's just a fact of life.  Straight out of the box, or after acquiring a used PC, you might just want to have a peek at the vendor's website for various updates that have been released.  BIOS updates for the motherboard are a good place to start, as they tend to fix all sorts of niggling problems.  Firmware updates for other components are common as are driver updates and software patches for pre-installed software.  I've solved a number of problems applying these types of updates, though it can take hours to go through them thoroughly and most of the updates have no noticeable effect.  And it is dangerous at times.  One anecdote to share here -- we had a common wireless PC Card adapter that was well supported in both Windows and Linux.  The vendor provided an updated firmware for the card, installed under Windows.  But it turned out that the Linux drivers wouldn't work with the updated firmware.  So back we went to reinstall a less new firmware.  You'll want to try to be intelligent and discerning in your choices.  Dell for instance does a decent job with this (your Dell Service Tag is one very useful key here), but still requires a lot from the updater to help ensure things go smoothly.  This of course is in addition to OS updates that are so vital to security and discussed elsewhere.

ITD backups of STAR computers


The following Linux systems are being backed up by ITD, most of them by the Avamar system and a few by Networker:

ITD backups of Linux systems

Host | Backup Service | Backup Set | Notes
alh2.starp | Avamar | all local disk | Slow Controls Alarm Handler
beatrice.starp | Avamar | all local disk | Barrel Calorimeter and related systems
blanchett2.starp | Avamar | |
daqman.starp | Avamar | /RTS, /etc, /home, /var/named | DAQ infrastructure server
caine2.star | Avamar | /home | Levente Hajdu's desktop workstation
dean.star | Avamar | all local disk | Online Web server
evp.starp | Networker | /a/jevp, /etc, /home | Event Pool (event pool data is not saved)
fc1.star | Avamar | all local disk | File Catalog DB master
mtd-cr.starp | Avamar | | MTD PC in Control Room
onldb.starp | Avamar | /etc, /home, /online | Online DB master
onlldap.starp | Networker | ? | NIS and NFS server for OLP
presley2.star | Avamar | all local disk | Electronics Lab private network gateway and VME boot server
robinson.star | Avamar | all local disk | Offline DB master
sc2.starp | Networker | / | Slow Controls
sc5.starp | Avamar | all local disk | Slow Controls
softioc4.starp | Avamar | | Slow Controls
startrg.starp | Avamar | all local disk | Trigger
sun.star | Avamar | | STAR's main web server and HyperNews mail server
tofp.starp | Avamar | all local disk | TOF Control Room PC
webbj.star | Avamar | all local disk | Jason Webb's desktop PC

 
Avamar backup summary reports are being sent to starsupport@bnl.gov.

STAR has a group (/clients/servers/linux/star) in which we can initiate restore requests using a Windows application available here (note: the application is updated somewhat frequently; this is the current version as of July 10, 2013):

https://avamar1.b459.bnl.gov/DPNInstalls/downloads/WIN32/AvamarConsoleMultiple-windows-x86-6.1.1-87.exe

Note, with ITD's help, it is also possible to restore files to a system other than the one they were taken from - very useful for recovering from complete system failures.


Spring 2013: Growth rate of onlldap's storage (passing 600GB during the 2013 run) prompted a request from ITD to remove it from Avamar and return to the Networker system for its backups.  onlldap serves as the NIS and NFS server for users of the online Linux pool and the STAR SSH gateways.
 

For Windows workstations, until mid-2010, ITD provided a Retrospect backup client, which was capable of making bare-bones recoveries, but had a very short retention policy and no way to restore files without ITD's help.  They are now using Avamar for Windows workstation backups, which provides a longer retention policy, but eliminates the bare-bones recovery and excludes various multimedia files.  An improvement in Avamar is the ability to restore files directly from the client, without need for ITD assistance.  The Shift Leader computer, and the workstations used by Liz M., Bill C. and the STSG file server (amongst others) are all backed up using this system.

 

Printers


STAR's publicly available printers are listed below. 


IP name | Wireless (Corus) CUPS URL | IP address | Model | Location | rcf2 Queue Name | Features
lj4700.star.bnl.gov | http://cups.bnl.gov:631/printers/HP_Color_LaserJet_4700_2 | 130.199.16.220 | HP Color LaserJet 4700DN | 510, room M1-16 | lj4700-star | color, duplexing, driver download site (search for LaserJet 4700, recommend the PCL driver)
lj4700-2.star.bnl.gov | http://cups.bnl.gov:631/printers/lj4700-2.star.bnl.gov | 130.199.16.221 | HP Color LaserJet 4700DN | 510, room M1-16 | lj4700-2-star | color, duplexing, driver download site (search for LaserJet 4700, recommend the PCL driver)
hp510hall.star.bnl.gov | http://cups.bnl.gov:631/printers/hp510hall | 130.199.16.222 | HP LaserJet 2200DN | 510, outside 1-164 | hp510hall | B&W, duplexing
starhp2.star.bnl.gov | http://cups.bnl.gov:631/printers/starhp2.star.bnl.gov | 130.199.16.223 | HP LaserJet 8100DN | 510M, hallway | starhp2_p | B&W, duplexing
onlprinter1.star.bnl.gov | http://cups.bnl.gov/printers/onlprinter1.star.bnl.gov | 130.199.162.165 | HP Color LaserJet 4700DN | 1006, Control Room | staronl1 | color, duplexing
chprinter.star.bnl.gov | N/A | 130.199.162.178 | HP Color LaserJet 3800dtn | 1006C, mailroom | n/a | color, duplexing

There are additional printing resources available at BNL, such as large format paper, plotters, lamination and such.  Email us at starsupport 'at' bnl.gov and we might be able to help you locate such a resource.

 

Printing from the wireless (Corus) network

The "standard" way of printing from the wireless network is to go through ITD's CUPS server on the wireless network.  How to do this varies from OS to OS, but here is a Windows walkthrough.  The key thing is getting the URI for the printer into the right place:
 

  • Open the Printers Control Panel and click "Add a Printer". 
  • Select the option to add a network printer.  (Ignore the list of printers that it generates automatically).
  • Click on the button or option for "the printer that I want isn't listed". 
  • Select the option for a shared printer and enter the Wireless (Corus) CUPS URL (shown in green in the list above) for the printer you want.
    eg. http://cups.bnl.gov:631/printers/HP_Color_LaserJet_4700_2
  • On the next window, select the hardware manufacturer and model (if not listed, let Windows search for additional models).
  • Print a test page and cross your fingers... 
  • If your test print does not come out, it doesn't necessarily mean your configuration is wrong - sometimes a problem occurs on the CUPS server that prevents printing - it isn't always easy to tell where the fault lies.

 

Since printing through ITD's CUPS servers at BNL has not been very reliable, here are some less convenient alternatives to using the printers that you may find handy.  (Note that with these, you can even print on our printers while you are offsite - probably not something to do often, but might come in handy sometimes.)
 

1.  Use VPN.  But if you are avoiding the internal network altogether for some reason, or can't use the VPN client, then keep reading...

2.  Get your files to rcf2.rhic.bnl.gov and print from there.  Most of the printers listed above have rcf print queues (hence the column "rcf2 Queue Name").  But if you want to use a printer for which there is no queue on rcf2, or you have a format or file type that you can't figure out how to print from rcf2, then the next tip might be what you need.

3.  SSH tunnels can provide a way to talk directly (sort-of) to almost any printer on the campus wired network.  At least as far as your laptop's print subsystem is concerned, you will be talking directly to the printer.  (This is especially nice if you want to make various configuration changes to the print job through a locally installed driver.)  But if you don't understand SSH tunnels, this is gonna look like gibberish:

Here is the basic idea, using the printer in the Control Room.
It assumes you have access to both the RSSH and STAR SSH gateways.

The ITD SSH gateways might also work in place of rssh (I haven't
tried them yet).  If they can talk directly to our printers,
then it would eliminate step B below.

A.  From your laptop:

ssh -A -L 9100:127.0.0.1:9100 <username>@rssh.rhic.bnl.gov

(Note 1:  -A is only useful if you are running an ssh-agent with a
loaded key, which I highly recommend)

(Note 2:   Unfortunately, the rssh gateways cannot talk directly to our
printers, so we have to create another tunnel to a node that can...  If the
ITD SSH gateways can communicate directly with the printers, then the
next hop would be unnecessary...)

B.  From the rssh session:

ssh -L 9100:130.199.162.165:9100 <username>@stargw1.starp.bnl.gov

(Note 1: 130.199.162.165 is the IP address of onlprinter1.star.bnl.gov -
it could be replaced with any printer's IP address on the wired network.)
(Note 2:  port 9100 is the HP JetDirect default port - non-HP printers
might not use this, and there are other ways of communicating with HP
network printers, so ymmv - but the general idea will work with most TCP
communications, if you know the port number in use.)

C.  On your laptop, set up a local print queue as if you were going to
print directly to the printer over the network (with no intermediate
server), but instead of supplying the printer's IP address, use
127.0.0.1 instead.

D. Start printing...


If you close either of the ssh sessions above, you will have to
re-establish them before you can print again. 

The two ssh commands can be combined into one and you can create an alias to
save typing the whole thing each time.  (Or use PuTTY or some other GUI SSH client
wrapper to save these details for reuse.)

You could set up multiple printers this way, but to use them
simultaneously, you would need to use unique port numbers for each one
(though the port number at the end of the printer IP would stay 9100).
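
As a sketch of how steps A and B can be combined into one command, the helper below only prints the command line so that you can inspect it (or put it in an alias) before running it. The function name printer_tunnel_cmd and its arguments are hypothetical, and the -t option (which asks ssh for a terminal so the second hop can prompt you and give you a shell) is my addition rather than part of the steps above.

```shell
#!/bin/sh
# Hypothetical helper: print the combined two-hop tunnel command.
# It does not open any connection itself.
#   $1 username on the gateways
#   $2 local port on the laptop (must be unique per printer)
#   $3 IP address of the printer on the wired network
printer_tunnel_cmd() {
    user=$1; port=$2; printer=$3
    # Hop 1: laptop:$port -> rssh:$port
    # Hop 2: rssh:$port   -> printer:9100 (JetDirect port stays 9100)
    printf 'ssh -A -t -L %s:127.0.0.1:%s %s@rssh.rhic.bnl.gov ssh -L %s:%s:9100 %s@stargw1.starp.bnl.gov\n' \
        "$port" "$port" "$user" "$port" "$printer" "$user"
}

# Example: Control Room printer on local port 9100, a second printer on 9101
# printer_tunnel_cmd jdoe 9100 130.199.162.165
# printer_tunnel_cmd jdoe 9101 130.199.16.220
```

With more than one tunnel, each local print queue on the laptop then points at 127.0.0.1 and the local port chosen for that printer.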

 

Direct connection, internal network

You can access the printers over the network using direct connections.

  • Direct:  These printers accept direct TCP/IP connections, without any intermediate server. 
  • JetDirect (AppSocket) and lpd usually work under Linux. 
  • For Windows NT/2K/XP, a Standard TCP/IP port is usually the way to go. 

How to configure this varies with OS and your installed printing software.

From the Wireless

A limited number of printers are accessible from the BNL wireless network, served through a CUPS server. Access and printing may be delayed (multiple CUPS servers are involved in passing print jobs from outside to inside).

Follow the method below for your OS. If you have information on how to set this up on other operating systems, please let us know and we will add the instructions here.

The CUPS server URL is CUPSURL=http://cups:631/ and its IP is CUPSIP=130.199.154.13 (the alias cups should be defined on the wireless network). Only printers displayed in green on the Printers page are valid.

On Windows systems

  • Go to Start -> Settings -> Printers and Faxes
  • An explorer window will open with possibly the printers you have already set
  • Click Add Printer - the Wizard will open
  • Click Next
  • In the Local or Network Printer menu, select A network printer or a printer attached to another computer - click next
  • In the Specify a Printer menu, select  Connect to a printer on the internet and fill the URL box accordingly
    • You need to specify a printer available from the CUPS server - go to the CUPS server Web site to see the available printers

Mac Users

Instructions 1:
  • Use a browser and go to:  localhost:631/printers
  • Go to the bottom of the page and click:  add printer 
  • Select the name of the printer and characteristics
    • For example HP_Color_LaserJet_4700_2
    • Use printer location:  the IP address of the CUPS server (see above  for the value of CUPSIP)
    • Device:   Internet Printing Protocol (http) 
    • Use the CUPS server URI to the selected printer for the device URI
    • Select Model/Driver HP 
    • Model:   scroll to get the proper model, for example "HP LaserJet Series" or "HP Color LaserJet 4700"  (this depends on what is loaded on user's Mac)
Instructions 2:
  • Click on the apple in the upper left, System Preferences->Print and Fax
  • Authenticate by clicking on the key (need to authenticate as a user with administrator privileges)
  • Click on the + to add a new printer
  • Click on Protocol to get "Internet Printing Protocol - IPP"
    • Address: use the CUPS server address (see above for the value of CUPSIP)
    • Queue: printers/HP_Color_LaserJet_4700_2
    • Name: give it something useful like "LJ4700_wireless"
    • Print using "HP"
    • Select HP Color Laserjet 4700
  • Click Add then click continue
  • Double check that it's right by bringing up the CUPS web interface.
    It should have a "Device URI" of the form ${CUPSURL}/printers/HP_Color_LaserJet_4700_2 as per our example.
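
For reference, the device URI used in the instructions above is just the CUPS server URL followed by printers/ and the queue name. The small sketch below builds it while avoiding a doubled slash when the server URL already ends in one; the function name cups_printer_uri is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the URI of a print queue on a CUPS server.
#   $1 CUPS server URL (with or without a trailing slash)
#   $2 queue name as shown in the CUPS web interface
cups_printer_uri() {
    base=${1%/}    # strip one trailing slash, if present
    printf '%s/printers/%s\n' "$base" "$2"
}

# Example with the server URL quoted in this section:
# cups_printer_uri http://cups:631/ HP_Color_LaserJet_4700_2
```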

Tips

What follows are miscellaneous tips and suggestions that will be irregularly maintained.

  • The 2-sided printers are configured to print 2-sided by default, but the default for many printer drivers will override this and specify 1-sided.  If you are printing from Windows, you can usually choose your preferences for this in the printer preferences or configuration GUI.  You may need to look in the Advanced Settings and/or Printing Defaults to enable 2-sided printing in Windows.
  • Depending on the print method and drivers used, from the Linux command line you may be able to specify various options for things like duplex printing.  To see available options for a given print queue, try the "lpoptions" command.  For instance, on rcf2 you could do "lpoptions -d xerox7300 -l".  In the output, you will find a line like this:  "Duplex/2-Sided Printing: DuplexNoTumble *DuplexTumble None"  (DuplexNoTumble is the same as flip on long edge, while DuplexTumble is the same as flip on short edge, and the * indicates the default setting.)  So to turn off duplex printing, you could do "lp -d xerox7300 -o Duplex=None <filename>".  Keep in mind that not all options listed by lpoptions may actually be supported by the printer, and the defaults (especially in the rcf queues) may not be what you'd like.  There are so many print systems, options and drivers in Linux/Unix that there's no way to quickly describe all the possible scenarios.
  • There is a handy utility called a2ps that is available on most Linux distributions. It is an "Any to PostScript" filter that started as a Text to PostScript converter, with pretty printing features and all the expected features from this kind of program. But it is also able to deal with other file types (PostScript, Texinfo, compressed, whatever...) provided you have the necessary tools installed.

  • psresize is another useful utility in Linux for dealing with undesired page sizes. If you are given a PostScript file that specifies A4 paper, but want to print it on US Letter-sized paper, then you can do:
    psresize -PA4 -pletter in.ps out.ps
    See the man page for more information.
  • Some of the newer printers have installation wizards for Windows that can be accessed through their web interfaces. I've had mixed success with the HP IPP installation wizards. The Xerox wizard (linked above) has worked well, though it pops up some unnecessary windows and is a bit on the slow side.

  • Windows 9x/Me users will likely have to install software on their machines in order to print directly to these printers. HP and Xerox have such software available for download from their respective support websites, but who uses these OSes anymore?

  • For Linux users setting up new machines, CUPS is the default printing system, at least on recent distros (unless you upgraded from an older distribution, in which case LPRng may still be in use).  Given an appropriate PPD file, CUPS is capable of utilizing various print options, such as tray selection and duplexing, or at least you can create different queues with different options to a single printer.

  • There are other potentially useful printers around that are not catalogued here. Some are STAR printers out of the mainstream (like in 1006D), and some belong to other groups in the physics department.
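
The duplex tip above can be wrapped in a small helper so that you do not have to remember the option names. This is a sketch only: the function name duplex_lp_cmd and the mode keywords long, short and off are hypothetical, and the Duplex values are the ones shown in the lpoptions output quoted above, so check "lpoptions -d <queue> -l" for your own queue first. The helper prints the lp command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print an lp command with the requested duplex mode.
#   $1 print queue   $2 mode: long | short | off   $3 file to print
duplex_lp_cmd() {
    queue=$1; mode=$2; file=$3
    case "$mode" in
        long)  opt=DuplexNoTumble ;;   # flip on long edge
        short) opt=DuplexTumble ;;     # flip on short edge
        off)   opt=None ;;             # single-sided
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    printf 'lp -d %s -o Duplex=%s %s\n' "$queue" "$opt" "$file"
}

# Example: single-sided print on the queue used in the tip above
# duplex_lp_cmd xerox7300 off talk.ps
```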

Quick (?) start guide for visitors with laptops

So you brought a laptop to BNL… and the first thing you want to do is get online, right?
Ok, here's a quick (?) guide to getting what you want without breaking too many rules.

Wired Options:

  • Visitors' network: Dark purple jacks (usually labeled VNxx) are on a visitors' network and are effectively outside of the BNL firewall. They support DHCP and do not require any sort of registration to use. Being outside the firewall can be advantageous, but will prevent you from using some network services within BNL (printing, for instance). (The rest of this page is largely irrelevant if you are using the visitors' network.)

  • BNL network: If it isn't dark purple (and it isn't a phone jack) then it is on the BNL network, which supports DHCP on most subnets. (NB. The 60/61 subnet (available in parts of 1006, including the WAH) has a locally managed DHCP server -- contact Wayne Betts to be added to the access list). All devices on the BNL networks are required to be registered based on the MAC address that is unique to each network interface. To help enforce this policy, if you request a DHCP address from an unregistered node, you will be assigned a restricted address. With a restricted IP address, your web browser will be automatically redirected to the BNL registration page, and you will be unable to surf anywhere else until you are registered.

    When registering a laptop, fill in "varies" for the location fields. For the computer name field, I recommend using "DHCP Client" (unless you have a static IP address of course).

    Previously registered users are encouraged to verify and update their registration information by going to http://register.bnl.gov from the machine to be updated.

    There you can also find out more about the registration system and find links to some useful information for network users.

     

Using the 1-189 Conference PC

If you'd like to take advantage of the PC or the projector located in Building 510 Room 1-189 (STAR Conference Room) then you will need to know the following:

  • You must turn on the projector to see the login panel
    The projector is the default (primary) display; the terminal will not display anything until the projector is on.
  • The conference user password is: talkmeet
  • Upon logging in EVO will automatically start up and login to a special user account for this room. 
  • Use the phone bridge for the audio - it is much clearer and more stable (BNL has an EVO reflector at x6100) - be sure EVO is on mute (default)
  • You can then use the wireless keyboard and mouse to present on the projector, which is permanently attached to the conference PC. 
    Please,  do not disconnect the projector from the conference PC.
  • Please, do not move the camera by hand - the camera control is placed as a shortcut on the desktop
  • You are expected to upload your talks to a Drupal-like agenda and display from the main computer

Firefox, Microsoft Office, and Open Office are also installed for presentation use.

To know:

  • THIS COMPUTER HAS SOFTWARE INSTALLED THAT PREVENTS LONG TERM FILE STORAGE.
    IF THE MACHINE IS REBOOTED, YOUR FILES WILL BE DELETED FROM THE HARD DRIVE!
    If you intend to keep files around for a while, please keep them on drive T:
  • The above also applies to any software you may attempt to install - if you need a piece of software, please contact the STAR user support team.
  • Credentials are wiped out upon log-off. If you need to remain logged in to a Web site, you need to stay logged on to the PC.

 

Windows

This area is intended to provide information for STAR members to assist in configuring and using typical desktop/laptop PCs at BNL.

  Windows 2000/XP and Scientific Linux/Redhat Enterprise Linux are the preferred Operating Systems within STAR at BNL for desktop computing, though there is no formal requirement to use any particular OS.

  These pages are intended to be dynamic, subject to the constantly changing software world and user input.   Feedback from users -- what you find indispensable; what is misleading, confusing or flat-out wrong; and what is missing that you wish was here -- can help to significantly increase the value of these pages.

  Additional pages that are under consideration for creation:

  • Windows installation checklist (the basic software and configuration that should probably be on every Windows PC)
  • Linux installation checklist
  • Common Linux details and useful links, such as Linux equivalents to software for Windows.
  • Resources specific to the experiment operations (eg. common DAQ NFS mounts)
  • Publicly usable terminals

Cygwin installation and tips

To quote from the Cygwin website:  "Cygwin is a Linux-like environment for Windows."

The Linux-like nature is quite comprehensive...  You can *almost* forget that you are using a Windows OS -- most utilities and software that you are familiar with from your Linux experience are available in Cygwin.  For example, the Cygwin distribution has available an openssh client (and the server too, but I don't recommend you use it), PostScript and PDF viewers and editors, compression (eg. zip) utilities, software development tools and X Windows packages (more on X below). 

Using the Cygwin X server

An example of Cygwin's usefulness and cost-saving potential is the X server.  The Cygwin X server is, in most cases, easy and convenient to use in place of commercial X servers such as Hummingbird Exceed.  Here is the short version for those familiar with Cygwin installations:
  1. You need the xorg-x11-base and X-startup-scripts packages (and whatever dependencies they have, which the setup routine should solve for you).  You'll probably also want the xwinclip package.  All of these are in the X11 Category in the Cygwin Setup.
  2. Execute "startxwin.bat" (in <cygwin_root>/usr/X11R6/bin/).  That will start a stand-alone X Server and an xterm with a cygwin shell.   Edit this batch file as you see fit -- it includes documentation for a number of options. 
  3. If you are displaying windows from a remote session over ssh, be sure you have X tunneling enabled in your ssh client configuration.  Please do not try to open up your X server to the entire world with anything like "xhost +".  That is a *VERY BAD IDEA*.
  4. In light of step 3 above:  If you have a local firewall that asks about blocking access to the Xserver, you can usually block it without a problem -- if you have X forwarding enabled and working, then you are usually ok.  (If you believe a localhost-based firewall is interfering with X, try allowing only connections from the loopback/localhost address (127.0.0.1)).
Long version:  Walkthrough of a Cygwin installation (MS Word doc).

Subsidiary recommendation:

There is a handy tool for initiating shell connections to remote hosts (such as via ssh) and starting the Cygwin X server called Mortens Cygwin X-Launcher.  Coming soon (?): screenshots of the X-Launcher configuration that are most likely to be useful...

Installation Tip:

A Cygwin mirror is available at http://mirror.bnl.gov/cygwin/ making the installation go quite quickly if you are at BNL.  This is quite handy for the cygwin installation and any subsequent use of the setup utility.  One potential catch for onsite users -- even if you intend to use the local mirror, you must still configure a BNL proxy server during Setup, as shown in  this walkthrough of a Cygwin installation (MS Word format).
Please send comments, corrections and suggestions to Wayne Betts: wbetts {at} bnl.gov

FAQS and Tips that don't fit well elsewhere

Software Site Licenses:

Do we have a site license for software package XYZ?

The answer is (almost) always: No!
Neither STAR nor BNL have site licences for any Microsoft product, Hummingbird Exceed, WinZIP, ssh.com's software or much of anything intended to run on individual users' desktops. Furthermore, for most purposes BNL-owned computers do not qualify for academic software licenses, though exceptions do exist.

FAQ: PDF creation:

How can I create a file in pdf format?

Without Adobe Acrobat (an expensive bit of software), this can be a daunting question. I am researching answers, some of which are available in my Windows software tips. Here is the gist of it in a nutshell as I write this -- there are online conversion services and OpenOffice is capable of exporting PDF documents.

FAQ: X Servers:

What X server software should I use in Windows?

I recommend trying the X Server that is available freely with Cygwin, for which I have created some documentation here: Cygwin Tips. If you can't make that work for you, then I next recommend a commercial product called Xmanager, available from http://www.netsarang.com. Last time I checked, you could still download a fully functional version for a time-limited evaluation period.

TIP: Windows Hibernation trick:

Hibernate or Standby -- There is a difference which you might find handy: 
  • "Standby" puts the machine in a low power state from which it can be woken up nearly instantly with some stimulus, such as a keystroke or mouse movement (much like a screensaver) but the state requires a continuous power source.  The power required is quite small compared to normal running, but it can eventually deplete the battery (or crash hard if the power is lost in the case of a desktop).
  • "Hibernate" actually dumps everything in memory to disk and turns off the computer, then upon restarting it reloads the saved memory and basically is back to where it was.  While hibernating, no power source is required.  It can't wake up quickly (it takes about as long as a normal bootup), but when it does wake up, (almost) everything is just the way you left it.  One caveat about networking is in order here:  Stateful connections (eg. ssh logins) are not likely to survive a hibernation mode (though you may be able to enable such a feature if you control both the client and server configurations), but most web browsing activity and email clients, which don't maintain an active connection, can happily resume where they left off.

Imagine:  the lightning is starting, and you've got 50 windows open on your desktop that would take an hour to restore from scratch.  You want to hibernate now!  Here's how to enable hibernating if it isn't showing up in the shutdown box: 
Open the Control Panel and open "Power Options".  Go to the "Hibernate" tab and make sure the box to enable Hibernation is checked.  When you hit "Turn Off Computer" in the Start menu, if you still only see a Standby button, then try holding down a Shift key -- the Standby button should change to a Hibernate button.  Obvious, huh?

For the curious:
There are actually six (or seven depending on what you call "official") ACPI power states, but most motherboards/BIOSes only support a subset of these.  To learn more, try Googling "acpi power state", or you can start here as long as this link works.  (Note there is an error in the main post -- the S5 state is actually "Shutdown" in Microsoft's terminology). 
From the command line, you can play around with these things with such straightforward commands as:
%windir%\System32\rundll32.exe powrprof.dll,SetSuspendState 1
Even more obvious, right?  If you like that, then try this on for size.
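
Returning to the ACPI power states mentioned above: the six system sleep states can be summarized in a small lookup table.  This is only a sketch for orientation.  The state names S0-S5 come from the ACPI specification, but the one-line descriptions are informal summaries and the helper name describe_state is made up for this illustration.  Which states a given machine actually supports depends on its motherboard and BIOS.

```python
# ACPI system sleep states (S0-S5). The descriptions are informal summaries,
# not vendor terminology; real hardware usually supports only a subset.
ACPI_SLEEP_STATES = {
    "S0": "Working (fully on)",
    "S1": "Light sleep: CPU stopped, RAM and most hardware still powered",
    "S2": "Deeper sleep: CPU powered off (rarely implemented)",
    "S3": "Suspend to RAM (Standby on most machines that support it)",
    "S4": "Suspend to disk (Hibernate)",
    "S5": "Soft off (Shutdown)",
}


def describe_state(name):
    """Return a one-line description of an ACPI sleep state (hypothetical helper)."""
    key = name.strip().upper()
    if key not in ACPI_SLEEP_STATES:
        raise ValueError("unknown ACPI sleep state: %r" % name)
    return "%s: %s" % (key, ACPI_SLEEP_STATES[key])


if __name__ == "__main__":
    for state in sorted(ACPI_SLEEP_STATES):
        print(describe_state(state))
```

Standby needs continuous power because S1-S3 keep memory powered, while Hibernate (S4) writes memory to disk first, which is why it survives a power loss.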

TIP: My new computer is broken!:

It's almost certainly true - your new computer is faulty and the manufacturer knows it!  Unfortunately, that's just a fact of life.  Straight out of the box, or after acquiring a used PC, you might just want to have a peek at the vendor's website for various updates that have been released.  BIOS updates for the motherboard are a good place to start, as they tend to fix all sorts of niggling problems.  Firmware updates for other components are common as are driver updates and software patches for pre-installed software.  I've solved a number of problems applying these types of updates, though it can take hours to go through them thoroughly and most of the updates have no noticeable effect.  And it is dangerous at times.  One anecdote to share here -- we had a common wireless PC Card adapter that was well supported in both Windows and Linux.  The vendor provided an updated firmware for the card, installed under Windows.  But it turned out that the Linux drivers wouldn't work with the updated firmware.  So back we went to reinstall a less new firmware.  You'll want to try to be intelligent and discerning in your choices.  Dell for instance does a decent job with this (your Dell Service Tag is one very useful key here), but still requires a lot from the updater to help ensure things go smoothly.  This of course is in addition to OS updates that are so vital to security and discussed elsewhere.



Please send comments, corrections and suggestions to Wayne Betts: wbetts {at} bnl.gov

Networking Software

Networking Software

  • PuTTY:
     This is the preferred SSH client for Windows.  It is free, easy to use
    and well maintained for both security and bug issues.
     (As with everything, it is only "maintained" if you regularly check
    for updated versions!)
     Please note that most other SSH clients for Windows are NOT free for
    use on government computers or in the pursuit of lab business, though
    they might function just fine without payment.

  • WinSCP:  This is a fine graphical SFTP and SCP client utility with some additional features built in.

  • X servers (no, Exceed doesn't make the cut because of the high monetary cost):

    • Cygwin:  Please look at the separate Cygwin page for information on installing and configuring the Cygwin X server.

    • Xmanager:  I recommend that you use the Cygwin X server, but if you find
      something that it can't handle, then this is the recommended alternative. 
      It isn't free (but it does have a fully functional time-limited
      evaluation license if you want to try it out.) 
      It is much cheaper than Exceed and seemingly just as capable, but
      without quite as much overhead. 
      I'm particularly interested in hearing about X Server alternatives, so
      let me know if you have a favorite!

  • Alternatives to Microsoft's Internet Explorer and Outlook Express:

     As
    the leading web browser and mail client, these two apps are the target
    of prolific viruses, trojans, malware and other nasties. 
    In addition to avoiding many of these, you may also like some of the
    features available in the alternatives (eg. tabbed browsing is a
    popular feature unavailable in IE). 
    Four alternatives are in common use (three of them share much of the
    same code-base -- Mozilla, Netscape Navigator and Firefox). 
     This review
    might help you sort out the differences.
     As with anything, your preference is yours to decide (and also, as
    with everything else here, feature and security updates are released
    quite often, so you might try to check for new versions regularly): 
    They are listed here from highest recommendation to lowest:

    1. Firefox/Thunderbird: 
      Though frequently mentioned as a pair, Firefox and Thunderbird are
      stand-alone applications. 
      Firefox is a web browser, and Thunderbird is an email client. 
      "Stand alone" here means that these can be installed separately from
      each other. 
      You can configure them to work with alternative software as you wish
      (eg. use Firefox for surfing, but set Outlook as your default mail
      client). Actually, you can generally mix and match pieces from all of
      these alternatives, but most of them start out with defaults tied to
      their suite companions. 
      Slight thumbs up to Firefox over the other alternatives because it has
      almost every feature found in the corresponding Mozilla suite, plus
      additional add-ons. 
      Vast numbers of independently produced add-ons and customizations are
      available as well.
    2. Mozilla Suite: 
      A suite that includes the big three:  a browser, email client and HTML
      editor. 
      This is a fine alternative, but as a browser alternative, this author
      gives the bigger thumbs up to its sibling, Firefox, listed above.
    3. Opera. 
      It is available in a free version with a "branding" bar that contains
      advertisements, or you can buy the product to remove this minor
      annoyance.  (Branding/non-branding examples.)
    4. Netscape: 
      The Netscape suite includes a browser (Navigator), email client (Mail),
      HTML editor (Composer) and other tidbits. 
      Of the three Mozilla-based browsers, this is probably the least used
      and has the most extraneous stuff thrown in, which is one of several
      reasons it gets last place in this list.
        It is good enough to recommend, but just not quite as highly as the
      others.

  • Java, WebStart, JRE, J2RE, JSDK,
    Microsoft VM and all that Jazz...: The author of this segment finds
    this to be very puzzling and sometimes frustrating stuff to understand,
    keep up with, and especially to try to explain clearly and succinctly. 
    <Melodrama> Imagine Sun, IBM and Microsoft all walked into a bar
    and had a few drinks. Heck, let Netscape walk in a few minutes later
    for good measure. 
    Fifty states' attorneys general plus the US AG and DOJ are to act as a
    referee. 
    Now imagine that you, a mere passerby on the street were harangued into
    cleaning up the inevitable bar fight, complete with broken bottles,
    flying bar stools and blood everywhere all while it is still going on. 
    That's not even close to how awful it is...</Melodrama>   Details
    to be filled in here!

  • OpenAFS, MIT Kerberos, Wake and Leash: Details to be filled in here!

  • Google Toolbar :
    This is a very convenient interface to initiate Google searches, plus a
    decent pop-up blocker. Unfortunately, it is only available for Internet
    Explorer (though other browsers may support similar features natively).


Office applications and productivity software

Productivity software and viewers/utilities for various file types

  • OpenOffice -- Free and available on multiple platforms.  Perhaps the single best reason to use it is that it natively creates PDF format.  In addition to its own formats, it can read (and write) MS Word, Excel and PowerPoint files (usually -- sometimes formatting details go haywire, but they are constantly updating it.)

  • Adobe Reader -- Used for viewing PDF documents.  (You will probably want to install it with the very useful text search feature.) (Linux users can try xpdf as an alternative which is part of many distributions.)

  • Ghostscript and GSview: PostScript interpreter and viewer (and PDF too) that you probably want to have.

  • Online Document Conversion Services:  Neevia Technology and CERN Document Conversion Service both have file converters that allow you to submit a variety of common (and uncommon) file formats in small numbers and produce files in different formats (PDF probably being of most interest).  Though not convenient for many files or very large files (and certainly inappropriate for confidential or non-public information), they are good to know about.  (Don't forget -- OpenOffice is able to export documents in PDF format too and handles a lot of file types.)

  • Graphics and Image Manipulation software:  The GIMP and ImageMagick are both quite capable tools available for free for multiple platforms.  Perhaps not perfect replacements for Adobe PhotoShop, but pretty darn good.  (If you are a PhotoShop veteran, then you'll have to spend some time learning the ropes, but it will probably be worth it.)

  • Compression Utilities:  WinZip is not free (though many, many people use it without payment).  Fortunately, there are freeware alternatives.  For instance:
    • 7Zip:  This is the current recommendation of this page, the reasons for which may be included in the future.
    • FreeZip (but not "FreeZip!" which is reported to contain spyware and/or adware)
    • ZipCentral
    • ZipItFast
    • ExtractNow
    • CAMUnZip
    • ZipWrangler
    • Freebyte Zip

  • If you've ever spent a few minutes waiting for MS Windows Search function to find a file on your system, then you might find the following can save you some time. The basic idea is similar to most internet search engines: index your files (while the computer would otherwise be idle so as not to slow things down for the user) and then consult the indexes when a search is requested:

    • Yahoo! Desktop Search:  This is a free version of a well respected product from X1 with a few features removed, such as indexing of remote drives, Eudora and Mozilla-based email.
    • Google Desktop Search:  Use Google's Desktop Search to quickly search for files on your computer using an indexing system much like Google's web indexing.  Not all file types are supported, but most common ones are, such as Outlook mail, MS Office documents and so on.
    • Copernic Desktop Search:  This is similar to the Google Desktop Search, but appears to be a bit more capable, though as of this writing I have not had time or cause to test it much.  User comments would be appreciated.
    • Windows 2000 and XP include an "Indexing Service" which (according to Microsoft) is "a base service [...] that extracts content from files and constructs an indexed catalog to facilitate efficient and rapid searching."  To configure the Indexing Service open Control Panels -> Administrative Tools -> Computer Management.  In the left pane, click the plus sign next to "Services and Applications", then right-click on the "Indexing Service" icon.  In the popup menu, select "All Tasks | Tune Performance".  The "Indexing Service Usage" dialog box will appear.  The Indexing Service is actually quite customizable, though doing so can add significantly to the resources required by the service.  A warning: it can eat up a surprising amount of disk space to maintain the indexes.  It has sped up basic searches for this author, but your mileage may vary in both search efficiency gains and overall performance penalty.
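
The "index first, search later" idea behind all of these tools can be sketched in a few lines.  The example below is a toy illustration, not how any of the products above are implemented: it indexes file names only (the real tools also index file contents and run in the background), and the function names build_index and search are invented for this sketch.

```python
import os
import re
from collections import defaultdict


def build_index(root):
    """Walk `root` once and map each lowercase word in a file name to its paths."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            # Split "Run_Plan-2005.doc" into "run", "plan", "2005", "doc".
            for word in re.findall(r"[a-z0-9]+", filename.lower()):
                index[word].add(full_path)
    return index


def search(index, query):
    """Return sorted paths whose file names contain every word in `query`."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)
```

The expensive directory walk happens once in build_index; each later call to search is only a few dictionary lookups.  That is the source of the speed-up, and the stored index is also the source of the extra disk space mentioned above.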

  • Cygwin: Cygwin has a number of utilities for handling, viewing and transforming file formats, so I have a separate page of Cygwin tips
  • Multimedia Players (work related, of course!)

    Pick one.  Use it.  If you find a format it doesn't support, try a different one, or go to the vendor's site and look for a download of an update or add-on (plug-in, patch, codec, etc.) for your format.  This isn't the place to go into the details, but some quick thoughts are included here:
    • Microsoft's Media Player -- you've almost certainly already got it, so why not use it? 
    • Real Player:  complaint -- by default it runs background processes continuously, pops up annoying little messages and practically begs you to register it, though registration isn't necessary for full functionality.  It isn't a big deal to disable these annoyances, but why should you have to?
    • Winamp:  There is a free version and an inexpensive "Pro" version that has CD burning.  It has been up and down over the years, with some versions much quirkier than others.  Currently it seems to be on par with the rest.
    • Apple's iTunes:  Though intended to suck you into Apple's music store, you can use the application without using the store.  In keeping with most Apple stuff, it seems to be well liked by those who like it.  Enough said.



Performance and Security enhancement

Utilities for Security and Performance

If your computer seems to be running slower than it used to, pop-up advertising is appearing at an alarming rate, your web browser's settings keep changing in undesired ways or you just want a better idea what your computer is up to (eg. "What the heck is PRPCUI.exe?"), here are some resources for understanding what's going on and making things better, presented roughly in order from those that require the least detailed understanding to those that require the most:
  • Ad-Aware:

    Ad-Aware was, not very long ago, *the* place to start for malware detection and removal, with the added bonus that it was free.  Alas, recent versions of Ad-Aware (even the Personal version) are no longer licensed quite so freely (let's be clear -- a DOE-owned computer shouldn't have it installed without a paid license.)  It is still free for personal use, so it is highly recommended for home and personal laptop use, though it may not be keeping up with the constantly expanding field, which is a common problem with this type of software.  One thing to keep in mind:  you must be sure to keep your definitions up-to-date, just like a virus scanner, in order to get the most benefit.
  • Spybot - Search&Destroy:

    This is the historical alternative to Ad-Aware, with similar good results "in the early days", but it too may be failing to keep up.  Unlike Ad-Aware, its license is quite liberal, so it can be installed as desired.  It has an "Advanced" mode, with a variety of additional tools beyond the basic malware scanner (but keep in mind that some of these features are indeed "Advanced" and not to be played with lightly).  Broken record time:  you must be sure to keep your definitions up-to-date.  You should also consider using the "Immunize" feature to prevent some infestations, and to blacklist some sites known to host various forms of malware.

     

  • Microsoft's Malicious Software Removal Tool:  This is a regularly updated (but far from comprehensive) online removal tool for Windows 2000 and Windows XP.  It isn't a bad idea to run this scanner once a month or whenever you suspect you might have caught "something".

     

  • Microsoft's AntiSpyware Beta:  Though called a Beta product, this is essentially a re-GUIed and slightly modified version of a long standing and respected commercial product that Microsoft recently purchased.   Some recent tests by more-or-less independent testers have shown this tool to be better even than the old reliables, Ad-Aware and SpyBot.

     

  • Defragmenting your hard drive is something to put on the calendar 2-4 times a year.  Because Windows' built-in defragmenter seems especially slow, and modern disk drives hold so much, this is something usually left running overnight.  Third-party alternatives exist that may do a better job in various ways.  Let's hope I get around to listing one or two here in the not-too-distant future...

     

  • CrapCleaner:  This is a system optimization tool for removing unnecessary temporary files and registry entries. The default installation creates a "Run CCleaner" entry in the Recycle Bin's context (right-click) menu.
  • Monitoring startup activity and services.

    Programs that start when you boot or login to your computer can be big performance drains, in addition to doing unwanted things.  The following may help you understand and control what's going on.  (N.B. Some of the following are capable of rendering your system unusable if not handled with care!  They may require significant understanding of Windows' internals to be most useful):

     

    • StartUp Monitor and Startup Control Panel.  These are separate utilities, but they are from the same source and complement each other nicely.  (The author of these has additional utilities that you may find worthwhile as well.)
    • msconfig.exe:  This is Windows' very own "System Configuration Utility", with which one can look at and configure system startup parameters and files, which is especially useful to see the effects of individual changes. You can hose things up quite badly in here however, so be careful!
    • services.msc:  This provides a Management Console to configure the startup of various registered services.  This is useful for disabling unnecessary or unused Windows services.  A potentially informative feature in this Console is the "Description" column, though it can still be quite cryptic (or blank).  
    • Merijn.org's website provides several downloads that you might find useful, such as HijackThis ("a general homepage hijack detector and remover"), CWShredder (CoolWebsearch removal tool) and StartUpList ("way better than msconfig")
    • BlackViper.com
    • http://www.sysinfo.org/ (slow site) 
    • Security Task Manager
    • http://www.sysinternals.com
    • HijackThis
    • BHODemon
  • Pop-up Blockers

Pop-up blocking software is increasingly unnecessary because other tools are including their own pop-up blockers.  Mozilla/Firefox for instance have built-in pop-up blockers.  Internet Explorer has a pop-up blocker added with Windows XP SP2.  The Google Toolbar (recommended in the "Networking Software" recommendations) has a pop-up stopper as well.  Still, you might find some utility in the products available from the PanicWare website.  Versions of their Pop-Up Stopper FREE Edition served this author quite well for over a year, but as I said above, it no longer seems as essential as in the past, since the basic functionality has been supplanted by features in other software.
Microsoft Office updates are a combination of security fixes, bug fixes and new features.  Though not emphasized as much as Windows Updates, the security fixes for Office are of similar importance.  Unfortunately, using the online updating system usually requires an installation CD that matches your product (for instance, "Office XP Pro" disks are not acceptable for updating "Office XP Standard".)  Many people, for a variety of reasons, don't have their original installation CD(s).  If you do not have an acceptable installation CD available then the online product update scan can still be used to determine what updates are applicable.  Then you can usually download full updates and apply them manually without the installation media.  (Browse for the downloads that match your product -- most are in self-extracting executable format.)  

  • Clock keepers

  • Multi-desktop software

Other resources




Required software and configuration for Windows PCs at BNL

BNL-specific requirements and configuration for networked Windows computers:

  • A file and real-time virus scanner with up-to-date virus patterns/definitions is REQUIRED!  (***Cyber-Security requirement***)

      Information about the BNL-supported products from TrendMicro is available from the BNL ITD group: TrendMicro at BNL.   It is critical that any anti-virus product receive regular updates (daily or even more often), which is sometimes difficult for mobile machines on a variety of networks.   Four similar products are available to try to meet the demands of our diverse environment:

    Windows desktops that reside on the BNL internal networks are best served by TrendMicro's basic OfficeScan product.   It has a master server inside the BNL firewall from which it receives updates and to which it reports infections.  Every Windows desktop system at BNL should be using this product, with very few exceptions.  You can click here to go to the online installer for the OfficeScan product.  (You'll need administrator privileges on your system for the installation.)

    Laptop users with wireless networking are encouraged to use a newer OfficeScan version that has a firewall module and is able to receive virus pattern updates from multiple sources -- so it can roam around on- and off-site and usually still reach an update server.  This OfficeScan version is also more capable of cleaning up some trojans and malware than the desktop version.   To install it in the standard way, you must already be on the BNL external wireless network and go here.   Repeat: you must be on the "BNLexternal" wireless network to use that link.

    BNL employees' personal home computers are permitted to use the PC-cillin product, which gets its updates from servers that are outside the BNL firewall (and it does not report infections to anybody at BNL).  PC-cillin includes a firewall module (OfficeScan does not) and PC-cillin has more (but quite limited) spy-ware and ad-ware detection capabilities.

    If you are running a Windows *Server* OS (if you are unsure, then you almost certainly are not!), then there is yet another option, for which you will need to contact ITD (help desk at x5522 or Jim McManus directly at x4107).

    For those readers to whom none of the above apply, which is to say, computers not owned or used primarily at BNL or by BNL employees, I recommend (though can offer no significant assistance with) the following three free anti-virus products about which we (Wayne / Jerome) have read or heard good things:

    1. AVG Anti-Virus     - JL tried for 3 months, worked great but had conflict with fingerprint driver (thought to be a malicious script when activated)
    2. COMODO Free        - JL tried this for years and it works just fine and appears to be a great product considering the cost (none :-) ). The free version is for home users only so NOT to be installed on a BNL system for sure (usually the case of most Free AV).
    3. Microsoft Sec. E   - Microsoft Security Essentials is new on the market but already does a good job and supports Windows 7, Vista and XP

      Other anti-virus resources available include online scanners, such as HouseCall from TrendMicro and Symantec's Security Check.   Most major anti-virus vendors have something similar.   Relying on these online scanners as your primary defense is unwise.   In addition to the inconvenience of manually performing these scans, you really need a product monitoring your system at all times to prevent infections in the first place, rather than trying to clean up afterwards.   But since no two products catch and/or clean the same set of problems, occasionally using a second vendor's product can be useful.

     

  • Windows Critical Updates/SUS (***Cyber-Security requirement***)

      Windows systems must be regularly patched with "critical" updates.  Unfortunately, the BNL firewall and proxy configurations can interfere with the Windows Automatic Update feature in Windows 2000/XP (though you can still use Windows Updates in Internet Explorer if you have the proxies configured correctly, see below for proxy info).  To help with this situation, BNL ITD has set up a Software Update Services server to locally host critical updates.  To use this service (which places a notification icon in the System Tray when updates are available), please click here for more information and installation instructions.  (It is quite easy, but you must have administrative privileges.)   You can manually apply Windows updates (critical and otherwise) using Internet Explorer --  go to the Tools menu and click on "Windows Updates", at which point it is straightforward.  Note that in many cases, the machine must be rebooted to complete the update process.
  • Logon Banner (**Cyber-Security requirement**)

      As required by the DOE, please install a logon banner for BNL-owned or BNL-based computers.  (This includes other OSes as well -- essentially anything that you can log into is required to post a banner if technically possible.)  Click here for more information about logon banners at BNL. To install the banner:  Windows NT/2000/XP click here (must be an administrator to insert the registry changes).  Windows 95/98 click here instead.
  • MAC Registration (**Cyber-Security requirement**)

  All networked devices on the BNL internal networks are required to be registered.   (NB--- Please do not attempt to register your machine while using STAR's cygnusb wireless access points.)   More specifically, each network interface is to be registered -- one computer might have multiple network interfaces, each of which requires a separate registration.   That's because the registration is keyed on a specific string assigned to each network interface by the manufacturer that is supposed to be unique in the world.   It is known as a "MAC", "ethernet" or "hardware" address and each network interface has one. (I.e., you must create a separate registration entry for each network card you use on a system.)   For more information, or to update your registration information, click here.  This requirement applies to things beyond typical PCs, such as remote network power supplies, VME processors and other networked equipment.   If you have such equipment that you cannot register (typically because it doesn't run any sort of web browser), then please contact ITD (x5522) or Wayne Betts for assistance in registering the system.   While not necessary, if you have the capability to verify that the MAC you are registering is in fact yours (Windows hint:  "ipconfig /all" or Linux hint:  "ifconfig"), please do so.   Glitches in the system occasionally cause it to lose track of the real-time IP-to-MAC mapping, and you, the adaptable human, can perhaps avert the unfortunate situation of misregistration.
  • Proxy servers

    As of 2017/11, please use a direct connection to the network while at BNL.
  • Security Scanning

  The BNL networks are routinely scanned for vulnerabilities by ITD, auditors and even sometimes malicious intruders.  The most prevalent scan is done using Nessus, which looks for common network services and many known vulnerabilities.  Any user with a web browser can initiate a new scan of his host machine and look at the most recent scan results for his IP address by going to http://scanner.bnl.gov/.   (NB. When it requests an email address to send the results, you must use an address ending in bnl.gov, or it will reject you.)   The results can be daunting to interpret, so please ask for assistance if you are unsure how to interpret or correct any results.   Some results are "false positives" or uncorrectable but necessary, in which case they can be marked as such in the database.
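
For the MAC registration item above, comparing the address shown on the registration page with the one your machine reports is easier if both are written the same way, since tools print MAC addresses with different separators and letter case.  The following is a minimal sketch under stated assumptions: format_mac and normalize_mac are hypothetical helper names, and Python's standard uuid.getnode() reports only one interface (and can fall back to a random value when it finds none), so "ipconfig /all" or "ifconfig" remain the authoritative sources on a machine with several network cards.

```python
import uuid


def format_mac(value):
    """Format a 48-bit integer as a colon-separated MAC address string."""
    if not 0 <= value < 1 << 48:
        raise ValueError("a MAC address must fit in 48 bits")
    raw = "%012x" % value
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def normalize_mac(text):
    """Normalize '00-1A-2B-3C-4D-5E' or '001a.2b3c.4d5e' style input for comparison."""
    digits = "".join(ch for ch in text.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError("expected 12 hexadecimal digits, got %r" % text)
    return format_mac(int(digits, 16))


if __name__ == "__main__":
    # uuid.getnode() reports one interface only and may return a random value,
    # so cross-check the result against "ipconfig /all" or "ifconfig".
    print(format_mac(uuid.getnode()))
```

Two addresses refer to the same interface exactly when their normalized forms are equal, for example normalize_mac("00-1A-2B-3C-4D-5E") == normalize_mac("001a.2b3c.4d5e").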

 



Facility Access

A selection of tips on how to log in to the RCF
facility. We hope to augment these pages and add
information as users request or need it.

Getting a computer account in STAR

  1. Introduction
  2. Getting an account and performing work at BNL

 

Introduction

First of all, if you are a new user, WELCOME to the RHIC/STAR collaboration and experiment. STAR is located at Brookhaven National Laboratory and is one of the premier particle detectors in the world.

As a (new) STAR user, you will need to be granted access to our BNL Tier0 computing facility in order to have access to the offline and online infrastructure and resources. This includes accessing BNL remotely or directly while visiting us on site. Access covers data, the experiment, mailing lists and desktop computers for visitors, to name only a few. Because BNL is a National Facility under Department of Energy (DOE) regulations, a few steps are required for this to happen. Please follow them precisely and make sure you understand their relevance.

Note:

The DOE requires proper credentials for anyone accessing a computing "resource" and expects such individuals to keep their credentials up-to-date, i.e. in good standing. It is YOUR responsibility to keep valid credentials with Brookhaven National Laboratory's offices. Credentials include: being a valid and active STAR member, having a valid and active guest/user ID and appointment, and completing and keeping current the required training. Any missing component will cause an immediate closure of access to computing resources.

In many cases, we rely on the account name matching the one created at the RCF (for example, Hypernews or Drupal accounts need an exact match to be approved). This is enforced so we can rely on the work already done by the RCF personnel and base our automation only on "RCF account exists and is active". The RCF personnel work with the user's office and other agencies to verify your credentials.


If you were a STAR user before and seek to re-activate your account, this page also has information for you.

 

Getting an account and performing work at BNL

Note that requesting either an appointment or a computing account implies a check by the facility and User Office personnel of your good standing with RHIC/STAR as the affiliated experiment. Therefore, we urge you to follow the steps as described below.


ALL USERS - Ensure/Verify you are affiliated to STAR in our records

Whenever you join a group affiliated with STAR, please
  • Ask your council representative to send your information to the collaboration's record keeping person (at this point in time, this person is Liz Mogavero).
    Note: Your council representative IS the one responsible for keeping the list of authors and active members at all times. We will not (and cannot) consider requests coming from other STAR members.
     
  • Pro-actively check the presence of your name and record in our Phone Book.
    Note: If you are not in our Phone Book, you are simply NOT a STAR user as far as we know as our PhoneBook is the central repository of active STAR members as defined by the STAR council representatives. 
     

New users in STAR

  1. Request a Guest appointment
    You must be sure you have a valid guest appointment with the BNL User Office.

    Note 1: Requesting a Guest ID requires a procedure called “Foreign Visit and Assignment”. This procedure involves steps such as background checks with Counter Intelligence and approval from the Department of State. The procedure could take up to 60 days from the time it is started (sensitive countries may take 90 days).
    Note 2: If you have done this already and are a valid Guest, please go to this section.

    • Go to the Guest Registration Form and complete the registration as instructed.
      • Purpose of Visit: likely "Research" but if you come for other purposes, choose as appropriate ("CRADA" or "Interview" may apply for example) 
      • Experiment/Facility:  "Physics Dept (RHIC/AGS)"
      • Facility Code: "RHIC"
      • Type of Research: "STAR"
      • Type of Access Requested: likely "Open Research" if you stated your visit purpose as "Research"
      • Subject Code for this Visit/Assignment: likely "General Physics"
         
  2. Be patient and wait for further instructions and the approval.
    • ONLY AFTER THE FIRST STEPS will you be able to proceed with the rest of the instructions below.
    • We will assume that you, from now on, have a valid Guest appointment and hold a Guest/BNL ID.
       
  3. Ensure you have the required and mandatory training
    You MUST take the Cyber Security training and course GE-CYBERSEC. This training is mandatory and access to the facility computing resources will NOT be granted without it.
    You are also requested to read the Personal User Agreement, which describes your responsibilities and the reasonable use and scope of personal use of computing equipment. In recent years, the BNL User Office has requested that the form be signed and returned for their records. Please do not skip any of those steps.
     
  4. Request an RCF account
    To request a new RCF account, start here. The fields are explained on this instruction page.
    Note: There is a "Contact information" field which is meant to name an existing RHIC member (holder of a valid account and appointment) who can vouch for you. Put your council representative's or team lead's name there OR (in case of interview, CRADA, etc.) the name of your contact and host at BNL. DO NOT use your own name for this field. DO NOT use the name of a person who is NOT yet a STAR member.
     
  5. Additional steps are described below.

Previously a STAR user

If you were a STAR user before and are consulting these pages, it may mean that either
  1. you cannot remember how to login and need access but you are in good standing (all training valid, your BNL appointment is valid)
  2. you have let your training expire but you are a valid BNL guest (your appointment with BNL has NOT expired)
  3. your BNL appointment is about to expire or has expired not long ago
  4. you are a RHIC user (from another experiment), and now coming to STAR
The instructions follow:
  • First of all, please make sure you are in the STAR PhoneBook as indicated here.
    If you were a member of another experiment before, you will be joining STAR either as a member of an existing institution or as part of a new institution. All membership handling is the responsibility of the STAR council (approval of new institutions) or of your council representative. In both cases, we MUST find your name in our PhoneBook records.
     
  • Instructions for the several use cases above
    1. If you are in good standing but cannot remember your login information at the RCF facility, please see Account re-activation
       
    2. You have let your BNL training expire - likely, you have not renewed or taken GE-CYBERSEC training available from the training page (please locate in the list at the bottom the course named GE-CYBERSEC). Within 24 hours of the training being taken/renewed again, the privilege to access the BNL computing resources using your RCF account will be re-established (the process will be automatic).
       
    3. Your appointment is about to expire, or has expired not long ago - you will need to go to the Extension requests.
      The Guest Central interface will help identify your status and appointment expiration. This form can be used by users who already have a BNL Guest ID. If you have let your appointment lapse for a long time, however, this form may let you know (or not show your old BNL badge/guest ID at all). In such a case, you should consider yourself a "New user" and follow the first set of instructions above.
      For an appointment renewal, the starting point will be the Guest Extension Form.
       
    4. If you were a user before and now coming to STAR, you will need to follow
    5. Additional steps are described below.

Additional steps for everyone

  1. Generate and upload your SSH keys to ensure secure login
    You may now read SSH Keys and login to the SDCC and the information that follows in this section.
     
  2. Drupal access
    1. Log in to an RCF node to verify that your account username/password works
    2. Download a 2-Factor Authentication app to your mobile device (options include Google or Microsoft Authenticator, Duo Mobile, Authy, FreeOTP, Aegis, ...)
    3. Contact Dmitry Arkhipkin or Jerome Lauret on MatterMost (https://chat.sdcc.bnl.gov, choose "BNL Login" with your RCF username/password) to obtain the 2-Factor Authentication QR code.
    4. Use your RCF username/password + 2-Factor Authentication code (read from the app on your mobile device) to log in to drupal.
You may also be interested in All of those links are referenced on Software & Computing, the main page for Software and Computing ...
Wishing you a great time in STAR.

 

Account re-activation

The instructions here are for users who have an account at the RCF but have unfortunately let their BNL appointment expire or do not know how to access their (old) account.

Account expired or is disabled

First of all, please be sure you understand the requirements and rationale explained in Getting a computer account in STAR.
As soon as your appointment with BNL ends or expires, all access to BNL computing resources is closed / suspended, and you MUST renew your appointment before access can be re-established. In such a case, we will not provide you with any access, which may include access to Drupal (personal account) and mailing lists.

The simplest way to proceed is to

  • Check that you do have GE-CYBERSEC training. You can do this by checking your training records.
    • If you do not, please take this training NOW as any future request will be denied until this training is complete
  • Send an Email to RT-RACF-UserAccounts@bnl.gov requesting re-activation of your account. Specify the account name (Unix account name, not your name) if you remember it. If you don't, your full name may do. The RCF team will check your status (Cyber training, appointment status) and
    • if any is not valid, you will be notified and further actions will be needed.
    • If all is fine, they will re-activate your account after verifying with us that you are a valid STAR user. Please consult Getting a computer account in STAR for what this means ...

If your appointment has expired, you will need to renew it. Please, follow the instructions available here.

Chicken and Egg issue? Forgot your password but did not upload SSH keys

If your account and your appointment are both valid but you have not logged in to the facility for a while and hence are unable to upload your SSH keys (as described in SSH Keys and login to the SDCC and related documents), this section may be for you.

You cannot access the upload page unless you have a valid password, as access to the RCF requires a double authentication scheme (Kerberos password + SSH key). If you have forgotten your password, first send an Email to the RCF at RT-RACF-UserAccounts@bnl.gov asking to reset your password, then go to the SSH key upload interface and proceed.

Drupal access

This page describes how to obtain access to the STAR Drupal pages.

  1. Get a computer account in STAR: make sure you have a valid guest appointment and valid cyber training, then request an RCF account

    https://drupal.star.bnl.gov/STAR/comp/sofi/facility-access/general-access

  2. Generate SSH keys and upload them to RCF     https://drupal.star.bnl.gov/STAR/comp/sofi/facility-access/ssh-keys
  3. (Optional) Log in to RCF node
    1. ssh xxx@rssh.rhic.bnl.gov    (xxx is your username on RCF, enter the passphrase for your SSH key)
    2. kinit     (enter your RCF password)
  4. Download a 2-Factor Authentication app to your mobile device (options include Google or Microsoft Authenticator, Duo Mobile, Authy, FreeOTP, Aegis, ...)
  5. Contact Dmitry Arkhipkin or Jerome Lauret on MatterMost to obtain your 2FA QR-code.   https://chat.sdcc.bnl.gov  Use "BNL Login" and your RCF account user/password to login
  6. Drupal login requires your RCF username, password and a 6-digit code you read from the 2-Factor Authentication app.
 

SSH Keys and login to the SDCC

How to generate keys for about every platform ... and actually be able to log to the SDCC

General

What you find below is especially useful for those of you that work on several machines and platforms in and out of BNL and need to use ssh key pairs to get into SDCC.

  • If you use Linux only or Windows only everywhere, all you need to do is follow the instructions on the SDCC web site and you are all set (see especially their Unix SSH Key generation page).
  • Otherwise this page is for you.

The findings on this web page are a combined effort of Jérôme Lauret, Jim Thomas, and Thomas Ullrich. All typos and mistakes on this page are my doing. I am also not going to discuss the wisdom of having to move private keys around - all I want to do is get things done.

The whole problem arises from the fact that there are 3 different formats to store ssh key pairs and they are not all compatible:

  • ssh.com: Secure Shell is the company that invented the (now public) ssh protocol. They provide the (so far) best ssh version for Windows which is far nicer than PuTTY. Especially the File Browser provided is so much nicer than the scp command interface. It is free for academic/university sites.
  • PuTTY: a free ssh tool for Windows.
  • OpenSSH: runs on all Linux boxes and via cygwin on Windows.

Despite all claims, OpenSSH cannot export private keys into ssh.com format, nor can it import ssh.com private keys. Public keys seem to work but this is not what we want. So here is how it goes:

[A] Windows: follow one of the instructions below

  1. PuTTY (Windows)
    1. Download puttygen.exe from the PuTTY download page. You only need it once, but it might be good to keep it in case you need to regenerate your keys.
    2. Start the program puttygen.exe
      • Under parameters pick SSH-2 (RSA) and 1024 for the size of the key in bits.
      • Then press the Generate button. You will be asked to move your mouse over the blank area.
      • Enter a passphrase in the corresponding fields. The passphrase is needed as it acts as a password. Make a mental note of it, as the keys will not be usable without it.
      • I recommend saving the "key fingerprint" too, since you will need it at the SDCC web site when uploading your public key. Just save it in a plain text file.
        Note: You can always generate it later from Linux with ssh-keygen -l -f <key_file>, but since you will need access to a Linux system to do this, it is important to keep a copy now so you can proceed with the rest of the instructions.
    3. Saving keys
      • Press Save Public Key. To not confuse all the keys you are going to generate I strongly recommend to call it rsa_putty.pub.
      • Next press Save Private Key. Type rsa_putty as a name when prompted. PuTTY will automatically name it rsa_putty.ppk. That's your private key.
        Don't quit puttygen yet. Now comes the important stuff.
      • In the menu bar pick Conversions->Export OpenSSH key. When prompted, give a name that indicates that this is the private key for OpenSSH (Linux). I used rsa_openssh. No public key is stored, only the private one. We will generate the public one from the private one later.
      • In the menu bar pick Conversions->Export ssh.com key. When prompted, give a name that indicates that this is the private key for ssh.com. I used rsa_sshcom. Again, no public key is stored, only the private one. We will generate the public one from the private one later.
    4. All done. Now you have essentially 4 files: public and private keys for putty and private keys for ssh.com and OpenSSH.
  2. Getting ssh.com to work (Windows):
    1. Here I assume that you have SSHSecureShell (client) installed, that is the ssh.com version. Open a DOS (or cygwin) shell. We now need to generate a public key from the private key we got from puttygen. Best is to change into the directory where your private key is stored and type: ssh-keygen2 -D rsa_sshcom . Note that the command has a '2' at the end. This will generate a file called rsa_sshcom.pub containing the public key. Now you have your key pair.
    2. Launch SSH and pick from the menu bar Edit->Settings.
      Click on GlobalSettings/UserAuthentication/Keys and press the Import button. Point to your public key rsa_sshcom.pub. The private key will be automatically loaded too. That's it. Press OK and quit SSH. We are not quite ready yet. We still have to generate and upload the OpenSSH key to SDCC.
  3. Getting keys to work with OpenSSH/Linux:
    1. Copy the private key rsa_openssh to a Linux box (cygwin on Windows works of course too).
    2. Set the permissions such that only you can read the private key file:
      % chmod 600 rsa_openssh
    3. Generate the public key with:
      % ssh-keygen -y -f rsa_openssh > rsa_openssh.pub
    4. Now you have the key pair.
    5. To install the key pair on a Linux box copy rsa_openssh and rsa_openssh.pub to your ~/.ssh directory.
      Important: the keys ideally will be named id_rsa and id_rsa.pub, otherwise extra steps/options will be required to work with them.  So, you are recommended to also do
      % mv rsa_openssh ~/.ssh/id_rsa  

      % mv rsa_openssh.pub ~/.ssh/id_rsa.pub

      All done.  Note that there is no need to put your key files on every machine to which you are going to connect.  In fact, you should keep your private key file in as few places as possible -- just the source machine(s) from which you will initiate SSH connections.  Your public key file is indeed safe to share with the public, so you need not be so careful with it and in fact will have to provide it to remote systems (such as in the next section) in order to use your keys at all.
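As a quick sanity check after installing the keys, the permission requirement from step 2 can be verified from a script. The sketch below is only an illustration: key_is_private is a hypothetical helper name, and the path ~/.ssh/id_rsa assumes you renamed the key as recommended above.

```shell
#!/bin/sh
# Hypothetical helper (not part of any STAR or SDCC tool): succeed only if the
# given private key file exists and has mode 600 (read/write for owner only).
key_is_private() {
    [ -f "$1" ] && [ -n "$(find "$1" -prune -perm 600)" ]
}

# Example use, assuming the key was installed as ~/.ssh/id_rsa as described above.
if key_is_private "$HOME/.ssh/id_rsa"; then
    echo "private key permissions look fine"
else
    echo "missing key or wrong permissions; try: chmod 600 ~/.ssh/id_rsa"
fi
```

The check uses find with an exact -perm match rather than stat, since the stat options differ between Linux and other Unix flavors.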
       

[B] Uploading the public key to SDCC:

  1. https://web.racf.bnl.gov/Facility/SshKeys/UploadSshKey.php
  2. Make sure you upload the OpenSSH public key. Everything else won't work.
    You need to provide the key fingerprint which you hopefully saved from the instructions above.  In case of OpenSSH based keys, you can re-generate the fingerprint with
    % ssh-keygen -Emd5 -l -f <key_file>

Note that forcing the MD5 hash is important (the default hash is SHA256, which the RACF interface will not accept). All done.
If you followed all instructions you now have 3 key pairs (files). This covers essentially all SSH implementations there are. Wherever you go, whatever machine and system you deal with, one key pair will work. Keep them all in a very safe place.

 

[C] Done. What's next?

Uploading your keys to the SDCC and STAR SSH-key management interfaces

You need to upload your SSH keys only once. After your first upload, however, please wait a while (30 minutes) before connecting to the SDCC SSH gatekeepers. For a basic connection, use:

% ssh -AX xxx@sssh.sdcc.bnl.gov
% rterm

The rterm command will open an X terminal on a valid STAR interactive node. If you do NOT have an X11 server running on your computer, you can use the -i option of rterm for an interactive (non X-term based) session.

If you intend to logon to our online enclave, please check the instructions on Accessing The STAR Protected Network to request an account on the STAR SSH gateways and Linux pool (and upload your keys to the STAR Key SSH Management system).  Note that you cannot upload your keys anywhere without a Kerberos password (both the SDCC and STAR's interface will require a real account kerberos password to log in). Logging in to the Online enclave involves the following ssh connection:

% ssh -AX xxx@cssh.sdcc.bnl.gov
% ssh -AX xxx@stargw.starp.bnl.gov

The first thing to notice is that the SDCC gatekeeper here is "cssh", as the network is separated into a "campus" side (cssh) and a ScienceZone side (sssh). For convenience, we have asked Cyber Security to allow connections from "sssh" to our online enclave as well (so if you use sssh all the time, it will work).
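The two hops can in principle be collapsed into one command with the OpenSSH jump-host option. The sketch below only prints such a command and does not run it. It assumes OpenSSH 7.3 or later on your machine (for -J), the same user name on both hosts, and that the gatekeeper permits the TCP forwarding a jump host relies on, which this page does not confirm. star_online_login_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Build (but do not run) a single-command login to the online enclave.
# Assumptions: OpenSSH 7.3+ locally (for -J), same user name on both hosts,
# and a gatekeeper that allows being used as a jump host.
star_online_login_cmd() {
    echo "ssh -A -X -J $1@cssh.sdcc.bnl.gov $1@stargw.starp.bnl.gov"
}

# Print the command for the placeholder account "xxx" used on this page.
star_online_login_cmd xxx
```

If the jump-host form is refused, the two explicit ssh -AX commands shown above remain the reference procedure.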

Regarding the request for an online account, note that users do not request access to the individual stargw machines directly.  Instead, a shared user database is kept on onlcs.starp.bnl.gov - approval for access to onlcs grants access to the stargw machines and the Online Linux Pool.  Such access is typically requested on the user's behalf when the user requests access to the online resources following the instructions at Accessing The STAR Protected Network, though users may also initiate the request themselves.

Logging in to the stargw machines is most conveniently done Using the SSH Agent, and is generally done through the SDCC's SSSH gateways.  This additional step of starting an agent will be removed once we are able to access the STAR SSH GW directly (as of 2009, this is not yet possible due to technical details).

See also

To learn more, see:

 

Caveats, issues, special cases and possible problems

Shortcut links

 

SSH side effects

Please note that if your account name on the machine you connect from (remote from the BNL standpoint) is different from your RCF account name, you will need to use

% ssh -X username@rssh.rhic.bnl.gov

specifying the username explicitly, as the form

% ssh -X rssh.rhic.bnl.gov

will assume a username defaulting to your user name on the local machine where you issue the ssh command (remote from the BNL ssh-daemon standpoint). This has been a source of confusion for a few users. The first form is preferred, as it always works and removes all ambiguity.

X11 Forwarding: -X or -Y ??

-X enables X11 forwarding and automatically sets the display environment to a secure channel (also called untrusted X11 forwarding). In other words, it enables X11 forwarding without granting remote applications the right to manipulate your Xserver parameters. If you want the ssh client to always enable X11 forwarding, add the following line to your /etc/ssh/ssh_config (or any /etc/ssh*/ssh*_config).

ForwardX11 yes

-Y enables trusted X11 forwarding. So, what does trusted mean? It means that the X-client will be allowed to gain full access to your Xserver, including changing X11 properties (i.e. attributes and values which alter the look and feel of opened X windows, or things such as mouse controls and position info, keyboard input reading and so on).  Starting with OpenSSH 3.8, you will need to set

ForwardX11Trusted yes 

in the client configuration  to allow remote nodes full access to your Xserver as it is NOT enabled by default.

When to use trusted, when to use untrusted

Recent OpenSSH versions support both untrusted (-X) and trusted (-Y) X11 Forwarding. As hinted above, the difference is what level of permissions the client application has on the Xserver running on the client machine.  Untrusted (-X) X11 Forwarding is more secure, but unfortunately several applications (especially older X-based applications) do not support running with fewer privileges and will eventually die and/or crash your entire Xserver session.

Dilemma? A rule of thumb is that, while trusted (-Y) X11 Forwarding will cause fewer application problems for the near future, you should first try the more secure untrusted (-X) way and see what happens. If remote X applications fail with an error similar to the below:

X Error of failed request: BadAtom (invalid Atom parameter)
  Major opcode of failed request: 18 (X_ChangeProperty)
  Atom id in failed request: 0x114
  Serial number of failed request: 370
  Current serial number in output stream: 372

you will have to use the trusted (-Y) connection.
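The rule of thumb above can be applied mechanically when an application is started from a script. The following is a minimal sketch under stated assumptions: needs_trusted_x11 is a hypothetical helper, and it treats any "X Error of failed request" message as a sign that untrusted forwarding is the cause, which will not be true for every X error.

```shell
#!/bin/sh
# Hypothetical helper: read the error output of a remote X application on
# standard input and succeed if it contains an X protocol error message.
needs_trusted_x11() {
    grep -q "X Error of failed request"
}

# Example with the first line of the error shown above.
if echo "X Error of failed request: BadAtom (invalid Atom parameter)" | needs_trusted_x11; then
    echo "untrusted forwarding failed; reconnect with ssh -Y"
fi
```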

Per client / server setup?

Instead of a system-wide configuration, which requires your system administrator's assistance, you may create a config file in the .ssh directory of your home directory on the client side ($HOME/.ssh/config) with the following line

ForwardX11Trusted yes 

But it gets better as the config file allows per host or per-domain configuration. For example, the below is valid

Host *.edu
    ForwardX11 no
    User jlauret

Host *.starp.bnl.gov
    ForwardX11 yes
    Cipher blowfish
    User jeromel

Host orion.star.bnl.gov
    ForwardAgent yes
    Cipher 3des
    ForwardX11Trusted yes

Host what.is.this
    User exampleoptions
    ServerAliveInterval 900
    Port 666
    Compression yes
    PasswordAuthentication no
    KeepAlive yes
    ForwardAgent yes
    ForwardX11 yes
    RhostsAuthentication no
    RhostsRSAAuthentication no
    RSAAuthentication yes
    TISAuthentication no
    FallBackToRsh no
    UseRsh no

As a side note, 3des is more secure than blowfish but also 3x slower. If both speed and security are important, use at least an aes cipher.
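For a current OpenSSH client, the same per-host idea can be written with the protocol 2 Ciphers keyword and an aes cipher, as the side note suggests. The sketch below prints such a stanza so that you can review it before adding it to your configuration. star_config_stanza is a hypothetical helper, and the option values are illustrative assumptions rather than site policy.

```shell
#!/bin/sh
# Hypothetical helper: print a per-host stanza for ~/.ssh/config.
# $1 is the remote user name. The host pattern comes from the example above;
# the option values are illustrative assumptions, not site policy.
star_config_stanza() {
    cat <<EOF
Host *.starp.bnl.gov
    User $1
    ForwardAgent yes
    ForwardX11 yes
    Ciphers aes256-ctr,aes128-ctr
    ServerAliveInterval 15
EOF
}

# Review the output, then append it yourself, for example:
#   star_config_stanza jeromel >> "$HOME/.ssh/config"
star_config_stanza jeromel
```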

Kerberos hand-shake, How to.

OK, now you are logged in to the facility gatekeeper, but any subsequent login would ask for your password again (and this would defeat security). You can cure this problem by issuing the following command on the gatekeeper (we assume $user is your user name)

% kinit -5 -d -l 7d $user

-l 7d is used to provide a long life K5 ticket (7 days long credentials). Note that you should afterward be granted an AFS token automatically upon login to the worker nodes on the facility. From the gatekeeper, the command

% rterm

would open a terminal on the least loaded node of the cluster where you are allowed to log in.

Generic (group) accounts

Due to policy regulations, group or generic accounts login cannot be allowed at the facility unless the login is traceable to an individual. The way to log in is therefore to

  • Log in to the gatekeeper using SSH keys under your PERSONAL account as described at SSH Keys and login to the SDCC
  • kinit -5 -4 -l 7d $gaccount
  • In case of wide use generic account, one more jump to a "special" node will be necessary. For starreco and starlib for example, this additional gatekeeper node is rcas6003. From there, login to the rest of the facility could be done using rterm as usual (at least in STAR)

Special nodes

This section is about standing on one foot, tapping on top of your head and chanting a mantra unless the moon is full (in such a case, the procedure involves parsley and sacrificial offerings). OK, we are in the realm of the very very special tricks for very very special nodes:

  • The rmine nodes CANNOT be connected to directly anymore. However, one can use rsec00.rhic.bnl.gov as a gatekeeper, using your desktop keys, and then jump from there to the rmine nodes.
    Scope: Subject to special authorization.
  • The test node aplay1.usatlas.bnl.gov cannot be accessed using a Kerberos trick. Since there are two HOPs from your machine to aplay1, you need to use the ssh-agent. See instructions on the Using the SSH Agent help page.

K5 Caveats

  • If you log in to gatekeeper GK1 with your personal account, you will need to choose another gatekeeper GK2 for your group account login. This avoids interference between Kerberos credentials.
  • Whenever you log in to a gatekeeper where you know you had previously obtained Kerberos credentials, you should destroy the previous credentials to avoid premature lifetime expiration. In other words, -l 7d will NOT give you a 7-day lifetime K5 ticket on a gatekeeper where previous credentials exist. To destroy previous credentials, be sure
    1. you do not (still) have open windows using the credentials. Check this by issuing klist and inspecting the listing. Valid credentials used in open sessions would look like this
      Valid starting     Expires            Service principal
      12/26/06 10:59:28  12/31/06 10:59:28  krbtgt/RHIC.BNL.GOV@RHIC.BNL.GOV
             renew until 01/02/07 10:59:25
      12/26/06 10:59:30  12/31/06 10:59:28  host/rcas6005.rcf.bnl.gov@RHIC.BNL.GOV
             renew until 01/02/07 10:59:25
      12/26/06 11:11:48  12/31/06 10:59:28  host/rplay43.rcf.bnl.gov@RHIC.BNL.GOV
             renew until 01/02/07 10:59:25
      12/26/06 17:51:05  12/31/06 10:59:28  host/stargrid02.rcf.bnl.gov@RHIC.BNL.GOV
             renew until 01/02/07 10:59:25
      12/26/06 18:34:03  12/31/06 10:59:28  host/stargrid01.rcf.bnl.gov@RHIC.BNL.GOV
             renew until 01/02/07 10:59:25
      12/26/06 18:34:22  12/31/06 10:59:28  host/stargrid03.rcf.bnl.gov@RHIC.BNL.GOV
             renew until 01/02/07 10:59:25
      12/28/06 17:53:29  12/31/06 10:59:28  host/rcas6011.rcf.bnl.gov@RHIC.BNL.GOV
             renew until 01/02/07 10:59:25 
      
    2. If nothing appears to be relevant or existing, it is safe to issue the kdestroy command to wipe out all old credentials and then re-initiate a kinit.
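The check in step 1 can be scripted by counting the host/ service tickets in the klist listing. This is a minimal sketch, assuming the listing format shown above (klist output may differ between Kerberos versions); count_host_tickets is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count the host/ service tickets in klist output read
# from standard input. A non-zero count suggests that sessions may still
# depend on the current credentials.
count_host_tickets() {
    grep -c "host/"
}

# Example with two lines taken from the listing above.
sample="12/26/06 10:59:28  12/31/06 10:59:28  krbtgt/RHIC.BNL.GOV@RHIC.BNL.GOV
12/26/06 10:59:30  12/31/06 10:59:28  host/rcas6005.rcf.bnl.gov@RHIC.BNL.GOV"
echo "$sample" | count_host_tickets
```

On a gatekeeper you would feed it the real listing (klist | count_host_tickets) and only run kdestroy followed by kinit when you are satisfied that no open session still relies on the old credentials.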

 

 

Using the SSH Agent

General

The ssh-agent is a program you may use together with OpenSSH or similar ssh programs. The ssh-agent provides a secure way of storing the passphrase of the private key.

One advantage and common use of the agent is to use the key forwarding. Key forwarding allows you to open ssh sessions without having to type (again) your passphrase. Below, we provide instructions on starting the agent, loading your keys and how to use key forwarding.

Instructions

Starting the agent

The ssh-agent is started as follows.

% ssh-agent

Note however that the agent will immediately display information such as the one below

% ssh-agent
SSH_AUTH_SOCK=/tmp/ssh-fxDmNwelBA/agent.5884; export SSH_AUTH_SOCK;
SSH_AGENT_PID=3520; export SSH_AGENT_PID;
echo Agent pid 3520;

 

It may not be immediately obvious to you but you actually MUST type those commands on the command line for the next steps to be effective.

Here is what I usually do: redirect the message to a file and source it from the shell like this:

% ssh-agent >agent.sh 
% source agent.sh

This will create a script containing the necessary commands; the source command will then load the information. This assumes you are using bash (in plain sh, use . agent.sh instead of source). For csh, you need to use the setenv shell command to define both SSH_AUTH_SOCK and SSH_AGENT_PID. A simpler approach may however be to use

% ssh-agent csh

While this will start a new shell, everything will be defined in the newly started shell (no sourcing needed).
Now that your agent is started, you will need to load a key.
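To see why the agent output has to be executed by your shell, the sketch below replays the sample output shown earlier in a subshell and reads back one of the variables it defines. agent_pid_from_output is a hypothetical helper used only for this illustration; with a real agent, the usual one-line idiom for Bourne-style shells is given in the closing comment.

```shell
#!/bin/sh
# The sample ssh-agent output from above: it is nothing more than sh commands.
sample_output='SSH_AUTH_SOCK=/tmp/ssh-fxDmNwelBA/agent.5884; export SSH_AUTH_SOCK;
SSH_AGENT_PID=3520; export SSH_AGENT_PID;
echo Agent pid 3520;'

# Hypothetical helper: evaluate agent output ($1) and print the agent PID.
agent_pid_from_output() {
    # Subshell: the variables do not leak into the calling shell.
    ( eval "$1" >/dev/null; echo "$SSH_AGENT_PID" )
}

agent_pid_from_output "$sample_output"

# With a real agent the same idea becomes a one-liner (Bourne-style shells):
#   eval "$(ssh-agent -s)"
```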

Loading a key

The agent alone is not very useful until you've actually put keys into it. All your agent key management is handled by the ssh-add command. If you run it without arguments, it will add any of the 'standard' keys $HOME/.ssh/identity, $HOME/.ssh/id_rsa, and $HOME/.ssh/id_dsa.

To be sure the agent has not loaded any identity yet, you may use the -l option of ssh-add. The below is what you should see at first.

% ssh-add -l
The agent has no identities.

 

To load your keys, simply type

% ssh-add
Enter passphrase for /home/jlauret/.ssh/id_rsa:
Identity added: /home/jlauret/.ssh/id_rsa (/home/jlauret/.ssh/id_rsa)

 

To verify that all is fine, you may again use the ssh-add command with the -l option. The result should now be different and similar to the below (if not, something went wrong).

% ssh-add -l
1024 34:a0:3f:56:6d:a2:02:d1:c5:23:2e:a0:27:16:3d:e5 /home/jlauret/.ssh/id_rsa (RSA)

 

If so, all is fine.
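In a script, the same verification is easier to do with the exit status of ssh-add -l than by reading its text. The sketch below maps the status values documented in the OpenSSH ssh-add manual page to messages; describe_agent_status is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: translate the exit status of "ssh-add -l" into a message.
# Per the OpenSSH ssh-add manual: 0 = success, 1 = the command failed (for -l,
# no identities are loaded), 2 = unable to contact the agent.
describe_agent_status() {
    case "$1" in
        0) echo "agent has at least one key loaded" ;;
        1) echo "agent is running but has no identities" ;;
        2) echo "no agent reachable; start one first" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Typical use (commented out so the sketch runs without an agent):
#   ssh-add -l >/dev/null 2>&1; describe_agent_status $?
describe_agent_status 1
```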

Agent forwarding

Two conditions need to be present for agent forwarding to function:

  • The server needs to be set to accept forwards (enabled by default)
  • You need to use the ssh client with the -A option

Usage is simply

 

% ssh -A user@remotehost

 

And that is all. For every hop, you need to use the -A option to have the key forwarded throughout the chain of ssh logins. Ideally, you may want to use -AX (where "X" enables X11 forwarding).

Agent security concern

The ssh-agent creates a unix domain socket, and then listens for connections from /usr/bin/ssh on this socket. It relies on simple unix permissions to prevent access to this socket, which means that any keys you put into your agent are available to anyone who can connect to this socket. BE AWARE that root in particular has access to any file, hence to any socket, and as a consequence may acquire access to your remote systems whenever you use an agent.

The manpages indicate that you may use the -c option of ssh-add, and this indeed adds one more level of safety to the agent mechanism (the agent will ask for confirmation each time a key is used). However, if root has its mind set on stealing a session, you are fighting a lost battle from the start, so do not feel over-confident because of this option.

Additional information

The help pages below link to the OpenSSH implementation of the ssh client/server and other ssh related documentation from our site.

 

SSH connection stability

IF
  • Your SSH connections are closed from home
  • You get disconnected from any nodes without any reasons?
  • ... and you are a PuTTY user
  • ... or an Uglix SSH client user
This page is for you. If you are another kind of user, use different clients and so on, this page may still be informative and help you stabilize your connection (the same principles apply).

PuTTY users

To use PuTTY to connect to a gateway (from a home connection), you have to

  • set a session, be sure to enable SSH

  • go to the 'Connection' menu and have the following options box checked

    • Disable Nagle's algorithm (TCP_NODELAY option)

    • Enable TCP keepalives (SO_KEEPALIVE option)

  • Furthermore, in 'Connection' -> 'SSH' -> 'Tunnels' enable the option

    • Enable X11 forwarding

    • Enable MIT-Magic-Cookie-1

  • Save the session

Documentation on those features (explanations for the interested) is provided at the end of this document.


SSH Users

If you use SSH and own your system, first make sure the relevant settings are turned on by default in the SSH client configuration file. The client configuration is likely located at /etc/ssh_config or /usr/local/etc/ssh_config, depending on where you have ssh installed.

But if you do NOT have access to the configuration file, the client can nonetheless pass on options from the command line. Those options would have the same name as they would appear in the config file.

In particular, keep-alive is controlled via the SSH configuration option TCPKeepAlive.

% ssh -o TCPKeepAlive=yes

You will note in the next section that a spoofing issue exists with keep alive (I know it works well, but please consider the ServerAliveCountMax mechanism) so, you may use instead

% ssh -o TCPKeepAlive=no -o ServerAliveInterval=15

Note that the value 15 in our example is purely empirical. There are NO magic values and you need to test your connection and detect when (after what time) you get kicked out and disconnected and set the parameters from your client accordingly. Let's explain the default first and come back to this and a rule of thumb.

There are two relevant parameters (in addition to TCPKeepAlive):


ServerAliveInterval

Sets a timeout interval in seconds after which if no data has been received from the server, ssh will send a message through the encrypted channel to request a response from the server. The default is 0, indicating that these messages will not be sent to the server.

This option applies to protocol version 2 only.


ServerAliveCountMax

Sets the number of server alive messages (see above) which may be sent without ssh receiving any messages back from the server. If this threshold is reached while server alive messages are being sent, ssh will disconnect from the server, terminating the session. It is important to note that the use of server alive messages is very different from TCPKeepAlive (below). The server alive messages are sent through the encrypted channel and therefore will not be spoofable. The TCP keepalive option enabled by TCPKeepAlive is spoofable. The server alive mechanism is valuable when the client or server depend on knowing when a connection has become inactive.

The default value is 3. If, for example, ServerAliveInterval (above) is set to 15, and ServerAliveCountMax is left at the default, if the server becomes unresponsive ssh will disconnect after approximately 45 seconds.


In our example

% ssh -o TCPKeepAlive=no -o ServerAliveInterval=15

The recipe is: if you get disconnected after N seconds of inactivity, adjust the options above so that

ServerAliveInterval*ServerAliveCountMax <= 0.8*N, N being the timeout. Since ServerAliveCountMax is typically not modified, our example assumes the default value of 3 and therefore 3x15 = 45 seconds (we guessed a disconnect every minute or so). If you set the value too low, the client will send too much "chatter" to the server, with an impact on network traffic.
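
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names server_alive_interval and ssh_keepalive_options are made up for this example, and the observed timeout N is something you must measure on your own connection.

```python
def server_alive_interval(timeout_s, count_max=3):
    """Largest whole-second interval with interval * count_max <= 0.8 * timeout_s.

    timeout_s is the observed idle timeout N (in seconds); count_max is
    ServerAliveCountMax (3 is the ssh default quoted above).
    """
    if timeout_s <= 0 or count_max <= 0:
        raise ValueError("timeout_s and count_max must be positive")
    # 0.8 * N / count_max, computed with integers to avoid rounding surprises
    interval = (4 * timeout_s) // (5 * count_max)
    # ssh treats 0 as "disabled", so never go below 1 second
    # (for very short timeouts the 0.8*N rule can then no longer be met)
    return max(interval, 1)


def ssh_keepalive_options(timeout_s, count_max=3):
    """Build the -o options to append to an ssh command line."""
    interval = server_alive_interval(timeout_s, count_max)
    return [
        "-o", "TCPKeepAlive=no",
        "-o", f"ServerAliveInterval={interval}",
        "-o", f"ServerAliveCountMax={count_max}",
    ]


if __name__ == "__main__":
    # Assumed observation: idle sessions are dropped after about 60 seconds
    print(" ".join(["ssh"] + ssh_keepalive_options(60)))
```

For a 60 second timeout this yields an interval of 16 seconds (3x16 = 48 <= 0.8x60), slightly above the empirical value of 15 used in the text; both satisfy the rule.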


Appendix

Nagle's algorithm

This was written based on this article.

RPC implementations on TCP should disable Nagle. This reduces average RPC request latency on TCP, and makes network trace tools work a little nicer.

The TCP_NODELAY socket option determines whether Nagle's algorithm is used. Nagle's algorithm tries to conserve bandwidth by minimizing the number of segments that are sent. When applications wish to decrease network latency and increase performance, they can disable Nagle's algorithm (that is, enable TCP_NODELAY). Data will be sent earlier, at the cost of an increase in bandwidth consumption.
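
As an illustration only, this is how an application could disable Nagle's algorithm on a TCP socket with Python's standard socket module. The helper name make_low_latency_socket is hypothetical; the socket is created but never connected.

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY set)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero value enables TCP_NODELAY, i.e. disables Nagle's algorithm
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_low_latency_socket()
    try:
        enabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
        print("TCP_NODELAY enabled:", bool(enabled))
    finally:
        s.close()
```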


KeepAlive

The KEEPALIVE option of the TCP/IP Protocol ensures that connections are kept alive even while they are idle. When a connection to a client is inactive for a period of time (the timeout period), the operating system sends KEEPALIVE packets at regular intervals. On most systems, the default timeout period is two hours (7,200,000 ms).

If the network hardware or software drops connections that have been idle for less than the two hour default, the client session will fail. KEEPALIVE timeouts are configured at the operating system level for all connections that have KEEPALIVE enabled.

If the network hardware or software (including firewalls) has an idle limit of one hour, then the KEEPALIVE timeout must be less than one hour. To rectify this situation, the TCP/IP KEEPALIVE settings can be lowered to fit inside the firewall limits. The implementation of TCP KEEPALIVE may vary from vendor to vendor. The original definition is quite old and is described in RFC 1122.
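
Besides the operating system wide settings, an application can request KEEPALIVE on its own sockets. The sketch below uses Python's standard socket module; the helper name enable_keepalive and the chosen values (50 minutes of idle time, to stay under the one hour firewall limit of the example above) are assumptions. The per-socket tuning options are platform specific, so they are applied only where the platform defines them.

```python
import socket


def enable_keepalive(sock, idle_s=3000, interval_s=60, probes=5):
    """Turn on TCP KEEPALIVE for one socket and, where supported, tune it.

    idle_s:     idle time before the first probe (assumed value: 50 minutes)
    interval_s: delay between two probes
    probes:     unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants do not exist on every platform, hence the getattr guard
    tuning = (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        enable_keepalive(s)
        on = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
        print("SO_KEEPALIVE enabled:", bool(on))
    finally:
        s.close()
```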


MIT Magic cookie

To avoid unauthorized connections to your X display, the xauth command is widely used to manage access to X connections. When you log in, a .Xauthority file is created in your home directory ($HOME). Even SSH initiates the creation of a magic cookie, and without it no display can be opened. Note that since the .Xauthority file IS the file containing the MIT Magic cookie, if you ever run out of disk quota or the file system is full, this file CANNOT be created or updated (even by the sshd impersonating the user) and consequently no X connections can be opened.

The .Xauthority file sometimes contains information from older sessions, but this is not important, as a new key is created at every login session. The Xauthority mechanism is simple and powerful, and eliminates many of the security problems with X.




FileCatalog

The STAR FileCatalog is a set of tools and an API providing users with access to the metadata, file and replica information pertaining to all data produced by the RHIC/STAR experiment, through a unified, schema-agnostic interface. The user never needs to know the details of the relations between elements (or keywords); instead, a flexible yet powerful query API allows them to request any combination of keywords, subject to sets of conditions, each expressed as a keyword, an operator and a value. The user manual provides a list of keywords.

The STAR FileCatalog also provides multi-site support through the same API. In other words, the same set of tools and programmatic interface can be used to register, update and maintain a global catalog for the experiment, and serves as a core component of the Data Management system. To date, the STAR FileCatalog holds information on 22 million files and 52 million active replicas.

 

The history & version information

Manual

XML(s) examples

Examples and other documentation

 

A few examples will be left here to guide users and installers.

Data dictionary

This dictionary was created on 2012/03/12.

CollisionTypes

Field Type Null Default Comments
collisionTypeID smallint(6) No    
firstParticle varchar(10) No    
secondParticle varchar(10) No    
collisionEnergy float No 0  
collisionTypeIDate timestamp No CURRENT_TIMESTAMP  
collisionTypeCreator smallint(6) No 1  
collisionTypeCount int(11) Yes NULL  
collisionTypeComment text Yes NULL  

Creators

Field Type Null Default Comments
creatorID bigint(20) No    
creatorName varchar(15) Yes unknown  
creatorIDate timestamp No CURRENT_TIMESTAMP  
creatorCount int(11) Yes NULL  
creatorComment varchar(512) Yes NULL  

DetectorConfigurations

Field Type Null Default Comments
detectorConfigurationID int(11) No    
detectorConfigurationName varchar(50) Yes NULL  
dTPC tinyint(4) Yes NULL  
dSVT tinyint(4) Yes NULL  
dTOF tinyint(4) Yes NULL  
dEMC tinyint(4) Yes NULL  
dEEMC tinyint(4) Yes NULL  
dFPD tinyint(4) Yes NULL  
dFTPC tinyint(4) Yes NULL  
dPMD tinyint(4) Yes NULL  
dRICH tinyint(4) Yes NULL  
dSSD tinyint(4) Yes NULL  
dBBC tinyint(4) Yes NULL  
dBSMD tinyint(4) Yes NULL  
dESMD tinyint(4) Yes NULL  
dZDC tinyint(4) Yes NULL  
dCTB tinyint(4) Yes NULL  
dTPX tinyint(4) Yes NULL  
dFGT tinyint(4) Yes NULL  

DetectorStates

Field Type Null Default Comments
detectorStateID int(11) No    
sTPC tinyint(4) Yes NULL  
sSVT tinyint(4) Yes NULL  
sTOF tinyint(4) Yes NULL  
sEMC tinyint(4) Yes NULL  
sEEMC tinyint(4) Yes NULL  
sFPD tinyint(4) Yes NULL  
sFTPC tinyint(4) Yes NULL  
sPMD tinyint(4) Yes NULL  
sRICH tinyint(4) Yes NULL  
sSSD tinyint(4) Yes NULL  
sBBC tinyint(4) Yes NULL  
sBSMD tinyint(4) Yes NULL  
sESMD tinyint(4) Yes NULL  
sZDC tinyint(4) Yes NULL  
sCTB tinyint(4) Yes NULL  
sTPX tinyint(4) Yes NULL  
sFGT tinyint(4) Yes NULL  

EventGenerators

Field Type Null Default Comments
eventGeneratorID smallint(6) No    
eventGeneratorName varchar(30) No    
eventGeneratorVersion varchar(10) Yes 0  
eventGeneratorParams varchar(200) Yes NULL  
eventGeneratorIDate timestamp No CURRENT_TIMESTAMP  
eventGeneratorCreator smallint(6) No 1  
eventGeneratorCount int(11) Yes NULL  
eventGeneratorComment varchar(512) Yes NULL  

FileData

Field Type Null Default Comments
fileDataID bigint(20) No    
runParamID int(11) No 0  
fileName varchar(255) No    
baseName varchar(255) No   Name without extension
sName1 varchar(255) No   Will be used for name+runNumber
sName2 varchar(255) No   Will be used for name before runNumber
productionConditionID mediumint(9) Yes NULL  
numEntries mediumint(9) Yes 0  
md5sum varchar(32) Yes 0  
fileTypeID smallint(6) No 0  
fileSeq smallint(6) Yes NULL  
fileStream smallint(6) Yes 0  
fileDataIDate timestamp No CURRENT_TIMESTAMP  
fileDataCreator smallint(6) No 1  
fileDataCount int(11) Yes NULL  
fileDataComment text Yes NULL  

FileLocations

Field Type Null Default Comments
fileLocationID bigint(20) No    
fileDataID bigint(20) No 0  
filePathID bigint(20) No 0  
storageTypeID mediumint(9) No 0  
createTime timestamp No CURRENT_TIMESTAMP  
insertTime timestamp No 0000-00-00 00:00:00  
owner varchar(15) Yes NULL  
fsize bigint(20) Yes NULL  
storageSiteID smallint(6) No 0  
protection varchar(15) Yes NULL  
hostID mediumint(9) No 1  
availability tinyint(4) No 1  
persistent tinyint(4) No 0  
sanity tinyint(4) No 1  

FileLocationsID

Field Type Null Default Comments
fileLocationID bigint(20) No    

FileLocations_0

Field Type Null Default Comments
fileLocationID bigint(20) No    
fileDataID bigint(20) No 0  
filePathID bigint(20) No 0  
storageTypeID mediumint(9) No 0  
createTime timestamp No CURRENT_TIMESTAMP  
insertTime timestamp No 0000-00-00 00:00:00  
owner varchar(15) Yes NULL  
fsize bigint(20) Yes NULL  
storageSiteID smallint(6) No 0  
protection varchar(15) Yes NULL  
hostID mediumint(9) No 1  
availability tinyint(4) No 1  
persistent tinyint(4) No 0  
sanity tinyint(4) No 1  

FileLocations_1

Field Type Null Default Comments
fileLocationID bigint(20) No    
fileDataID bigint(20) No 0  
filePathID bigint(20) No 0  
storageTypeID mediumint(9) No 0  
createTime timestamp No CURRENT_TIMESTAMP  
insertTime timestamp No 0000-00-00 00:00:00  
owner varchar(15) Yes NULL  
fsize bigint(20) Yes NULL  
storageSiteID smallint(6) No 0  
protection varchar(15) Yes NULL  
hostID mediumint(9) No 1  
availability tinyint(4) No 1  
persistent tinyint(4) No 0  
sanity tinyint(4) No 1  

FileLocations_2

Field Type Null Default Comments
fileLocationID bigint(20) No    
fileDataID bigint(20) No 0  
filePathID bigint(20) No 0  
storageTypeID mediumint(9) No 0  
createTime timestamp No CURRENT_TIMESTAMP  
insertTime timestamp No 0000-00-00 00:00:00  
owner varchar(15) Yes NULL  
fsize bigint(20) Yes NULL  
storageSiteID smallint(6) No 0  
protection varchar(15) Yes NULL  
hostID mediumint(9) No 1  
availability tinyint(4) No 1  
persistent tinyint(4) No 0  
sanity tinyint(4) No 1  

FileLocations_3

Field Type Null Default Comments
fileLocationID bigint(20) No    
fileDataID bigint(20) No 0  
filePathID bigint(20) No 0  
storageTypeID mediumint(9) No 0  
createTime timestamp No CURRENT_TIMESTAMP  
insertTime timestamp No 0000-00-00 00:00:00  
owner varchar(15) Yes NULL  
fsize bigint(20) Yes NULL  
storageSiteID smallint(6) No 0  
protection varchar(15) Yes NULL  
hostID mediumint(9) No 1  
availability tinyint(4) No 1  
persistent tinyint(4) No 0  
sanity tinyint(4) No 1  

FileParents

Field Type Null Default Comments
parentFileID bigint(20) No 0  
childFileID bigint(20) No 0  

FilePaths

Field Type Null Default Comments
filePathID bigint(6) No    
filePathName varchar(255) No    
filePathIDate timestamp No CURRENT_TIMESTAMP  
filePathCreator smallint(6) No 1  
filePathCount int(11) Yes NULL  
filePathComment varchar(512) Yes NULL  

FileTypes

Field Type Null Default Comments
fileTypeID smallint(6) No    
fileTypeName varchar(30) No    
fileTypeExtension varchar(15) No    
fileTypeIDate timestamp No CURRENT_TIMESTAMP  
fileTypeCreator smallint(6) No 1  
fileTypeCount int(11) Yes NULL  
fileTypeComment varchar(512) Yes NULL  

Hosts

Field Type Null Default Comments
hostID smallint(6) No    
hostName varchar(30) No localhost  
hostIDate timestamp No CURRENT_TIMESTAMP  
hostCreator smallint(6) No 1  
hostCount int(11) Yes NULL  
hostComment varchar(512) Yes NULL  

ProductionConditions

Field Type Null Default Comments
productionConditionID smallint(6) No    
productionTag varchar(10) No    
libraryVersion varchar(10) No    
productionConditionIDate timestamp No CURRENT_TIMESTAMP  
productionConditionCreator smallint(6) No 1  
productionConditionCount int(11) Yes NULL  
productionConditionComment varchar(512) Yes NULL  

RunParams

Field Type Null Default Comments
runParamID int(11) No    
runNumber bigint(20) No 0  
dataTakingStart timestamp No 0000-00-00 00:00:00  
dataTakingEnd timestamp No 0000-00-00 00:00:00  
dataTakingDay smallint(6) Yes 0  
dataTakingYear smallint(6) Yes 0  
simulationParamsID int(11) Yes NULL  
runTypeID smallint(6) No 0  
triggerSetupID smallint(6) No 0  
detectorConfigurationID mediumint(9) No 0  
detectorStateID mediumint(9) No 0  
collisionTypeID smallint(6) No 0  
magFieldScale varchar(50) No    
magFieldValue float Yes NULL  
runParamIDate timestamp No CURRENT_TIMESTAMP  
runParamCreator smallint(6) No 1  
runParamCount int(11) Yes NULL  
runParamComment varchar(512) Yes NULL  

RunTypes

Field Type Null Default Comments
runTypeID smallint(6) No    
runTypeName varchar(255) No    
runTypeIDate timestamp No CURRENT_TIMESTAMP  
runTypeCreator smallint(6) No 1  
runTypeCount int(11) Yes NULL  
runTypeComment varchar(512) Yes NULL  

SimulationParams

Field Type Null Default Comments
simulationParamsID int(11) No    
eventGeneratorID smallint(6) No 0  
simulationParamIDate timestamp No CURRENT_TIMESTAMP  
simulationParamCreator smallint(6) No 1  
simulationParamCount int(11) Yes NULL  
simulationParamComment varchar(512) Yes NULL  

StorageSites

Field Type Null Default Comments
storageSiteID smallint(6) No    
storageSiteName varchar(30) No    
storageSiteLocation varchar(50) Yes NULL  
storageSiteIDate timestamp No CURRENT_TIMESTAMP  
storageSiteCreator smallint(6) No 1  
storageSiteCount int(11) Yes NULL  
storageSiteComment varchar(512) Yes NULL  

StorageTypes

Field Type Null Default Comments
storageTypeID mediumint(9) No    
storageTypeName varchar(6) No    
storageTypeIDate timestamp No CURRENT_TIMESTAMP  
storageTypeCreator smallint(6) No 1  
storageTypeCount int(11) Yes NULL  
storageTypeComment varchar(512) Yes NULL  

TriggerCompositions

Field Type Null Default Comments
fileDataID bigint(20) No 0  
triggerWordID mediumint(9) No 0  
triggerCount mediumint(9) Yes 0  

TriggerSetups

Field Type Null Default Comments
triggerSetupID smallint(6) No    
triggerSetupName varchar(50) No    
triggerSetupComposition varchar(255) No    
triggerSetupIDate timestamp No CURRENT_TIMESTAMP  
triggerSetupCreator smallint(6) No 1  
triggerSetupCount int(11) Yes NULL  
triggerSetupComment varchar(512) Yes NULL  

TriggerWords

Field Type Null Default Comments
triggerWordID mediumint(9) No    
triggerWordName varchar(50) No    
triggerWordVersion varchar(6) No V0.0  
triggerWordBits varchar(6) No    
triggerWordIDate timestamp No CURRENT_TIMESTAMP  
triggerWordCreator smallint(6) No 1  
triggerWordCount int(11) Yes NULL  
triggerWordComment varchar(512) Yes NULL  
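
The tables of this dictionary are linked through their ID columns (for example, FileLocations refers to FileData, FilePaths and StorageSites). The sketch below illustrates how such a relation resolves the full path of every replica of a file. It is only an illustration under stated assumptions: SQLite (Python's sqlite3 module) stands in for the real MySQL server, only a handful of columns is reproduced, and the file name and paths are made-up sample data. Regular users would go through the FileCatalog API rather than query the tables directly.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database holding a reduced copy of four tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE FileData (fileDataID INTEGER PRIMARY KEY,
                               fileName TEXT NOT NULL);
        CREATE TABLE FilePaths (filePathID INTEGER PRIMARY KEY,
                                filePathName TEXT NOT NULL UNIQUE);
        CREATE TABLE StorageSites (storageSiteID INTEGER PRIMARY KEY,
                                   storageSiteName TEXT NOT NULL);
        CREATE TABLE FileLocations (fileLocationID INTEGER PRIMARY KEY,
                                    fileDataID INTEGER NOT NULL,
                                    filePathID INTEGER NOT NULL,
                                    storageSiteID INTEGER NOT NULL);
        -- made-up sample data: one file with a replica at each site
        INSERT INTO FileData VALUES (1, 'example_0001.MuDst.root');
        INSERT INTO FilePaths VALUES (1, '/data/example/a'), (2, '/data/example/b');
        INSERT INTO StorageSites VALUES (1, 'BNL'), (2, 'LBL');
        INSERT INTO FileLocations VALUES (1, 1, 1, 1), (2, 1, 2, 2);
    """)
    return conn


def replica_paths(conn, file_name):
    """Return (site name, full path) for every replica of file_name."""
    rows = conn.execute(
        """
        SELECT ss.storageSiteName, fp.filePathName || '/' || fd.fileName
        FROM FileData fd
        JOIN FileLocations fl ON fl.fileDataID = fd.fileDataID
        JOIN FilePaths fp ON fp.filePathID = fl.filePathID
        JOIN StorageSites ss ON ss.storageSiteID = fl.storageSiteID
        WHERE fd.fileName = ?
        ORDER BY ss.storageSiteName
        """,
        (file_name,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for site, path in replica_paths(demo_connection(), "example_0001.MuDst.root"):
        print(site, path)
```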

Tables creation and attributes

#use FileCatalog;

#
# All IDs are named after their respective table. This MUST
# remain like this.
#  eventGeneratorID        -> eventGenerator+ID       in 'EventGenerators'
#  detectorConfigurationID ->detectorConfiguration+ID in 'DetectorConfigurations'
#
# etc...
#

DROP TABLE IF EXISTS EventGenerators;
CREATE TABLE EventGenerators
(
  eventGeneratorID      SMALLINT     NOT NULL    AUTO_INCREMENT,
  eventGeneratorName    VARCHAR(30)        NOT NULL,
  eventGeneratorVersion VARCHAR(10)     NOT NULL,
  eventGeneratorParams  VARCHAR(200),

  eventGeneratorIDate   TIMESTAMP       NOT NULL,
  eventGeneratorCreator CHAR(15)        DEFAULT 'unknown' NOT NULL,
  eventGeneratorCount   INT,
  eventGeneratorComment TEXT,
  UNIQUE        EG_EventGeneratorUnique (eventGeneratorName, eventGeneratorVersion, eventGeneratorParams),
  PRIMARY KEY (eventGeneratorID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS DetectorConfigurations; CREATE TABLE DetectorConfigurations
(
  detectorConfigurationID               INT             NOT NULL        AUTO_INCREMENT,
  detectorConfigurationName             VARCHAR(50)        NULL           UNIQUE,
  dTPC                                  TINYINT,
  dSVT                                  TINYINT,
  dTOF                                  TINYINT,
  dEMC                                  TINYINT,
  dEEMC                                 TINYINT,
  dFPD                                  TINYINT,
  dFTPC                                 TINYINT,
  dPMD                                  TINYINT,
  dRICH                                 TINYINT,
  dSSD                                  TINYINT,
  dBBC                                  TINYINT,
  dBSMD                                 TINYINT,
  dESMD                                 TINYINT,
  PRIMARY KEY (detectorConfigurationID)
) TYPE=MyISAM;


# Trigger related tables
DROP TABLE IF EXISTS TriggerSetups; CREATE TABLE TriggerSetups
(
   triggerSetupID               SMALLINT     NOT NULL    AUTO_INCREMENT,
   triggerSetupName             VARCHAR(50)        NOT NULL       UNIQUE,
   triggerSetupComposition      VARCHAR(255) NOT NULL,

   triggerSetupIDate            TIMESTAMP       NOT NULL,
   triggerSetupCreator          CHAR(15)       DEFAULT 'unknown' NOT NULL,
   triggerSetupCount            INT,
   triggerSetupComment          TEXT,
   PRIMARY KEY                  (triggerSetupID)
) TYPE=MyISAM;


DROP TABLE IF EXISTS TriggerCompositions; CREATE TABLE TriggerCompositions
(
  fileDataID                    BIGINT          NOT NULL,
  triggerWordID                 INT             NOT NULL,
  triggerCount                  MEDIUMINT       DEFAULT 0,
  PRIMARY KEY                   (fileDataID, triggerWordID)
) TYPE=MyISAM;



DROP TABLE IF EXISTS TriggerWords;
CREATE TABLE TriggerWords (
  triggerWordID         mediumint(9)   NOT NULL auto_increment,
  triggerWordName       varchar(50)  NOT NULL default '',
  triggerWordVersion    varchar(6)        NOT NULL default 'V0.0',
  triggerWordBits       varchar(6)   NOT NULL default '',
  triggerWordIDate      timestamp(14)       NOT NULL,
  triggerWordCreator    varchar(15)       NOT NULL default 'unknown',
  triggerWordCount      int(11)     default NULL,
  triggerWordComment    text,
  PRIMARY KEY           (triggerWordID),
  UNIQUE KEY TW_TriggerCharacteristic (triggerWordName,triggerWordVersion,triggerWordBits)
) TYPE=MyISAM;




DROP TABLE IF EXISTS CollisionTypes; CREATE TABLE CollisionTypes
(
  collisionTypeID SMALLINT NOT NULL AUTO_INCREMENT,
  firstParticle VARCHAR(10) NOT NULL,
  secondParticle VARCHAR(10) NOT NULL,
  collisionEnergy FLOAT NOT NULL,
  PRIMARY KEY (collisionTypeID)
) TYPE=MyISAM;


#
# A few dictionary tables
#
DROP TABLE IF EXISTS ProductionConditions; CREATE TABLE ProductionConditions
(
  productionConditionID         SMALLINT       NOT NULL      AUTO_INCREMENT,
  productionTag                 VARCHAR(10)   NOT NULL,
  libraryVersion                VARCHAR(10)   NOT NULL,

  productionConditionIDate      TIMESTAMP       NOT NULL,
  productionConditionCreator    CHAR(15)        DEFAULT 'unknown' NOT NULL,
  productionConditionCount      INT,
  productionConditionComments   TEXT,
  PRIMARY KEY                   (productionConditionID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS StorageSites; CREATE TABLE StorageSites
(
  storageSiteID                 SMALLINT      NOT NULL     AUTO_INCREMENT,
  storageSiteName               VARCHAR(30)  NOT NULL,
  storageSiteLocation           VARCHAR(50),

  storageSiteIDate              TIMESTAMP       NOT NULL,
  storageSiteCreator            CHAR(15)       DEFAULT 'unknown' NOT NULL,
  storageSiteCount              INT,
  storageSiteComment            TEXT,
  PRIMARY KEY                   (storageSiteID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS FileTypes; CREATE TABLE FileTypes
(
  fileTypeID                    SMALLINT NOT NULL        AUTO_INCREMENT,
  fileTypeName                  VARCHAR(30)    NOT NULL   UNIQUE,
  fileTypeExtension             VARCHAR(15)        NOT NULL,

  fileTypeIDate                 TIMESTAMP       NOT NULL,
  fileTypeCreator               CHAR(15) DEFAULT 'unknown' NOT NULL,
  fileTypeCount                 INT,
  fileTypeComment               TEXT,
  PRIMARY KEY                   (fileTypeID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS FilePaths; CREATE TABLE FilePaths
(
  filePathID                    BIGINT         NOT NULL         AUTO_INCREMENT,
  filePathName                  VARCHAR(255)   NOT NULL         UNIQUE,

  filePathIDate                 TIMESTAMP       NOT NULL,
  filePathCreator               CHAR(15) DEFAULT 'unknown' NOT NULL,
  filePathCount                 INT,
  filePathComment               TEXT,
  PRIMARY KEY                   (filePathID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS Hosts; CREATE TABLE Hosts
(
  hostID                        SMALLINT       NOT NULL         AUTO_INCREMENT,
  hostName                      VARCHAR(30)    NOT NULL DEFAULT 'localhost' UNIQUE,

  hostIDate                     TIMESTAMP       NOT NULL,
  hostCreator                   CHAR(15)     DEFAULT 'unknown' NOT NULL,
  hostCount                     INT,
  hostComment                   TEXT,
  PRIMARY KEY                   (hostID)
) TYPE=MyISAM;


DROP TABLE IF EXISTS RunTypes; CREATE TABLE RunTypes
(
  runTypeID                     SMALLINT  NOT NULL AUTO_INCREMENT,
  runTypeName                   VARCHAR(255)    NOT NULL   UNIQUE,

  runTypeIDate                  TIMESTAMP       NOT NULL,
  runTypeCreator                CHAR(15)  DEFAULT 'unknown' NOT NULL,
  runTypeCount                  INT,
  runTypeComment                TEXT,
  PRIMARY KEY                   (runTypeID)
) TYPE=MyISAM;


DROP TABLE IF EXISTS StorageTypes; CREATE TABLE StorageTypes
(
  storageTypeID                 MEDIUMINT       NOT NULL    AUTO_INCREMENT,
  storageTypeName               VARCHAR(6)   NOT NULL  UNIQUE,

  storageTypeIDate              TIMESTAMP       NOT NULL,
  storageTypeCreator            CHAR(15)       DEFAULT 'unknown' NOT NULL,
  storageTypeCount              INT,
  storageTypeComment            TEXT,
  PRIMARY KEY                   (storageTypeID)
) TYPE=MyISAM;





DROP TABLE IF EXISTS SimulationParams; CREATE TABLE SimulationParams
(
  simulationParamsID            INT             NOT NULL     AUTO_INCREMENT,
  eventGeneratorID              SMALLINT    NOT NULL,
  detectorConfigurationID       INT             NOT NULL,
  simulationParamComments       TEXT,
  PRIMARY KEY                   (simulationParamsID),
  INDEX         SP_EventGeneratorIndex          (eventGeneratorID),
  INDEX         SP_DetectorConfigurationIndex   (detectorConfigurationID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS RunParams;
CREATE TABLE RunParams
(
  runParamID                  INT        NOT NULL AUTO_INCREMENT,
  runNumber                   BIGINT     NOT NULL UNIQUE,
  dataTakingStart             TIMESTAMP,
  dataTakingEnd               TIMESTAMP,
  simulationParamsID          INT       NULL,
  runTypeID                   SMALLINT     NOT NULL,
  triggerSetupID              SMALLINT      NOT NULL,
  detectorConfigurationID     INT            NOT NULL,
  collisionTypeID             SMALLINT             NOT NULL,
  magFieldScale               VARCHAR(50)    NOT NULL,
  magFieldValue               FLOAT,
  runComments                 TEXT,
  PRIMARY KEY                          (runParamID),
  INDEX RP_RunNumberIndex              (runNumber),
  INDEX RP_DataTakingStartIndex        (dataTakingStart),
  INDEX RP_DataTakingEndIndex          (dataTakingEnd),
  INDEX RP_MagFieldScaleIndex          (magFieldScale),
  INDEX RP_MagFieldValueIndex          (magFieldValue),
  INDEX RP_SimulationParamsIndex       (simulationParamsID),
  INDEX RP_RunTypeIndex                (runTypeID),
  INDEX RP_TriggerSetupIndex           (triggerSetupID),
  INDEX RP_DetectorConfigurationIndex  (detectorConfigurationID),
  INDEX RP_CollisionTypeIndex          (collisionTypeID)
) TYPE=MyISAM;

DROP TABLE IF EXISTS FileData; CREATE TABLE FileData
(
  fileDataID                    BIGINT          NOT NULL AUTO_INCREMENT,
  runParamID                    INT             NOT NULL,
  fileName                      VARCHAR(255)       NOT NULL,
  baseName                      VARCHAR(255)       NOT NULL COMMENT 'Name without extension',
  sName1                        VARCHAR(255) NOT NULL COMMENT 'Will be used for name+runNumber',
  sName2                        VARCHAR(255) NOT NULL COMMENT 'Will be used for name before runNumber',
  productionConditionID         INT             NULL,
  numEntries                    MEDIUMINT,
  md5sum                        CHAR(32)     DEFAULT 0,
  fileTypeID                    SMALLINT NOT NULL,
  fileSeq                       SMALLINT,
  fileStream                    SMALLINT,
  fileDataComments              TEXT,
  PRIMARY KEY                   (fileDataID),
  INDEX         FD_FileNameIndex                (fileName(40)),
  INDEX         FD_BaseNameIndex                (baseName),
  INDEX         FD_SName1Index                  (sName1),
  INDEX         FS_SName2Index                  (sName2),
  INDEX         FD_RunParamsIndex               (runParamID),
  INDEX         FD_ProductionConditionIndex     (productionConditionID),
  INDEX         FD_FileTypeIndex                (fileTypeID),
  INDEX         FD_FileSeqIndex                 (fileSeq),
  UNIQUE        FD_FileDataUnique               (runParamID, fileName, productionConditionID, fileTypeID, fileSeq)
) TYPE=MyISAM;



# FileParents
DROP TABLE IF EXISTS FileParents; CREATE TABLE FileParents
(
  parentFileID                  BIGINT          NOT NULL,
  childFileID                   BIGINT          NOT NULL,
  PRIMARY KEY                   (parentFileID, childFileID)
) TYPE=MyISAM;

# FileLocations
DROP TABLE IF EXISTS FileLocations; CREATE TABLE FileLocations
(
  fileLocationID                BIGINT          NOT NULL      AUTO_INCREMENT,
  fileDataID                    BIGINT          NOT NULL,
  filePathID                    BIGINT          NOT NULL,
  storageTypeID                 MEDIUMINT       NOT NULL,
  createTime                    TIMESTAMP,
  insertTime                    TIMESTAMP       NOT NULL,
  owner                         VARCHAR(30),
  fsize                         BIGINT,
  storageSiteID                 SMALLINT      NOT NULL,
  protection                    VARCHAR(15),
  hostID                        BIGINT          NOT NULL DEFAULT 1,
  availability                  TINYINT         NOT NULL DEFAULT 1,
  persistent                    TINYINT         NOT NULL DEFAULT 0,
  sanity                        TINYINT         NOT NULL DEFAULT 1,
  PRIMARY KEY                   (fileLocationID),
  INDEX         FL_FilePathIndex                (filePathID),
  INDEX         FL_FileDataIndex                (fileDataID),
  INDEX         FL_StorageTypeIndex             (storageTypeID),
  INDEX         FL_StorageSiteIndex             (storageSiteID),
  INDEX         FL_HostIndex                    (hostID),
  UNIQUE        FL_FileLocationUnique           (fileDataID, storageTypeID, filePathID, storageSiteID, hostID)
) TYPE=MyISAM;

XML configuration

 

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE SCATALOG [
   <!ELEMENT SCATALOG (SITE*)>
       <!ATTLIST SCATALOG VERSION CDATA #REQUIRED>
   <!ELEMENT SITE (SERVER+)>
       <!ATTLIST SITE name (BNL | LBL) #REQUIRED>
       <!ATTLIST SITE description CDATA #IMPLIED>
       <!ATTLIST SITE URI CDATA #IMPLIED>
   <!ELEMENT SERVER (HOST+)>
       <!ATTLIST SERVER SCOPE (Master | Admin | User) #REQUIRED>
   <!ELEMENT HOST (ACCESS+)>
       <!ATTLIST HOST NAME CDATA #REQUIRED>
       <!ATTLIST HOST DBTYPE CDATA #IMPLIED>
       <!ATTLIST HOST DBNAME CDATA #REQUIRED>
       <!ATTLIST HOST PORT CDATA #IMPLIED>
   <!ELEMENT ACCESS EMPTY>
       <!ATTLIST ACCESS USER CDATA #IMPLIED>
       <!ATTLIST ACCESS PASS CDATA #IMPLIED>
]>



<SCATALOG VERSION="1.0.1">
        <SITE name="BNL">
                <SERVER SCOPE="Master">
                        <HOST NAME="mafata.wherever.net" DBNAME="Catalog_XXX" PORT="1234">
                                <ACCESS USER="Moi" PASS="HelloWorld"/>
                        </HOST>
                        <HOST NAME="mafata.wherever.net" DBNAME="Catalog_YYY" PORT="1235">
                                <ACCESS USER="Moi" PASS="HelloWorld"/>
                        </HOST>
                        <HOST NAME="duvall.star.bnl.gov" DBNAME="FileCatalog" PORT="">
                                <ACCESS USER="FC_master" PASS="AllAccess"/>
                        </HOST>
                </SERVER>
                <SERVER SCOPE="Admin">
                        <HOST NAME="duvall.star.bnl.gov" DBNAME="FileCatalog_BNL" PORT="">
                                <ACCESS USER="FC_admin" PASS="ExamplePassword"/>
                        </HOST>
                </SERVER>
                <SERVER SCOPE="User">
                        <HOST NAME="duvall.star.bnl.gov" DBNAME="FileCatalog_BNL" PORT="">
                                <ACCESS USER="FC_user" PASS="FCatalog"/>
                        </HOST>
                </SERVER>
        </SITE>
</SCATALOG>
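
As a rough illustration of how a client could read such a configuration, the sketch below uses Python's standard xml.etree.ElementTree module to list the connection parameters declared for a given site and scope. It is not the FileCatalog API itself: the function name connections, the host names and the credentials in SAMPLE are placeholders invented for this example.

```python
import xml.etree.ElementTree as ET

# Reduced sample following the SCATALOG layout above (placeholder values)
SAMPLE = """
<SCATALOG VERSION="1.0.1">
  <SITE name="BNL">
    <SERVER SCOPE="Master">
      <HOST NAME="master.example.org" DBNAME="FileCatalog" PORT="1234">
        <ACCESS USER="someMaster" PASS="placeholder"/>
      </HOST>
    </SERVER>
    <SERVER SCOPE="User">
      <HOST NAME="db.example.org" DBNAME="FileCatalog_BNL" PORT="">
        <ACCESS USER="someUser" PASS="placeholder"/>
      </HOST>
    </SERVER>
  </SITE>
</SCATALOG>
"""


def connections(xml_text, site, scope):
    """Return one dict per HOST/ACCESS pair declared for the site and scope."""
    root = ET.fromstring(xml_text)
    found = []
    for site_el in root.findall("SITE"):
        if site_el.get("name") != site:
            continue
        for server in site_el.findall("SERVER"):
            if server.get("SCOPE") != scope:
                continue
            for host in server.findall("HOST"):
                port = host.get("PORT")
                for access in host.findall("ACCESS"):
                    found.append({
                        "host": host.get("NAME"),
                        "dbname": host.get("DBNAME"),
                        # an empty or missing PORT means "use the default port"
                        "port": int(port) if port else None,
                        "user": access.get("USER"),
                        "password": access.get("PASS"),
                    })
    return found


if __name__ == "__main__":
    for entry in connections(SAMPLE, "BNL", "User"):
        print(entry["host"], entry["dbname"], entry["port"], entry["user"])
```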

Migration and notes from V01.265 to V01.275

This document is intended only for FileCatalog managers who have previously deployed an earlier version of the API and the older database table layout. It is NOT intended for users.

Reasoning for this upgrade and core of the upgrade

One of the major problems with the preceding database layout started to show when we reached 4 million entries (for some reason, we seem to have magic numbers). A severe restriction was the presence of the fields 'path' and 'nodename' in the FileLocations table. This table became unnecessarily large (of the order of GB), and sorting and queries became slow and I/O demanding (regardless of our careful indexing). The main action was to move both fields to separate tables. This change requires a two-step modification:

  1. reshape the database (leaving the old fields in place) and deploy the database API in cross-mode support
  2. run the normalization scripts filling the new tables and fields, deploy the final API and drop the obsolete columns (+ index rebuild)
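
To make the idea of the normalization concrete, here is a minimal sketch of moving the 'path' strings out of FileLocations into a FilePaths dictionary table. It is NOT the actual normalization script: SQLite (Python's sqlite3 module) stands in for MySQL, the tables are reduced to the columns involved, the sample paths are made up, and the function name normalize_paths is hypothetical. Note that "INSERT OR IGNORE" is SQLite syntax.

```python
import sqlite3


def demo_connection():
    """Legacy-style FileLocations (with 'path') plus an empty FilePaths table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE FilePaths (filePathID INTEGER PRIMARY KEY AUTOINCREMENT,
                                filePathName TEXT NOT NULL UNIQUE);
        CREATE TABLE FileLocations (fileLocationID INTEGER PRIMARY KEY,
                                    path TEXT NOT NULL,
                                    filePathID INTEGER NOT NULL DEFAULT 0);
        -- made-up sample data: three replicas, two distinct paths
        INSERT INTO FileLocations (fileLocationID, path) VALUES
            (1, '/data/example/a'), (2, '/data/example/a'), (3, '/data/example/b');
    """)
    return conn


def normalize_paths(conn):
    """Fill FilePaths from the legacy 'path' column, then set filePathID."""
    # 1. every distinct path string becomes one row of the dictionary table
    conn.execute(
        "INSERT OR IGNORE INTO FilePaths (filePathName) "
        "SELECT DISTINCT path FROM FileLocations"
    )
    # 2. each location now points to its dictionary row
    conn.execute(
        "UPDATE FileLocations SET filePathID = ("
        " SELECT filePathID FROM FilePaths"
        " WHERE filePathName = FileLocations.path)"
    )
    conn.commit()


if __name__ == "__main__":
    conn = demo_connection()
    normalize_paths(conn)
    for row in conn.execute(
        "SELECT fileLocationID, filePathID, path FROM FileLocations"
    ):
        print(row)
```

Once every row carries a valid filePathID, the long 'path' column is redundant and can be dropped, which is what shrinks the FileLocations table.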

The steps are described in more detail below.

Step by step migration instructions

The migration has to be done in several steps, for safety and the least interruption of service (although this is a pain for the manager). Note that you can do it much faster by cutting the Master/slave relationship, disabling all daemons that auto-update the database, proceeding with the table reshape and the normalization script execution, dropping and rebuilding the indexes, deploying the point-of-no-return API and restoring the Master/slave relationship.

This upgrade works best with perl 5.8 or later. Note that this transition will be the LAST one using perl 5.6 (get ready for a perl upgrade on your cluster).

We will assume you know how to connect to your database from an account able to manipulate and create any tables in the FileCatalog database.

Steps in Phase I

  1. (0) Create the following tables
      DROP TABLE IF EXISTS FilePaths; CREATE TABLE FilePaths
      (
        filePathID                    BIGINT         NOT NULL         AUTO_INCREMENT,
        filePathName                  VARCHAR(255)   NOT NULL         UNIQUE,
        filePathCount                 INT,
        PRIMARY KEY                   (filePathID)
      ) TYPE=MyISAM;
    
      DROP TABLE IF EXISTS Hosts; CREATE TABLE Hosts 
     (
        hostID      smallint(6) NOT NULL auto_increment,
        hostName    varchar(30) NOT NULL default 'localhost',
        hostIDate   timestamp(14) NOT NULL,
        hostCreator varchar(15) NOT NULL default 'unknown',
        hostCount   int(11) default NULL,
        hostComment text,
        PRIMARY KEY (hostID),
        UNIQUE KEY  hostName (hostName)
      ) TYPE=MyISAM;
    
    
  2. Modify some tables and recreate one
         
         ALTER TABLE `FileLocations` ADD `filePathID` bigint(20) NOT NULL default '0' AFTER `fileDataID`;
         ALTER TABLE `FileLocations` ADD `hostID` bigint(20) NOT NULL default '1' AFTER `protection`;
         UPDATE `FileLocations` SET hostID=0;
    
         # note that I did that one from the Web interface (TBC)
         INSERT INTO Hosts VALUES(0,'localhost',NOW()+0,'',0,'Any unspecified node'); 
    
         ALTER TABLE `FileLocations` ADD INDEX ( `filePathID` );
    
         ALTER TABLE `FilePaths` ADD `filePathIDate` TIMESTAMP NOT NULL AFTER `filePathName` ;
         ALTER TABLE `FilePaths` ADD `filePathCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `filePathIDate` ;
         ALTER TABLE `FilePaths` ADD `filePathComment` TEXT AFTER `filePathCount`;
    
         ALTER TABLE `StorageSites` ADD  `storageSiteIDate` TIMESTAMP NOT NULL AFTER `storageSiteLocation` ;
         ALTER TABLE `StorageSites` ADD  `storageSiteCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageSiteIDate` ;
         ALTER TABLE `StorageSites` DROP `storageComment`;
         ALTER TABLE `StorageSites` ADD  `storageSiteComment` TEXT AFTER `storageSiteCount`;
    
         ALTER TABLE `StorageTypes` ADD `storageTypeIDate` TIMESTAMP NOT NULL AFTER `storageTypeName` ;
         ALTER TABLE `StorageTypes` ADD `storageTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageTypeIDate` ;
    
    
         ALTER TABLE `FileTypes` ADD `fileTypeIDate` TIMESTAMP NOT NULL AFTER `fileTypeExtension` ;
         ALTER TABLE `FileTypes` ADD `fileTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `fileTypeIDate` ;
         ALTER TABLE `FileTypes` ADD `fileTypeComment` TEXT AFTER `fileTypeCount`;
    
    
         ALTER TABLE `TriggerSetups` ADD `triggerSetupIDate` TIMESTAMP NOT NULL AFTER `triggerSetupComposition` ;
         ALTER TABLE `TriggerSetups` ADD `triggerSetupCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `triggerSetupIDate`;
         ALTER TABLE `TriggerSetups` ADD `triggerSetupCount`   INT AFTER `triggerSetupCreator`;
         ALTER TABLE `TriggerSetups` ADD `triggerSetupComment` TEXT  AFTER `triggerSetupCount`;
    
         ALTER TABLE `EventGenerators` ADD `eventGeneratorIDate` TIMESTAMP NOT NULL AFTER `eventGeneratorParams` ;
         ALTER TABLE `EventGenerators` ADD `eventGeneratorCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `eventGeneratorIDate` ;
         ALTER TABLE `EventGenerators` ADD `eventGeneratorCount`   INT AFTER `eventGeneratorCreator`;
    
         ALTER TABLE `RunTypes` ADD `runTypeIDate` TIMESTAMP NOT NULL AFTER `runTypeName` ;
         ALTER TABLE `RunTypes` ADD `runTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `runTypeIDate` ;
    
         ALTER TABLE `ProductionConditions` DROP `productionComments`; 
         ALTER TABLE `ProductionConditions` ADD  `productionConditionIDate`   TIMESTAMP NOT NULL AFTER `libraryVersion`;
         ALTER TABLE `ProductionConditions` ADD  `productionConditionCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `productionConditionIDate`;
         ALTER TABLE `ProductionConditions` ADD  `productionConditionComment` TEXT AFTER `productionConditionCount`;
    
    
    
         #
         # This table was not shaped as a dictionary so needs to be re-created
         # Hopefully, was not filled prior (but will be this year)
         #
         DROP TABLE IF EXISTS TriggerWords; CREATE TABLE TriggerWords
         (
            triggerWordID           MEDIUMINT       NOT NULL        AUTO_INCREMENT,
            triggerWordName         VARCHAR(50)     NOT NULL,
            triggerWordVersion      CHAR(6)         NOT NULL DEFAULT "V0.0",
            triggerWordBits         CHAR(6)         NOT NULL,  
            triggerWordIDate        TIMESTAMP       NOT NULL,
            triggerWordCreator      CHAR(15)        DEFAULT 'unknown' NOT NULL,
            triggerWordCount        INT,
            triggerWordComment      TEXT,
            UNIQUE   TW_TriggerCharacteristic (triggerWordName, triggerWordVersion, triggerWordBits),
            PRIMARY KEY             (triggerWordID)
         ) TYPE=MyISAM;
  3. Deploy the new API CVS version 1.62 of FileCatalog.pm

  4. Run the following utility scripts

    util/path_convert.pl
    util/host_convert.pl

    Note that those scripts use a new method $fC->connect_as("Admin"); which assumes that the Master Catalog will be accessed using the XML connection description. Also, the line

    use lib "/WhereverYourModulAPIisInstalled"; should be replaced by the appropriate path for your site (or test area). Finally, the scripts use API CVS version 1.62, which supports the Xpath and Xnode transitional keywords allowing us to transfer the information from one field to one table.

  5. Check that Hosts table was filled properly and automatically with Creator/IDate
  6. Paranoia step : Re-run the scripts mentioned 2 steps ago

    At this stage and ideally, nothing should happen (as you have already modified the records).
    A few tips prior to doing that
    • % fC_cleanup.pl -modif node=localhost -cond node='' -doit
      would hopefully do nothing but if you have messed something up in the past, hostName would be NULL and the above would be necessary.
    • After a full update, the following queries should return NOTHING
      % get_file_list.pl -keys flid -cond rfpid=0 -all -alls -as Admin
      % get_file_list.pl -keys flid -cond rhid=0 -all -alls -as Admin

      Those are equivalent to the SQL statements
      >SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.filePathID = 0 LIMIT 0, 100
      >SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.hostID = 0 LIMIT 0, 100 
    If they do return anything, contact me for further investigation and database repairs. As a side note, the -as keyword was introduced recently, and you should update your get_file_list.pl script if it is not available.
  7. Make a backup copy of the database for security (optional but safer). The backup can be done either by a dump of mysql or, more trivially, by a cp -r of the database directory.
  8. Leave it running for a few days (should be fine) for confidence consolidation ;-)
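The sanity check from step 6 can also be scripted directly against the database. Below is a minimal sketch, assuming a DB-API style connection; an in-memory SQLite table stands in for the real MySQL FileLocations table, and the helper name count_unconverted is hypothetical. After a complete conversion both counts should be zero.

```python
import sqlite3

def count_unconverted(conn):
    """Count FileLocations rows whose filePathID or hostID is still 0."""
    cur = conn.cursor()
    counts = {}
    for column in ("filePathID", "hostID"):
        # Column names come from the fixed tuple above, never from user input.
        cur.execute(f"SELECT COUNT(*) FROM FileLocations WHERE {column} = 0")
        counts[column] = cur.fetchone()[0]
    return counts

# Toy table standing in for the real catalog (assumption: only the
# three columns needed for the check are reproduced here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FileLocations ("
    "fileLocationID INTEGER PRIMARY KEY, filePathID INTEGER, hostID INTEGER)"
)
conn.executemany(
    "INSERT INTO FileLocations (filePathID, hostID) VALUES (?, ?)",
    [(12, 3), (12, 4), (0, 3)],  # the last row was not converted
)

remaining = count_unconverted(conn)
print(remaining)
```

In this toy data one row still has filePathID = 0, which is the situation the two get_file_list.pl queries above are meant to detect.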

You are ready for phase II. Hang on tight now ...

Steps in Phase II

Those steps are now VERY intrusive and potentially destructive. Be careful from here on ...

  1. Stop all daemons and make sure that, during the rest of the operations, NO command attempts to manipulate the database. If you want to shield your users from the upgrade, stop all Master/slave relations.
  2. Connect to the master FileCatalog as administrator for that database and execute the following SQL commands
      > ALTER TABLE `FileLocations` ADD INDEX FL_HostIndex (hostID);
      > ALTER TABLE `FileLocations` DROP INDEX `FL_FileLocationUnique`, ADD UNIQUE (fileDataID, storageTypeID, filePathID, storageSiteID, hostID);
    
      # drop the columns not in use anymore / should also get rid of the associated
      # indexes.
      > ALTER TABLE `FileLocations` DROP COLUMN nodeName;
      > ALTER TABLE `FileLocations` DROP COLUMN filePath;
    
      # "rename" index / was created with a name difference to avoid clash for transition
      # now renamed for consistency
      > ALTER TABLE `FileLocations` DROP INDEX `filePathID`, ADD INDEX  FL_FilePathIndex (filePathID);
  3. OK, you should be done. Deploy CVS version 1.63, which corresponds to the FileCatalog API version V01.275, or above (by the way, get_file_list.pl -V gives the API version).


 

A few notes

  • The new API is XML connection aware via a non-mandatory module named XML::Simple . You should install that module but there are some limitations if you are using perl 5.6 i.e., you MUST use the schema with ONLY one choice per category (Admin, Master or User).
  • Your scripts will likely need to change if your Database Master and Slave are not on the same node (i.e. the administration account for the FileCatalog can be used only on the database Master and the regular user account on the Slave). There are a few forms of this such as the one below
    # Get connection fills the blanks while reading from XML
    # However, USER/PASSWORD presence are re-checked
    #$fC->debug_on();
    ($USER,$PASSWD,$PORT,$HOST,$DB) = $fC->get_connection("Admin");
    $port = $PORT if ( defined($PORT) );
    $host = $HOST if ( defined($HOST) );
    $db   = $DB   if ( defined($DB) );
    
    
    if ( defined($USER) ){   $user = $USER;}
    else {                   $user = "FC_admin";}
    
    if ( defined($PASSWD) ){ $passwd = $PASSWD;}
    else {                   print "Password for $user : ";
                             chomp($passwd = <STDIN>);}
    
    #
    # Now connect using a fully specified user/passwd/port/host/db
    #
    $fC->connect($user,$passwd,$port,$host,$db);

    or counting on the full definition in the XML file

    $fC    = FileCatalog->new();
    $fC->connect_as("Admin");
  • Note a small future convenience when XML is ON: connect_as() selects not only as whom you want to connect but also where. In fact, the proper syntax is intent=SITE::User (for example BNL::Admin is valid, as well as LBL::User). This is only partly supported however.
  • The new version of the API automatically adds information to the dictionary tables. In particular, the account under which a new dictionary value was inserted (Creator) and the insertion date (IDate) are filled automatically. A side effect is that the new API is NOT compatible with the previous database table layout (no backward support will be attempted).

Migration and notes from V01.275 to V01.280

This document is intended for FileCatalog managers only who have previously deployed an earlier version of API and older database table layout. It is NOT intended for users.

Reasoning for this upgrade and core of the upgrade

This upgrade is a minor one, adding support for two more detector sub-systems. The new API supports this modification. You need to alter the table DetectorConfigurations and add two columns. The API is always forward compatible in that regard, so it is completely safe to alter the table and deploy the API later.

ALTER TABLE `DetectorConfigurations` ADD dBSMD TINYINT;
ALTER TABLE `DetectorConfigurations` ADD dESMD TINYINT;
UPDATE `DetectorConfigurations` SET dBSMD=0;
UPDATE `DetectorConfigurations` SET dESMD=0;

And deploy the API V01.280 or later. You are done.

User manual

The command line interface to the FileCatalog

The FileCatalog database contains information about all files used in production for the STAR experiment. The database itself and the PERL module used to manipulate it are described later in this document. Below you'll find the description of the command line utility used to access data in this database.
The utility is called get_file_list.pl. If issued without any arguments it will print the following usage message:

% get_file_list.pl [-all] -keys keyword[,keyword,...]              \    
  [-cond keyword=value[,keyword=value,...]] [-start #] [-limit #] [-delim $St] \    
  [-onefile] [-o outputfile]

Command line options

The command line options are described below:

-all use all entries regardless of availability flag. Default is to show only available=1
-alls use all entries regardless of sanity flag, default is to show sanity=1 unless the sanity flag was used as a key
-onefile A special mode of operation; returns a list of files, but gives only one location (the one with highest persistence) for each file, even if the database has many.
-keys Specify what data you want to get from the database. A list of valid keywords, separated by commas, should follow this parameter. See examples for more clarification. See also the description of aggregate functions for some more sophisticated tricks.
-cond Specify the conditions limiting the returned dataset. A list of valid expressions (each consisting of a valid keyword, a valid operator and a value), separated by commas, should follow this parameter. Since some of the operators are special characters, the list of expressions should always be enclosed in single quotes. Again, see examples for more explanations.
-start # specify the record number # to start from - default - start with the first record (together with -limit can be used to get the data in chunks)
-limit # limit the number of records returned (default 100, a value of 0 indicates an unlimited number of records).
-rlimit # limit the number of unique LFN (attention, the number of lines may be more than the rlimit). Using rlimit will switch the limit logic off and you cannot use both at the same time.
-delim <string> specify the characters that will separate the fields in the output (default: "::")
-V print the module version and leave
-as <scope> or -as <site:scope> connects to the FileCatalog database as specified. Scopes are {Admin|User}. site should be specified for a multi-site deployment.
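When get_file_list.pl is called from another program, the quoting advice for -cond can be sidestepped by passing the arguments as a list rather than through a shell. The sketch below only assembles such an argument list from the options described above; the helper name build_query is hypothetical, and the keyword values are merely plausible examples taken from elsewhere in this page.

```python
def build_query(keys, conds=None, limit=None, delim=None, use_all=False):
    """Assemble an argument list for get_file_list.pl (nothing is executed)."""
    argv = ["get_file_list.pl"]
    if use_all:
        argv.append("-all")
    # Keywords and conditions are comma-separated, as in the usage message.
    argv += ["-keys", ",".join(keys)]
    if conds:
        argv += ["-cond", ",".join(conds)]
    if limit is not None:
        argv += ["-limit", str(limit)]
    if delim is not None:
        argv += ["-delim", delim]
    return argv

argv = build_query(
    ["path", "filename"],
    conds=["storage=NFS", "filetype=daq_reco_MuDst"],
    limit=10,
)
print(argv)
```

Such a list could then be handed to subprocess.run(argv) on a node where the utility is installed; since no shell is involved, operators such as < or > in a condition need no extra quoting.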


 

Supported comparison or selection operators

<= Not greater than  
< Less than  
>= Not less than  
> Greater than  
<> Not equal to  
!= Not equal to  
= equal to  
!~ Not containing (i.e. do not match) strings
~ Containing (i.e. approximately matching) strings
[] In range  
][ Outside the range  
% Modulo integer
%% Not Modulo integer

 

Logical operators

The following logical operators can also be used in a query. The usage scope in this case is in a -cond context as keyword=Value1 LogicalOperator Value2 {LogicalOperator Value3 ...}

|| Logical OR Strings or numbers
&& Logical AND Strings or numbers


Note that the use of the logical AND operator will return no selection in most cases (for example, a runnumber cannot be equal to Value1 and Value2 at the same time). It was added for later extensions of the database: selections based on meta data such as triggered events (many triggers in a file) would be a case where this operator is useful.

The aggregate functions

These are special aggregate functions. They can be used in conjunction with any keyword that describes some data. Note that most of them only make sense for numerical values. See examples for the description on how to use them.

sum   The sum of the values
avg   The average of the values
min   The minimum of the values
max   The maximum of the values
orda   Sort the output in ascending order by this keyword
ordd   Sort the output in descending order by this keyword
count   The count for a given selection
grp   Group the output - put all the records with the same value for a given keyword together. This is required in conjunction with any aggregate functions used in a multiple keyword syntax context.
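To make the role of grp concrete, the sketch below shows, on a few made-up records, what grouping by one keyword while summing another conceptually computes. It illustrates the semantics only and is not the module's implementation; the record values and the helper name group_sum are assumptions.

```python
from collections import defaultdict

def group_sum(records, group_key, value_key):
    """Sum value_key over all records sharing the same group_key value."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[group_key]] += rec[value_key]
    return dict(totals)

# Made-up records: one dictionary per file, sizes in arbitrary units.
records = [
    {"filetype": "daq_reco_MuDst", "size": 100},
    {"filetype": "daq_reco_MuDst", "size": 50},
    {"filetype": "daq_reco_event", "size": 300},
]

totals = group_sum(records, "filetype", "size")
print(totals)
```

Without the grouping step, a sum over size would collapse to a single number and could not be reported next to each file type, which is why grp is required in a multiple keyword context.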


 

Keyword list

Here are the keywords that can be used in this context. There is a color scheme to those keywords, as follows:
Keywords in blue are currently supported by the database schema but unused by the production scripts and therefore are not filled (or ill-filled).
Keywords in aqua are automatically updated (there is no need to reset)
Keywords in magenta are filled but update may be needed (do not strongly rely on their value)
 

keyword   Meaning   Notes
site   The site where the data is stored, e.g. BNL, LBL
sitecmt   The site comment string
siteloc   A full string describing the site location in the world
storage   The storage medium, e.g. HPSS, NFS, local disk. Note that the local disk storage does not allow for a unique file location; one must also select on node.
node   The name of the node where data is stored (necessary to locate local disk storage)
path   The path to a specific copy of the file
filename   The name of the data file
sname1   The (short) name of the data file with the extensions removed, e.g. "st_physics_12114010_raw_4040002"
sname2   The (short) name of the data file with only the file name prefix remaining, e.g. "st_physics". Useful, for example, to isolate only st_physics files and reject "st_physics_adc" files.
filetype   The type of the file, e.g. "daq_reco_dst", "MC_fzd", etc.
extension   The extension of the file, directly connected to type (each file type has an associated extension)
events   Number of events or entries in the file
size   The size of the data file
fileseq   The file sequence as determined during data taking by DAQ. Arbitrary for simulation and processed files.
stream   The file stream if applicable (default is 0)
md5sum   The file's md5 checksum   Early stage db fill did not update this field. It may return 0.
production   The production tag with which a given file was produced. Can also be "raw" or "simulation".
library   The library version this file was produced with
trgsetupname   The name of the online trigger setup   Used to encode the path in production
trgname   The name of one trigger in a collection of triggers associated to a runnumber
trgcount   The event count having the associated trgname for a given runnumber
trgword   The trigger word associated to one trigger in a collection   This is available for Year4 data and beyond for DAQ files
trgversion   The trigger word version associated to a trgname
trgdefinition   The trigger definition of one trigger in a collection
runtype   The type of the run, e.g. "physics", "laser", "pulser", "pedestal", "test" but also "simulation" for simulated datasets
configuration   The detector configuration name. A detector configuration is a combination of detectors that were present during data taking in a given run. Note that the combination configuration/geometry is unique (but not either of the two alone).
geometry   The geometry definition for a given simulation set
runnumber   The number of the run. Arbitrary for simulations.
runcomments   The comments for a given run
collision   The collision type, specified in the form <first particle><second particle><collision energy>, e.g. "AuAu200"
datetaken   The date the data was taken. Arbitrary for simulation.   Format was messed up at conversion old->new Catalog. Can be (and will be) recovered.
magscale   The name of the magnetic field scale, e.g. FullField
magvalue   The actual magnetic field value
filecomment   The comment to the file
owner   The owner of the file
protection   The protection or read/write permissions, given in a format similar to UNIX 'ls -l'   Subject to changes
available   Is the file available? (0 if one cannot get it from HPSS or the file disappeared from disk)
persistent   Is the file persistent?
createtime   The time a file was created. Format is YYYYmmddHHMMSS.   Only HPSS files have a createtime which is not subject to changes
inserttime   The time a file data was inserted into the database
simcomment   The comments for the simulation
generator   The event generator name
genversion   Event generator version
gencomment   Event generator comments
genparams   Event generator params
tpc   Was the TPC in the data stream when specific data was taken?
svt   Was the SVT in the data stream when specific data was taken?
tof   Was the TOF in the data stream when specific data was taken?
emc   Was the B-EMC in the data stream when specific data was taken?
eemc   Was the E-EMC in the data stream when specific data was taken?
fpd   Was the FPD in the data stream when specific data was taken?
ftpc   Was the FTPC in the data stream when specific data was taken?
pmd   Was the PMD in the data stream when specific data was taken?
rich   Was the RICH in the data stream when specific data was taken?
ssd   Was the SSD in the data stream when specific data was taken?
bbc   Was the BBC in the data stream when specific data was taken?
bsmd   Was the Barrel EMC SMD in the data stream when specific data was taken?
esmd   Was the End-Cap SMD in the data stream when specific data was taken?
zdc   Was the Zero-Degree Calorimeter in the data stream when specific data was taken?
tpx   Was the tpx (tpc-X) information in the data stream when data was taken?
fgt   Was the Forward Gem Tracker information saved in this data stream?


 

The following keywords are for either internal use or specific management purposes. They have no meaning to users (but are unique).

flid   Access the FileLocation ID of the FileLocation table
fdid   Access the FileData ID of the FileData table
rfdid   Access the FileData ID of the FileLocation table
pcid   Access the ProductionCondition ID of the ProductionConditions table
rpcid   Access the ProductionCondition ID of the FileData table
rpid   Access the runParam ID of the runParams table
rrpid   Access the runParam ID of the FileData table
ftid   Access the FileType ID of the FileTypes table
rftid   Access the FileType ID of the FileData table
stid   Access the storageType ID of the StorageTypes table
rtid   Access the storageType ID of the FileLocations table
ssid   Access the storageSite ID of the StorageSites table
rssid   Access the storageSite ID of the FileLocations table
tcfdid   Access the FileData ID of the TriggerCompositions table
tctwid   Access the TriggerWords ID of the TriggerCompositions table
twid   Access the TriggerWords ID of the TriggerWords table
dcid   Access the detectorConfiguration ID of the DetectorConfigurations table
rdcid   Access the detectorConfiguration ID of the RunParams table

lgnm   An aggregate keyword returning an equivalence to the logical name
lgpth   An aggregate keyword returning a logical path (a string which uniquely characterizes the file's location)
fulld   An aggregate keyword returning a string completely defining all meta-data for real data
fulls   An aggregate keyword returning a string completely defining all meta-data for simulation data

 

Here are the keywords not connected to a specific field in the database. They change the behaviour of the module itself.

keyword   Meaning   Notes
simulation   Is the data a simulation?
nounique   Should the module return all fields, instead of only unique selected fields?   In script mode, this keyword is set to 0 (i.e. unique fields), which may slow down your scripting tremendously. In the user interface get_file_list.pl however, this is set by default to 1 (does not ensure unique fields).
noround   Turns off rounding of magfield and collision energy.
startrecord   The PERL module will skip the first startrecord records and start returning data beginning from the next one.
limit   The PERL module will return at most limit records.
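Together, startrecord and limit (or -start and -limit on the command line) allow a large selection to be read in chunks. The following sketch computes the successive (startrecord, limit) pairs for a selection of known size; the helper name page_windows is hypothetical, and it assumes the total number of records is known beforehand (for example from a count query).

```python
def page_windows(total_records, page_size):
    """Yield (startrecord, limit) pairs covering total_records in chunks."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, total_records, page_size):
        # The last chunk may be shorter than page_size.
        yield start, min(page_size, total_records - start)

windows = list(page_windows(250, 100))
print(windows)
```

Each pair maps onto one query: skip the first startrecord records, then return at most limit of them.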

 

HPSS services

HPSS Performance study

Introduction

HPSS is software that manages petabytes of data on disk and robotic tape libraries. In this section we discuss our observations on the efficiency of accessing files in STAR, based on a snapshot of the 2006 situation. IO optimization clearly has several components, among which:
  • Access pattern optimization (request ordering, ...)
  • Optimization based on tape drive and technology capabilities
  • Miscellaneous technology considerations
    (cards, interface, firmware and driver, RAID, disks, ...)
  • HPSS disk cache optimizations
  • COS and/or PVR optimization
However, several trends have already been the object of past research; we will point to some of those and compare them to our situation rather than debate the obvious. We will try to keep a focus on measurements in our environment.

Tape drive and technology capabilities

A starting point would be to discuss the capabilities of the technologies involved, their maximum performance and their limitations. In STAR, two technologies remain as of 2006/10:
  • the 9940B drives
  • the LTO-3 drives

Access pattern optimization (request ordering, ...)

A first simple and immediate consideration is to minimize tape mount and dismount operations, causing latencies and therefore performance drops. Since we use the DataCarousel for most restore operations, let's summarize its features.

The DataCarousel

The DataCarousel (DC) is an HPSS front end whose main purpose is to coordinate requests from many uncorrelated clients. Its main assumption is that all requests are asynchronous: you make a request from one client and it is satisfied “later” (as soon as possible). In other words, the DC aggregates all requests from all clients (many users could be considered as separate clients) and re-orders them according to policies, possibly aggregating multiple requests for the same source into one request to the mass storage. The DC system itself is composed of a light client program (script), a plug-and-play, policy-based server component (script) and a permanent process (compiled code) interfacing with the mass storage using HPSS API calls (this component is known as the “Oakridge Batch”, although its current code content has little to do with the original idea from the Oak Ridge National Laboratory). Client and server interact via a database component which isolates them completely from each other (but they share the same API, a perl module).

Policies may throttle the amount of data by group (quota, bandwidth percentage per user, etc., i.e. request queue fairshare) but also perform tape access optimization such as grouping requests by tape ID (for equivalent share, all requests from the same tape are grouped together regardless of the time at which the request was made or its position in the request queue). The policy could be anything one can come up with based on either historical usage or the current pending requests and their characteristics (this could include file type, user, class of service, file size, ...). The DC then submits bundles of requests to the daemon component; each bundle contains N files and is known as a “job”. The DC submits K of those jobs before stopping and observing the mass storage behavior: if the jobs go through, more are submitted; otherwise, the server either stops or proceeds with a recovery procedure and consistency checks (as it assumes that no reaction and no unit of work being performed is a sign of MSS failure). In other words, the DC is also error resilient and recovers from intrinsic HPSS failures (which are monitored). Whenever files are moved from tape to cache in the MSS, a call back to the DC server is made and a captive account connection is initiated to pull the file out of the mass storage cache to more permanent storage.

Optimizations

While the policy is clearly a source of optimization (as far as the user is concerned), from a DataCarousel “post policy” perspective, at least N*K files are being requested at every point in time. In reality, more jobs are submitted, and the consumption of this “overflow” of jobs is used to monitor whether the MSS is alive. The N*K files represent a total number of files which should match the number of threads allowed by the daemon. The current settings are K=50 and N=15, with an overflow allowed up to 25. The daemon itself can treat requests simultaneously according to a “depth”. Those calls to HPSS are however only advisory. The depth is set to 30 for the DST COS and 20 for the Raw COS. The deeper the request queue, the more files are requested simultaneously, but this means that the daemon also has to start more threads, as previously noted. Those parameters have been shown to influence the performance to some extent (within 10%), with however a large impact on response time: the larger the request stack, the “less instantaneous” the response from a user's perspective (since the request queue is longer).
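As an illustration of the N and K bookkeeping, the sketch below splits a list of pending files into jobs of N files and counts how many files the first K jobs keep in flight. It is a toy model, not DataCarousel code; the file names and the helper name make_jobs are made up.

```python
def make_jobs(files, files_per_job):
    """Split a flat list of files into consecutive jobs of files_per_job files."""
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]

N = 15  # files per job (current setting quoted above)
K = 50  # jobs submitted before observing the mass storage

# 800 made-up pending requests.
files = [f"file_{i:04d}" for i in range(800)]

jobs = make_jobs(files, N)
first_submission = jobs[:K]
in_flight = sum(len(job) for job in first_submission)
print(len(jobs), in_flight)
```

With these settings the first submission covers N*K = 750 files; the remaining jobs wait until the daemon is seen to make progress.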

The daemon has the ability to organize X requests into a list sorted by tape ID and number of requests per tape. A few strategies are available to alter the performance. We chose to enable “start with the tape with the largest number of files requested”. In addition, since our queue depth is rather small compared to the ideal number of files (K) per job, we order the files requested by the user by tape ID. Both optimizations are in place and lead to a 20% improvement under realistic usage (bulk restore, Xrootd, other user activities).
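The “largest tape first” strategy can be sketched as follows: requests are grouped by tape ID and the groups are served in decreasing order of size, so that each mount restores as many files as possible. This is an illustrative toy and not the daemon's code; the tape IDs, file names and the helper name order_by_tape are assumptions.

```python
from collections import defaultdict

def order_by_tape(requests):
    """Group (filename, tape_id) requests by tape, largest group first."""
    by_tape = defaultdict(list)
    for filename, tape_id in requests:
        by_tape[tape_id].append(filename)
    # Sort by descending group size; ties are broken by tape ID so the
    # resulting order is deterministic.
    return sorted(by_tape.items(), key=lambda item: (-len(item[1]), item[0]))

requests = [
    ("a.root", "T002"), ("b.root", "T001"), ("c.root", "T002"),
    ("d.root", "T003"), ("e.root", "T002"), ("f.root", "T001"),
]

schedule = order_by_tape(requests)
for tape_id, filenames in schedule:
    print(tape_id, filenames)
```

Here tape T002 is mounted first because three of the six requested files live on it, and each tape is mounted only once.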

Remaining optimizations

  • Optimization based on tapeID would need to be better quantified (graph, average restore rate) for several classes of files and usage. TBD.

  • The tape ID program is a first implementation returning partial information. In particular, MSS failures are not currently handled, leading to the tape ID being set to -1 (since there is no way to recognize whether it is an error, a file missing in HPSS or even a file known to the MSS MetaData server but located on a bad tape). Work in progress.

  • The queue depth parameters should be studied and adjusted according to the K and N values. However, this would need to respect the machine / hardware capabilities. The beefier the machine would be, the better but this is likely a fine tuning. This needs to be done with great care as the hardware is also shared by multiple experiments. Ideally the compiled daemon should auto-adjust to the DC API settings (and respect command line parameters for queue depth). TBD.

  • Currently, the daemon threads handling the HPSS API calls and those handling the call backs share the same pool. This diminishes the number of threads available to communicate with the Mass Storage and therefore causes performance fluctuations (call back threads could get “stuck” or come in “waves” - we observed cosine behavior perhaps related to this issue). TBD.


Optimizations based on drive and technology capabilities

File size effects on IO performance

In this paper (CERN/IT 2005), the author measured the IO performance as a function of file size and number of files requested per tape. The figure of relevance is added here for illustration.
HPSS IO efficiency per file size and per
This graph has been extracted for an optimal 30 MB/s capable drive (9940B like). Both file size and number of files per cartridge have been evaluated. The conclusions are immediate and confirm the advertised behavior observed by all HPSS deployments (see references below). Small file size is detrimental to HPSS IO performance, and the critical size depends strongly on the tape technology.

In STAR, we use the 9940B (read only as of 2006) and LTO-3 drives (read and write; all new files go to LTO-3). The findings would not be altered, but we have little margin of flexibility as far as the "old" tape drives are concerned.

Below, we show the average file size per file type in STAR as a 2006 snapshot.

Average (bytes) Average (MB) File Type
943240627 899 MC_fzd
666227650 635 MC_reco_geant
561162588 535 emb_reco_event
487783881 465 online_daq
334945320 319 daq_reco_laser
326388157 311 MC_reco_dst
310350118 295 emb_reco_dst
298583617 284 daq_reco_event
246230932 234 daq_reco_dst
241519002 230 MC_reco_event
162678332 155 MC_reco_root_save
93111610 88 daq_reco_MuDst
52090140 49 MC_reco_MuDst
17495114 16 MC_reco_minimc
14982825 14 daq_reco_emcEvent
14812257 14 emb_reco_geant
12115661 11 scaler
884333 0 daq_reco_hist
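The "Average (MB)" column above is consistent with dividing the byte count by 2^20 and dropping the fractional part. The short check below reproduces a few rows of the table under that assumption; the helper name to_mb is hypothetical.

```python
def to_mb(size_bytes):
    """Convert bytes to whole megabytes (1 MB = 1024 * 1024 bytes, truncated)."""
    return size_bytes // (1024 * 1024)

# A few rows copied from the table above: file type -> average size in bytes.
average_bytes = {
    "MC_fzd": 943240627,
    "daq_reco_event": 298583617,
    "daq_reco_MuDst": 93111610,
    "daq_reco_hist": 884333,
}

average_mb = {name: to_mb(size) for name, size in average_bytes.items()}
print(average_mb)
```

Because the values are truncated rather than rounded, a file type such as daq_reco_hist, with an average below 1 MB, shows up as 0.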

Note that the average size for an event file is 284 MB while for a MicroDST the average size is 84 MB, so a ratio of about 3. The number of files per cartridge is at best 1.2, with peaks at 10 or more. This is mostly due to a request profile dominated by the Xrootd "random" pattern and a few users' requests. According to the previous study, the IO efficiency should be around 8% for 84 MB files, reaching perhaps 20-25% for the 284 MB average file class. Considering we have not studied the drive access pattern beyond a simple scaling (i.e. we ignore to first order the fact that we have many drives at our disposal), we should see the efficiency change from 8% to 20%, so an improvement of x2.5-3, should the file size argument stand.

In order to observe the IO performance when small or big files are being requested, we requested event files from the DataCarousel and produced the two graphs below for the dates between 2006/10/26 and 2006/10/29. The first graph represents the IO "before" the massive submission of requests dominated by event files, the second graph the IO "after" it. The graphs are preliminary (work in progress).
DC IOPerf 20061027 (time before)
DC IOPerf 20061029 (time after)
We observe an average transfer rate saturating at best at 15 MB/sec for MuDST dominated files and an average close to 50 MB/sec for event files. The ratio is ~ 3, which remains consistent with the results shown in "HPSS IO efficiency per file size and per" and with our initial rough estimate.

Note: It is interesting to note that a significant mix of very small files (below 12 MB average) would bring the performance to a sub 1% efficiency. The net result for 9 drives (as we have in STAR) would be an aggregate performance no better than 3 MB/sec for a 9940B x 9 drive configuration. We observe periods with such poor performance. The second observation is that even with MuDST dominated files only, we would not be able to exceed 70% of the performance of one drive, so at best 21 MB/sec (this corresponds to our current "best hour"). The results are coherent to first order.
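The two estimates in the note follow from a simple linear scaling: aggregate rate = number of drives x nominal drive rate x IO efficiency. The sketch below reproduces the arithmetic, assuming the nominal 30 MB/s of a 9940B-like drive and the 1% and 8% efficiencies quoted above; the helper name aggregate_rate is hypothetical, and real drives sharing a load need not scale this simply.

```python
def aggregate_rate(n_drives, drive_rate_mb_s, efficiency):
    """Aggregate throughput in MB/s under a simple linear scaling assumption."""
    return n_drives * drive_rate_mb_s * efficiency

N_DRIVES = 9        # drives available to STAR, as stated in the note
DRIVE_RATE = 30.0   # MB/s, nominal rate of a 9940B-like drive

small_files = aggregate_rate(N_DRIVES, DRIVE_RATE, 0.01)  # very small files
mudst_files = aggregate_rate(N_DRIVES, DRIVE_RATE, 0.08)  # MuDST dominated

print(round(small_files, 1), round(mudst_files, 1))
print(round(mudst_files / DRIVE_RATE, 2))  # fraction of a single drive
```

The first value stays below the 3 MB/sec bound quoted for very small files, and the second lands close to the 21 MB/sec "best hour", i.e. roughly 70% of a single drive.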

MSS failures and cascading effects

A poor MSS IO efficiency is one thing, but stability is another. Under poor performance situations, failures are critical to minimize. We have already stated that the DC is error resilient. However, during failure periods, the request queue accumulates requests and whenever the mass storage comes back, all requests are suddenly released, opening the flood gate of IO ... which is less of a flood than a drip. As a consequence, users' requests or bulk transfers would not suffer much, but modes requiring an immediate response (such as Xrootd) would be largely impacted. In fact, should the downtime be long enough, it is likely that all requests made when the errors started will fail, and that the subsequently accumulated requests will also cause further delays and spurious Xrootd failures: Xrootd will time out if the DC has not satisfied its request within 3600 seconds, i.e. 1 hour per file. The following graph shows an error sequence:
HPSS Error types 2006 09 W36
While there are errors at the early stage of this graph, reminiscent of previous failures, problems during the time period of interest start around 9 AM with a MetaData lookup failure (cannot get lock after 5 retries). Subsequently, files immediately start failing to be restored for more than an hour, and this continues up to around 10:30 to 11:00, at which point more errors occur from the Mass Storage system (a mix of MetaData lookup failures and massive authentication failures). The authentication failures are related to the DCE component failure, causing periodic problems. In our case, we immediately see the light blue band continuing up to 15:00 (3 PM), followed yet again by a massive MetaData failure. All of those cascading failures would, for a period no less than 6 hours long, affect users requesting files from Xrootd which would need to be restored from MSS.
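
To get a feel for why a backlog built up during an outage translates into Xrootd timeouts, here is a minimal sketch. It assumes a simple first-in first-out queue with constant arrival and restore rates, which is a simplification and not the actual DC scheduling; the rates used are hypothetical, and only the 3600 second timeout comes from the text.

```python
XROOTD_TIMEOUT_S = 3600  # per-file timeout quoted in the text


def wait_after_outage(outage_s, arrival_per_s, service_per_s):
    """Wait (in seconds) for a request arriving just as the MSS comes back.

    Requests pile up at arrival_per_s for outage_s seconds, then the queue
    is assumed to drain first-in first-out at service_per_s files per second.
    """
    backlog = outage_s * arrival_per_s
    return backlog / service_per_s


# Hypothetical numbers: a 3-hour outage, one request every 10 seconds,
# and one file restored every 5 seconds once the MSS is back.
wait = wait_after_outage(outage_s=10800, arrival_per_s=0.1, service_per_s=0.2)
print(f"wait = {wait:.0f} s, exceeds Xrootd timeout: {wait > XROOTD_TIMEOUT_S}")
```

With these made-up rates, a request arriving right after recovery would still wait longer than the Xrootd timeout, even though the MSS itself is healthy again.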

The relative proportion of each failure type for that day is displayed below.
HPSS Error relative proportion, 2006-09
Only one error in this graph (a DataCarousel connection failure) can be fixed from a STAR standpoint, all other occurrences being facility issues to resolve.

Miscellaneous technology considerations

All considerations in this section are beyond our control and are a matter of facility work and optimization.

HPSS disk cache optimizations

This section seems rather academic considering the improvement perspectives of the previous sections.

COS and PVR optimizations

In this section, we will discuss optimizations based on file size, perhaps isolated by PVR or COS. This will be possible in future runs but would lead to a massive repackaging of files and data from past years.


Appendix

Further reading:



HSI


This is a highlight of the HSI features. Please visit the HSI Home Page for more information.

HSI is a friendly interface for users of the High Performance Storage System (HPSS). It is intended to provide a familiar Unix-style environment for working within the HPSS environment, while automatically taking advantage of the power of HPSS (e.g. for high speed parallel file transfers) without requiring any special user interaction, where possible.

HSI requires one of two authentication methods (see the HSI User Guide for more information):
  • Kerberos (the preferred method)
  • DCE keytab (using a keytab file generated for you by the HPSS system administrators)
HSI's features include:
  • Familiar Unix-style command interface, with commands such as "LS", "CD", etc.
  • Interactive, batch, or "one-liner" execution modes
  • Ability to interactively pipe data into or out of HPSS, using filters such as "TAR"
  • Recursive option is available for most commands; including the ability to copy an entire directory tree to or from HPSS with a single simple command
  • Conditional put and get operations, including ability to update based on file timestamps
  • Automatically uses HPSS parallel I/O features for file transfer operations
  • Multi-threaded I/O within a single process space
  • Command aliases and abbreviations
  • 10 working directories
  • Ability to read command input from a file, and write log or command output to a file.
  • Non-DCE version runs on most major Unix-based platforms
  • Non-DCE version provides the ability to connect to multiple HPSS systems and perform 3rd-party copies between the systems, using a "virtual drive" path notation.

HTAR

To use htar within the HPSS environment, users are required to have valid Kerberos credentials.

The htar man page follows.

   
                     NAME
                          htar - HPSS tar utility
     
   
                     PURPOSE
                          Manipulates HPSS-resident tar-format archives.


                     SYNOPSIS
                          htar  -{c|t|x|X}  -f Archive [-?]  [-B] [-E]  [-L  inputlist] [-h]  [-m] [-o]
                                 [-d  debuglevel] [-p] [-v]  [-V] [-w]
                                 [-I  {IndexFile | .suffix}] [-Y  [Archive COS ID][:Index File COS ID]]
                                 [-S  Bufsize] [-T  Max Threads] [Filespec | Directory ...]

                     DESCRIPTION
                          htar  is a utility which manipulates HPSS-resident archives
                          by writing files to,  or retrieving files from the High
                          Performance Storage System (HPSS).  Files written to HPSS
                          are in the POSIX 1003.1 "tar" format, and may be retrieved
                          from HPSS, or read by native tar programs.

                          For those unfamiliar with HPSS, an introduction can be found
                          on the web at
                                 http://www.sdsc.edu/hpss

                          The local files used by the htar command are represented by
                          the Filespec parameter. If the Filespec parameter refers to
                          a directory, then that directory, and, recursively, all
                          files and directories within it, are referenced as well.

                          Unlike the standard Unix "tar" command, there is no default
                          archive device; the "-f Archive" flag is required.

                     Archive and Member files
                          Throughout the htar documentation, the term "archive file"
                          is used to refer to the tar-format  file, which is named by
                          the "-f filename" command line option. The term "member
                          file" is used to refer to individual files contained within
                          the archive file.

                     WHY USE HTAR
                          htar has been optimized for creation of archive files
                          directly in HPSS, without having to go through the
                          intermediate step of first creating the archive file on
                          local disk storage, and then copying the archive file to
                          HPSS via some other process such as ftp or hsi. The program
                          uses multiple threads and a sophisticated buffering scheme
                          in order to package member files into in-memory buffers,
                          while making use of the high-speed network striping
                          capabilities of HPSS.

                          In most cases, it will be significantly faster to use htar
                          to create a tar file in HPSS than to either create a local
                          tar file and then copy it to HPSS, or to use tar piped into
                          ftp (or hsi) to create the tar file directly in HPSS.

                          In addition, htar creates a separate index file, (see next
                          section) which contains the names and locations of all of
                          the  member files in the archive (tar) file.  Individual
                          files and directories in the archive can be randomly
                          retrieved without having to read through the archive file.
                          Because the index file is usually smaller than the archive
                          file, it is possible that the index file may reside in HPSS
                          disk cache  even though the archive file has been moved
                          offline to tape; since htar uses the index file for listing
                          operations, it may be possible to list the contents of the
                          archive file without having to incur the time delays of
                          reading the archive file back onto disk cache from tape.

                          It is also possible to create an index file for a tar file
                          that was not originally created by htar.

                     HTAR Index File
                          As part of the process of creating an archive file on HPSS,
                          htar also creates an index file, which is a directory of the
                          files contained in the archive. The Index File includes the
                          position of member files within the archive, so that files
                          and/or directories can be randomly retrieved from the
                          archive without having to read through it sequentially.  The
                          index file is usually significantly smaller in size than the
                          archive file, and may often reside in HPSS disk cache even
                          though the archive file resides on tape. All htar operations
                          make use of an index file.

                          It is also possible to create an index file for an archive
                          file that was not created by htar, by using the "Build
                          Index" [-X] function (see below).

                          By default, the index filename is created by adding ".idx"
                          as a suffix to the Archive name specified by the -f
                          parameter.  A different suffix or index filename may be
                          specified by the "-I " option, as described below.

                          By default, the Index File is assumed to reside in the same
                          directory as the Archive File.  This can be changed by
                          specifying a relative or absolute pathname via the -I
                          option. The Index file's relative pathname is relative to
                          the Archive File directory unless an absolute pathname is
                          specified.

                     HTAR Consistency File
                          HTAR writes an extra file as the last member file of each
                          Archive, with a name similar to:

                                  /tmp/HTAR_CF_CHK_64474_982644481

                          This file is used to verify the consistency of the Archive
                          File and the Index File.  Unless the file is explicitly
                          specified, HTAR does not extract this file from the Archive
                          when the -x action is selected.  The file is listed,
                          however, when the -t action is selected.

                     Tar File Restrictions
                          When specifying path names that are greater than 100
                          characters for a file (POSIX 1003.1 USTAR) format, remember
                          that the path name is composed of a prefix buffer, a /
                          (slash), and a name buffer.

                          The prefix buffer can be a maximum of 155 bytes and the name
                          buffer can hold a maximum of 100 bytes. Since some
                          implementations of TAR require the prefix and name buffers
                          to terminate with a null ('\0') character, htar enforces the
                          restriction that the effective prefix buffer length is 154
                          characters (+ trailing zero byte), and the name buffer
                          length is 99 bytes (+ trailing zero byte). If the path name
                          cannot be split into these two parts by a slash, it cannot
                          be archived. This limitation is due to the structure of the
                          tar archive headers, and must be maintained for compliance
                          with standards and backwards compatibility. In addition, the
                          length of a destination for a hard or symbolic link ( the
                          'link name') cannot exceed 100 bytes (99 characters + zero-
                          byte terminator).

                     HPSS Default Directories
                          The default directory for the Archive file is the HPSS home
                          directory for the DCE user.  An absolute or relative HPSS
                          path can optionally be specified for either the Archive file
                          or the Index file. By default, the Index file is created in
                          the same HPSS directory as the Archive file.

                     Use of Absolute Pathnames
                          Although htar does not restrict the use of absolute
                          pathnames (pathnames that begin with a leading "/") when the
                          archive is created, it will remove the leading / when files
                          are extracted from the archive.  All extracted files use
                          pathnames that are relative to the current working
                          directory.

                     HTAR USAGE
                          Two groups of flags exist for the htar command; "action"
                          flags and "optional" flags. Action flags specify the
                          operation to be performed by the htar command, and are
                          specified by one of the following:

                          -c, -t, -x, -X

                          One action flag must be selected in order for the htar
                          command to perform any useful function.

                     File specification (Filespec)
                          A file specification has one of the following forms:

                                  WildcardPath
                                     or
                                  Pathname
                                     or
                                  Filename

                          WildcardPath is a path specification that includes standard
                          filename pattern-matching characters, as specified for the
                          shell that is being used to invoke htar.  The pattern-
                          matching characters are expanded by the shell and passed to
                          htar as command line arguments.

                     Action Flags
                          Action flags defined for htar are as follows:

                          -c   Creates a new HPSS-resident archive, and writes the
                               local files specified by one or more File parameters
                               into the archive. Warning: any pre-existing archive file
                               will be overwritten without prompting. This behavior
                               mimics that of the AIX tar utility.

                          -t   Lists the files in the order in which they appear in
                               the HPSS-resident archive.  Listable output is
                               written to standard output; all other output is written
                               to standard error.

                          -x   Extracts the files specified by one or more File
                               parameters from the HPSS-resident archive. If the File
                               parameter refers to a directory, the htar command
                               recursively extracts that directory and all of its
                               subdirectories from the archive.

                               If the File parameter is not specified, htar extracts
                               all of the files from the archive. If an archive
                               contains  multiple copies of the same file, the last
                               copy extracted overwrites  all previously extracted
                               copies. If the file being extracted does not already
                               exist on the system, it is created. If you have the
                               proper permissions, then htar command restores all
                               files and directories with the same owner and group IDs
                               as they have on the HPSS tar file. If you  do not have
                               the proper permissions, then files and directories are
                               restored with your owner and group IDs.

                          -X   Builds a new index file by reading the entire tar file.
                               This operation is used either to reconstruct an index
                               for tar files whose Index File is unavailable (e.g.,
                               accidentally deleted), or for tar files that were not
                               originally created by htar.

                     Options
                          -?   Displays htar's verbose help

                          -B   Displays block numbers as part of the listing (-t
                               option). This is normally used only for debugging.

                          -d debuglevel
                               Sets debug level (0 - N) for htar. 0 disables debug, 1
                               - n enable progressively higher levels of debug output.
                               5 is the highest level; anything > 5 is silently mapped
                               to 5.  0 is the default debug level.

                          -E   If present, specifies that a local file should be used
                               for the file specified by the "-f Archive" option.  If
                               not specified, then the archive file will reside in
                               HPSS.

                          -f Archive
                               Uses Archive as the name of archive to be read or
                               written. Note: This is a required parameter for htar,
                               unlike the standard tar utility, which uses a built-in
                               default name.

                               If the Archive variable specified is - (minus sign),
                               the tar command writes to standard output or reads from
                               standard input. If you write to standard output, the -I
                               option is mandatory, in order to specify an Index File,
                               which is copied to HPSS if the Archive file is
                               successfully written to standard output. [Note: this
                               behavior is deferred - reading from or writing to pipes
                               is not supported in the initial version of htar].

                          -h   Forces the htar command to follow symbolic links as if
                               they were normal files or directories. Normally, the
                               tar command does not follow symbolic links.

                          -I index_name
                               Specifies the index file name or suffix.  If the first
                               character of the index_name is a period, then
                               index_name is appended to the Archive name, e.g. "-f
                               the_htar -I .xndx" would create an index file called
                               "the_htar.xndx".  If the first character is not a
                               period, then index_name is treated as a relative
                               pathname for the index file (relative to the Archive
                               file directory) if the pathname does not start with
                               "/", or an absolute pathname otherwise.

                               The default directory for the Index file is the same as
                               for the Archive file.  If a relative Index file
                               pathname is specified, then it is appended to the
                               directory path for the Archive file.  For example, if
                               the Archive file resides in HPSS in the directory
                               "projects/prj/files.tar", then an Index file
                               specification of "-I projects/prj/files.old.idx" would
                               fail, because htar would look for the file in the
                               directory "projects/prj/projects/prj".  The correct
                               specification in this case is "-I files.old.idx".

                          -L InputList
                               Writes the files and directories listed in the
                               "InputList" file to the archive. Directories named in
                               the InputList file are not treated recursively. For
                               directory names contained in the InputList file, the
                               tar command writes only the directory entry to the
                               archive, not the files and subdirectories rooted in the
                               directory.  Note that "home directory" notation ("~")
                               is not expanded for pathnames contained in the
                               InputList file, nor are wildcard characters, such as
                               "*" and "?".

                          -m   Uses the time of extraction as the modification time.
                               The default is to preserve the modification time of the
                               files. Note that the modification time of directories
                               is not guaranteed to be preserved, since the operating
                               system may change the timestamp as the directory
                               contents are changed by extracting other files and/or
                               directories.  htar will explicitly set the timestamp on
                               directories that it extracts from the Archive, but not
                               on intermediate directories that are created during the
                               process of extracting files.

                          -o   Provides backwards compatibility with older versions
                               (non-AIX) of the tar command. When this flag is used
                               for reading, it causes the extracted file to take on
                               the User and Group ID (UID and GID) of the user running
                               the program, rather than those on the archive.  This is
                               the default behavior for the ordinary user. If htar is
                               being run as root, use of this option causes files to
                               be owned by root rather than the original user.

                          -p   Says to restore fields to their original modes,
                               ignoring the present umask. The setuid, setgid, and
                               sticky bit permissions are also restored to the user
                               with root user authority.

                          -S bufsize
                               Specifies the buffer size to use when reading or
                               writing the HPSS tar file.  The buffer size can be
                               specified as a value, or as kilobytes by appending any
                               of  "k","K","kb", or "KB" to the value.  It can also be
                               specified as megabytes by appending any of  "m" or "M"
                               or "mb" or "MB" to the value, for example, 23mb.

                          -T max_threads
                               Specifies the maximum number of threads to use when
                               copying local member files to the Archive file.  The
                               default is defined when htar is built; the release
                               value is 20.  The maximum number of threads actually
                               used is dependent upon the local file sizes, and the
                               size of the I/O buffers.  A good approximation is
                               usually

                                  buffer size/average file size

                               If the -v or -V option is specified, then the maximum
                               number of local file threads  used while writing the
                               Archive file to HPSS is displayed when the transfer is
                               complete.

                          -V   "Slightly verbose" mode. If selected, file transfer
                               progress will be displayed in interactive mode. This
                               option should normally not be selected if verbose (-v)
                               mode is enabled, as the outputs for the two different
                               options are generated by separate threads, and may be
                               intermixed on the output.

                          -v   "Verbose" mode. For each file processed, displays a
                               one-character operation flag, and lists the name of
                               each file. The flag values displayed are:
                                   "a"  - file was added to the archive
                                   "x"  - file was extracted from the archive
                                   "i"  - index file entry was created (Build Index
                               operation)

                          -w   Displays the action to be taken, followed by the file
                               name, and then waits for user confirmation. If the
                               response is affirmative, the action is performed. If
                               the response is not affirmative, the file is ignored.

                          -Y auto | [Archive CosID][:IndexCosID]
                               Specifies the HPSS Class of Service ID to use when
                               creating a new Archive and/or Index file. If the
                               keyword auto is specified, then the HPSS hints
                               mechanism is used to select the archive COS, based upon
                               the file size.  If -Y cosID  is specified, then cosID
                               is the numeric COS ID to be used for the Archive File.

                               If -Y :IndexCosID is specified, then IndexCosID is the
                               numeric COS ID to be  used for the Index File.  If both
                               COS IDs are specified, the entire parameter must be
                               specified as a single string with no embedded spaces,
                               e.g. "-Y 40:30".

                     HTAR Memory Restrictions
                          When writing to an HPSS archive, the htar command uses a
                          temporary file (normally in /tmp) and maintains in memory a
                          table of files; you receive an error message if htar cannot
                          create the temporary file, or if there is not enough memory
                          available to hold the internal tables.

                     HTAR Environment
                          HTAR should be compiled and run within a non-DCE HPSS environment.

                     Miscellaneous Notes:
                          1. The maximum size of a single Member file within the
                          Archive is approximately 8 GB, due to restrictions in the
                          format of the tar header.  HTAR does not impose any
                          restriction on the total size of the Archive File when it is
                          written to HPSS; however, space quotas or other system
                          restrictions may limit the size of the Archive File when it
                          is written to a local file (-E option).

                          2.  HTAR will optionally write to a local file; however, it
                          will not write to any file type except "regular files".  In
                          particular, it is not suitable for writing to magnetic tape.
                          To write to a magnetic tape device, use the "tar" or "cpio"
                          utility.

                     Exit Status
                          This command returns the following exit values:

                          0       Successful completion.

                          >0      An error occurred.

                     Examples
                          1.   To write the file1 and file2 files to a new archive
                               called "files.tar" in the current HPSS home directory,
                               enter:

                                      htar -cf files.tar file1 file2

                          2.   To extract all files from the project1/src directory in
                               the Archive file called proj1.tar, and use the time of
                               extraction as the modification time,  enter:

                                     htar -xm -f proj1.tar project1/src

                          3.   To display the names of the files in the out.tar
                               archive file within the HPSS home directory, enter:

                                     htar -tvf out.tar

                     Related Information
                          For file archivers: the cat command, dd command, pax
                          command.  For HPSS file transfer programs: pftp, nft, hsi

                          File Systems Overview for System Management in AIX Version 4
                          System Management Guide: Operating System and Devices
                          explains file system types, management, structure, and
                          maintenance.

                          Directory Overview in AIX Version 4 Files Reference explains
                          working with directories and path names.

                          Files Overview in AIX Version 4 System User's Guide:
                          Operating System and Devices provides information on working
                          with files.

                          HPSS web site at http://www.sdsc.edu/hpss

                     Bugs and Limitations:
                          - There is no way to specify relative Index file pathnames
                          that are not rooted in the Archive file directory without
                          specifying an absolute path.

                          - The initial implementation of HTAR does not provide the
                          ability to append, update or remove files.  These features,
                          and others, are planned enhancements for future versions.
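
The path length rule from the "Tar File Restrictions" section of the man page can be checked before running htar. The sketch below is an illustration written for this page and is not part of htar: the function name is hypothetical, and the choice of the leftmost workable slash is arbitrary (htar may pick a different split point). It only applies the limits stated above, i.e. at most 154 bytes of prefix and 99 bytes of name, separated at a slash.

```python
MAX_PREFIX = 154  # effective prefix length enforced by htar (155 minus the zero byte)
MAX_NAME = 99     # effective name length enforced by htar (100 minus the zero byte)


def split_ustar_path(path):
    """Return (prefix, name) if the path fits the limits, else None.

    Lengths are counted in bytes after UTF-8 encoding, since the tar header
    fields are byte buffers.
    """
    if len(path.encode("utf-8")) <= MAX_NAME:
        return ("", path)  # fits entirely in the name buffer
    # Try each slash as the split point; keep the first one that satisfies
    # both limits. The slash itself is stored in neither buffer.
    for i, ch in enumerate(path):
        if ch != "/":
            continue
        prefix, name = path[:i], path[i + 1:]
        if (prefix and name
                and len(prefix.encode("utf-8")) <= MAX_PREFIX
                and len(name.encode("utf-8")) <= MAX_NAME):
            return (prefix, name)
    return None  # cannot be archived under these limits


print(split_ustar_path("project1/src/main.c"))
print(split_ustar_path("x" * 150))
```

A path for which the function returns None has no slash at which both parts fit, so by the rule above it cannot be archived.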

Home directories and other areas backups

Home directories

If you accidentally erase a file in your home directory at the RCF, you can restore it from a two-week backup that you can access directly. Two weeks' worth of backups are kept as snapshots. As days pass, live backups are made on the file system itself, preserving your files in place.

For example, suppose your username is 123, your home directory is /star/u/123, and you erased the file /star/u/123/somedir/importantfile.txt and now realise that was a mistake. Don't panic. This is not the end of the world, as snapshot backups exist.

Simply look under /star/u/.snapshot

The directory names are ordered by the date and time of the backup. Pick a date when the file existed; under that directory is a copy of your home directory from that day. From there you can restore the file, e.g.:

% cp /star/u/.snapshot/20yy-mm-dd_hhxx-mmxx.Daily_Backups_STAR-FS05/123/somedir/importantfile.txt \
     /star/u/123/somedir/importantfile.txt
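
If you do not remember when the file last existed, a small script can list the snapshots that still contain it. This is a minimal sketch, not an official tool: the function name is hypothetical, and it assumes that snapshot directory names start with the date, so that sorting them by name also sorts them by time.

```python
import os


def find_in_snapshots(snapshot_root, relative_path):
    """List snapshot copies of relative_path, newest snapshot first.

    snapshot_root is the .snapshot directory; relative_path is the path of
    the lost file relative to the top of each snapshot.
    """
    matches = []
    # Assumption: names begin with the date, so reverse name order is
    # newest first.
    for snap in sorted(os.listdir(snapshot_root), reverse=True):
        candidate = os.path.join(snapshot_root, snap, relative_path)
        if os.path.exists(candidate):
            matches.append(candidate)
    return matches
```

For the example above, calling find_in_snapshots("/star/u/.snapshot", "123/somedir/importantfile.txt") would return the candidate copies, the first being the most recent one, which can then be copied back with cp.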

See also starsofi #7363.

AFS areas

Each doc_protected/ AFS area also has a .backup volume which keeps recently deleted files from that directory until a real AFS-based backup is made (the content is then deleted and you will need to ask the RCF to restore your files). Finding it is tricky, though, because there is one such directory per volume. The best approach is to search backward (up the directory tree) for that directory. For example, suppose you are working in /afs/rhic.bnl.gov/star/doc_protected/www/bulkcorr/. If you search backward for a .backup directory, you will find one at /afs/rhic.bnl.gov/star/doc_protected/www/bulkcorr/../.backup/, and this is where the files for this AFS volume go upon deletion.
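
The backward search can be sketched in a few lines. This is only an illustration under stated assumptions: the function name find_backup_dir is hypothetical, and the code simply walks from a starting directory toward the root and reports the first .backup directory it meets.

```python
from pathlib import Path


def find_backup_dir(start, name=".backup"):
    """Walk from start up to the filesystem root and return the first
    directory called `name` that is found, or None if there is none."""
    # absolute() is used instead of resolve() so that links are not followed.
    current = Path(start).absolute()
    for directory in (current, *current.parents):
        candidate = directory / name
        if candidate.is_dir():
            return candidate
    return None
```

Started from /afs/rhic.bnl.gov/star/doc_protected/www/bulkcorr/, this would be expected to return the .backup directory one level up, matching the example above.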

Other areas

Other areas are typically not backed up.

 

Hypernews

Most Hypernews fora will have to be retired - please consult the list of mailing lists at this link to be sure you need HN at all.
While our Web server is down, many Computing related discussions are now happening on Mattermost Chat (later, this will be Mail based by popular demand). Please log in there using the 'BNL login' option (which provides a facility-wide unified login) and use your RACF/SDCC Kerberos credentials to get in. If you are a STAR user, you will automatically be moved to the "STAR Team".

Please, read the Hypernews in STAR section before registering a new account (you may otherwise miss a few STAR specificities and constraints).

General Information

HyperNews is a cross between the hypermedia of the WWW and Usenet News. Readers can browse postings written by other people and reply to those messages. A forum (also called a base article) holds a tree of these messages, displayed as an indented outline that shows how the messages are related (i.e. all replies to a message are listed under it and indented).

Users can become members of HyperNews or subscribe to a forum in order to get Email whenever a message is posted, so they don't have to check if anything new has been added. A recipient can then send a reply email back to HyperNews, rather than finding a browser to write a reply, and HyperNews then places the message in the appropriate forum.

Hypernews in STAR

In STAR, there are a few specificities with Hypernews as listed below. 

  • Your Hypernews account should match your BNL/RCF account by name. This account must be part of the STAR group. For example, if you have an RCF STAR account named 'abc', you should create a Hypernews account named 'abc'. Any other account will be removed automatically. Note that if you have another RCF Unix account but not a STAR account, the result will be the same (you will not be able to register to STAR's Hypernews). This is done so that account approval can be automated while complying with the DOE requirement mentioned in Getting a computer account in STAR. If you are a STAR user in good standing, the automation allows for immediate use of your account without any further approval process.
  • You should NOT use the same password for a Hypernews account as for your RCF account. Hypernews has a weak authentication method and, while physical access to the machine is needed to crack it, keeping the password different from your interactive login password(s) is important. In general, Web-based authentication passwords should not be the same as interactive account passwords.
  • Hypernews does not accept Email attachments. This includes Emails containing a mix of text and html - they will be rejected by the system. Please, be aware that whenever you send "formatted" Emails (bold character, font changes etc...), your Email client does NOTHING ELSE than sending the content in two parts: one part is plain text, the second an attached HTML. Hence, Hypernews will NOT process formatting (but will take your Email anyhow).
  • Hypernews postings DO NOT need to be made from the Web interface (this is true for ANY Hypernews system); you can send an Email directly to the list address. However, a posting must have a subject. Subject-less postings will be rejected. Also, we have a spam filter in place, and it is worth mentioning that to date we have had no accidental rejections of valid Emails.
  • As of 2012/06, all STAR Hypernews fora are protected. In other words, in addition to your personal Hypernews account, you MUST use the 'protected' password to access the Web interface.

Startup links

Here are a few startup links and tips, including where to start for a new Hypernews account.

  • If you DO NOT have a STAR account, consult Getting a computer account in STAR first BUT you will STILL need the additional information below:
    • You will need the famous “protected” area password. If you do not understand what this means, you are probably not a STAR collaborator ... Otherwise, you can get this information from your PAC, PWGC, OPS manager, council rep, etc ...
    • For your RCF user account name, you will need to choose a User ID other than “protected”. Hopefully, this will be the case.
  • After you get an RCF Unix account
    • create a Hypernews account (as indicated in the Hypernews in STAR section) starting from here.
    • IMPORTANT NOTE: please wait at least one hour after you get confirmation from the RCF before creating your Hypernews account, as there is a delay in the propagation of account information to the Hypernews system.
  • To connect to the Web based Hypernews interface, please log in to the system first. This allows your session to be authenticated properly and your postings to be identified as yours. As of 2006, any anonymous posting is rejected.
  • You can then proceed to either
    • The forum list, with all Hypernews fora displayed in descending order of 'last posted'
    • Edit your membership to change your personal information (this is a typical link which DOES require you to log in first)
    • Use the Hypernews Search engine to search for / locate a particular message. This is slow and painful (we have too many messages) but is the only way you can search the huge 10 years' worth of Email archive.
  • Note again that as soon as you have located your forum of interest and its address, you can send Emails directly to that forum and/or answer a previous post by using your mail client's 'Reply'.

 

Tips related to message delivery to Hypernews

If you have problems sending EMail to Hypernews, please understand and verify the following before asking for help:

  • Hypernews will silently discard EMails detected as spam. This is good news for our Hypernews subscribers, but be aware that spam filtering is a tricky business and some legitimate Emails may be rejected unintentionally.
    • The first and foremost reason for rejection is the use of internet service provider (ISP) EMail servers to send Emails to Hypernews. Several ISPs are blacklisted as they do not protect their service against anonymous Emails. Using such an ISP will have the unfortunate consequence of getting your Email rejected. Be sure to use your lab or university as provider, or a trusted ISP.
    • The other reason is character encoding - DO NOT use special character encodings when sending Email - Korean (EUC-KR) or Chinese (GB2312, GB18030, Big5, ...) encodings especially get a high mark from the spam filter and bring you closer to the rejection rating threshold. A few unfortunate words here and there and ...
  • Hypernews in STAR DOES NOT accept attachments: your Email will be silently rejected if any appears.
    • Send instead a note of where your document resides for consulting.
    • DO NOT send messages as HTML - an HTML EMail actually sends plain text plus the HTML as an attachment ... and your post will be rejected. Typically, your mail client lets you force plain text for every address matching a given domain. Hypernews is covered by the www.star.bnl.gov domain.
      • Mac users using the Mac OS X Mail client, please consult this "How to Send a Message in Plain Text" (also explaining why MIME may be dangerous). Alternatively, you may want to use Mozilla/Thunderbird as a client.
      • To set this up in Thunderbird, do as follows
        Select the _Tools_ menu 
           Select _Options_; a window opens. Select the tab [Composition] -then-> [General]
            click on <send option> in the newly opened panel, then select the [Plain text domain] tab
              click [add] and add star.bnl.gov
              click OK
    • Sending EMail from BNL's Exchange server will result in MIME attachments and hence cause a rejection of your posting. Two possible solutions present themselves
      • Use the Hypernews Web interface to send messages (after making sure you are logged in, click on the bottom reference to go to the message and [Add message])
      • Use a tool like Thunderbird with the SMTP outgoing server set to use the RCF server. Instructions are available here.
         
  • The following restrictions apply:
    • Use only one Hypernews forum in the To: field (and do not use CC: to another HN forum) - HN will not know where to post if you use multiple fora and the result will be unpredictable (depending on the syntax used for the To: field and on the mailer, the post will end up in one of the specified fora or be discarded entirely)
    • You will NOT be able to forward a post from one forum to another - HN will know and send the message again to the original forum. This is because the information HN keeps for archiving and threading your posts is part of the message header and is not based on where you send the message (the header includes Newsgroups, X-HN-Forum, X-HN-Re and X-HN-Loop). Your options are to strip the header or to cut-and-paste the original message into a new one.
    • An Email with multiple recipients in the "To:" field will not be posted. Strictly speaking, this is a shortcoming in the parsing of the header as defined in RFC2822 (the RFC allows for a list, the STAR HN implementation disallows mass posting).
  • One frequent source of issues, unrelated to any of the above (and very STAR infrastructure specific):
    • Always use address book entries with the alias of the form list [at] www.star.bnl.gov and NOT the node-specific address (connery, orion, etc...). In particular, long-time users should remove from their address book any address not specifying the alias.
  • If you need to test sending Email to the system, please do not spam an existing active list - instead, use our test fora: startest or testp.
    • Remember that Hypernews is a centralized system: if your Email passes and is delivered to the test forum, it should also be delivered to any other list
    • Both fora are nearly identical - testp is nowadays used for testing new code and features, so for a casual Email check you may prefer startest.
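
For scripted postings, the tips above (one forum in To:, a non-empty subject, plain text only, no attachment) can be checked when the message is built. The following is a minimal sketch using Python's standard email package; the function name build_plain_post is hypothetical, and the sender address in the usage note is a placeholder. Nothing is sent by this code.

```python
from email.message import EmailMessage


def build_plain_post(sender, forum_address, subject, body):
    """Build a single-part, plain-text message addressed to one forum."""
    if not subject.strip():
        raise ValueError("a subject is required")
    if "," in forum_address or ";" in forum_address:
        raise ValueError("address exactly one forum")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = forum_address
    message["Subject"] = subject
    # set_content() with a str produces a single text/plain part:
    # no HTML alternative and no attachment.
    message.set_content(body)
    return message
```

For a casual delivery check, build_plain_post("you@example.org", "startest@www.star.bnl.gov", "Delivery test", "Plain ASCII text.\n") yields a message that could then be handed to smtplib.SMTP.send_message, using whichever outgoing server your site provides. Keeping the body in plain ASCII also avoids any special character encoding.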

 

Rejection rules

The content of this page should NOT be visible to users outside of STAR.

This page documents the base rejection rules. Hypernews in STAR will reject Emails upon one or more of the below conditions (all rules are case insensitive wherever it applies).

Early rejection rules (Email content analyzed)

  • A post will be rejected if the subject contains one of the following
    • the string {Spam? or {Virus? appears, as set by the BNL Mail gatekeeper upon detection of a spam or virus
    • the string returned mail or autoreply appears anywhere in the subject. This was added to prevent fake auto-replies from ending up as postings
    • the pattern \w+\s+wrote:\s* appears in the subject (this was added based on a popular spam subject)
  • The following senders will be rejected automatically: postmaster, mailer-daemon, symantec
  • The Email body contains an HTML preamble of the form DOCTYPE HTML - this rejects tons of spam attempting to send HTML formatted content with embedded links or code, leveraging mail readers' ability to auto-expand (even without a MIME attached HTML preamble)

Posting rejection rules - Further analysis of the content is made for attachments and special formatting, and rejection applies if one or more of the cases below appear

  • Content-type text/html of the DOCTYPE HTML ; this is to forbid HTML based spam (note this rejection is redundant with an earlier stage rejection)
  • Apple-Mail in quoted-printable text/html format
  • Attachments containing any application or any octet-stream format
  • Encoded content in base64 (as specified by content transfer encoding)
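
To check a message against the early rejection rules before posting, the rules listed above can be approximated as follows. This is a sketch for illustration only, not the code running on the Hypernews server: the function name early_rejection_reason is hypothetical, and matching the sender rule against the local part of the address is an assumption.

```python
import re

# All comparisons are case insensitive, as stated for the rules above.
SUBJECT_MARKERS = ("{spam?", "{virus?", "returned mail", "autoreply")
SUBJECT_PATTERN = re.compile(r"\w+\s+wrote:\s*", re.IGNORECASE)
REJECTED_SENDERS = ("postmaster", "mailer-daemon", "symantec")


def early_rejection_reason(sender, subject, body):
    """Return a short reason if an early rule rejects the post, else None."""
    subject_lc = subject.lower()
    for marker in SUBJECT_MARKERS:
        if marker in subject_lc:
            return "subject contains " + marker
    if SUBJECT_PATTERN.search(subject):
        return "subject matches 'wrote:' pattern"
    # Assumption: the sender rule applies to the local part of the address.
    local_part = sender.split("@", 1)[0].strip().lower()
    if local_part in REJECTED_SENDERS:
        return "sender " + local_part
    if "doctype html" in body.lower():
        return "HTML preamble in body"
    return None
```

A return value of None only means that none of the early rules matched; the attachment and encoding rules, as well as the spam filter, are not covered by this sketch.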

 

 

Installing the STAR software stack

The pages here are under construction. They aim to help remote sites install the STAR software stack.

You should first read Setting up your computing environment before going through the documents provided herein, as we refer to this page very often. Please pay particular attention to the list of environment variables defined by the group login script and their meanings in STAR. Be aware of the assumptions about the software locations (all are referred to by the environment variables listed there) as well as the need to use a custom (provided) set of .cshrc and .login files (you may have to modify them if you install the STAR software locally). Setting up your computing environment is however NOT written as a software installation instruction and should not be read as such.

Please follow the instructions in the order they appear below

  1. Check first the availability of the CERN libraries as this may be a show stopper. If there are no CERN libraries for your OS version and/or the available libraries are not validated for your OS, you will NOT be able to get the STAR software working on your site.
  2. Your FIRST STEP is to install the Group login scripts. Although not all will be defined, the login should be successful after this step.
  3. The next step is then to install Additional software components
    However, your OS should also have installed a few base system wide RPMs.
    Lists are available on the OS Upgrade page as well as specific issues with some OS. Read it carefully.
  4. Then, install the ROOT library Building ROOT in STAR
  5. Finally, you are ready for a STAR library installation STAR codes

Sparse notes are also in Post installation and special instructions for administrators at OS Upgrade.

 

Group login scripts

Installing

The STAR general group login scripts are necessary to define the STAR environment. They reside in $CVSROOT within the group/ sub-tree. Template files for the users' .cshrc and .login also exist within this tree, in the sub-directory group/templates. To install properly on a local cluster, there are two possibilities:

  • if you have access to AFS, you should simply
        % mkdir  /usr/local/star # this is only an example
        % cd /usr/local/star     # this directory needs to be readable by a STAR group
        % cvs checkout group     # this assumes CVSROOT is defined 
    This will bring a copy of all you need locally in /usr/local/star/group
  • If you do not have access to AFS from your remote site, get a copy of the entire BNL $GROUP_DIR tree and unpack it in a common place (like /usr/local/star above). A copy resides in the AFS tree mentioned in Additional software components.

Note that wherever you install the login scripts, they need to be readable by all STAR members (you can do this by granting read access on the tree to a Unix group all STAR users belong to, or by making sure the scripts are accessible to all users).

Also, as soon as you get a local copy of the group/templates/ files, EDIT BOTH the cshrc and login files and change the definition of GROUP_DIR at the top so that it matches your site's group script location (/usr/local/star/group in our example).

To enable a user to use the STAR environment, simply copy the template cshrc and login scripts as indicated in Setting up your computing environment.

Special scripts

Part of our login is optional: the scripts mentioned here are NOT part of our CVS repository but, if they exist, they will be executed.

  • site_pre_setup.csh - this script, if it exists in $GROUP_DIR, will be executed before the STAR standard login. Its purpose is to define variables indicating non-standard locations for your packages. Please consult Setting up your computing environment for all the variables (in blue) which may be redefined prior to login.
  • site_post_setup.csh - this script, if it exists in $GROUP_DIR, will be executed after the STAR standard login. Its purpose is to define local variables not related to STAR's environment. Such variables may be, for example, the definition of a proxy (http_proxy, ftp_proxy, https_proxy), an NNTP server or a default WWW home directory (WWW_HOME). Do not try to redefine variables defined by the STAR login using this script.

Testing this phase

Testing this phase is as simple as creating a test account and verifying that the login succeeds. Whenever you start with a blank site, the login MUST succeed and lead to a viable environment ($PATH especially should be minimally correct). At this stage, a typical login would look something like

Setting up WWW_HOME  = http://www.star.bnl.gov/

         ----- STAR Group Login from /usr/local/star/group/ -----

Setting up STAR_ROOT = /usr/local/star
Setting up STAR_PATH = /usr/local/star/packages
Setting up OPTSTAR   = /usr/local/star/opt/star
WARNING : XOPTSTAR points to /dev/null (no AFS area for it)
Setting up STAF      = /usr/local/star/packages/StAF/pro
Setting up STAF_LIB  = /usr/local/star/packages/StAF/pro/.cos46_gcc346/lib
Setting up STAF_BIN  = /usr/local/star/packages/StAF/pro/.cos46_gcc346/bin
Setting up STAR      = /usr/local/star/packages/pro
Setting up STAR_LIB  = /usr/local/star/packages/pro/.cos46_gcc346/lib
Setting up STAR_BIN  = /usr/local/star/packages/pro/.cos46_gcc346/bin
Setting up STAR_PAMS = /usr/local/star/packages/pro/pams
Setting up STAR_DATA = /usr/local/star/data
Setting up CVSROOT   = /usr/local/star/packages/repository
Setting up ROOT_LEVEL= 5.12.00
Setting up SCRATCH   = /tmp/jeromel
CERNLIB version pro has been initiated with CERN_ROOT=/cernlib/pro
STAR setup on star.phys.pusan.ac.kr by Tue Mar 12 06:43:47 KST 2002  has been completed
LD_LIBRARY_PATH = .cos46_gcc346/lib:/usr/local/star/ROOT/5.12.00/.cos46_gcc346/rootdeb/lib:ROOT:/usr/lib/qt-3.3/lib

 

Suggestions

STAR group

You may want to create a rhstar group on your local cluster matching GID 31012. This will make AFS integration easier, as the group names in AFS will then translate to rhstar (it will obviously not grant you any special access, since AFS authentication is Kerberos based and not Unix UID based).
To do this, after checking that /etc/group does not contain any mapping for GID 31012, you could (Linux):

% groupadd -g 31012 rhstar

Test account

It may be practical for testing the STAR environment to create a test account on your local cluster. The starlib account is an account used in STAR for software installation. You may want to create such an account as follows (Linux):

% useradd -d /home/starlib -g rhstar -s /bin/tcsh  starlib

 This will allow for easier integration. Any account name will do (but testing is important and we will have a section on this later).

 

 

Additional software components

Scope & audience

As described in Setting up your computing environment, OPTSTAR is the environment variable pointing to an area which supplements the operating system installation of libraries and programs. This area is fundamental to the STAR software installation, as it contains needed libraries, approved software component versions, shared files, configuration and so on.

The following path should contain all software components as sources for you to install a fresh copy on your cluster:
    /afs/rhic.bnl.gov/star/common

Note that this path should be readable by anyuser, so there is no need for an AFS token. The notes below are sparse and ONLY indicate the special instructions you need to follow, if any. In the absence of special instructions, the "standard" instructions are to be followed. None of the explanations below are aimed at regular users; they target system administrators or software infrastructure personnel.

System wide RPMs

Some RPMs from your OS distribution may be found at BNL under the directory /afs/rhic.bnl.gov/rcfsl/X.Y/*/ where X.Y are the major and minor version numbers of your Scientific Linux release. You should have a look and install what you need. If you do not have AFS, you should log in to the RCF and transfer whatever is appropriate.

In other words, we may have re-packaged some packages and/or created additional ones for compatibility purposes. An example of this for SL5.3 is flex32libs-2.5.4a-41.fc6.i386.rpm located in /afs/rhic.bnl.gov/rcfsl/5.3/rcf-custom/, which provides the 32 bits compatibility package for flex on a kernel with dual 32/64 bits support.

STAR Specific

The directory tree /afs/rhic.bnl.gov/star/common contains packages installed on our farm in addition to the default distribution software packages coming with the operating system. At BNL, all packages referred to here are installed in the AFS tree

	/opt/star -> /afs/rhic.bnl.gov/@sys/opt/star/

Be aware of the intent explained in Setting up your computing environment as per the difference between $XOPTSTAR and OPTSTAR.

OPTSTAR will either

  • at BNL or at a remote site: be used to locate and access the local software, BUT it may be supported through a soft-link to the same AFS area as shown above, where @sys expands to the operating system of interest (see also Setting up your computing environment for a support matrix)
  • at a remote site: point to a LOCAL (that is, non-networked) installation of the software components. This space could be anywhere on your local cluster but, obviously, will have to be shared and visible from all nodes in your cluster.

XOPTSTAR

$XOPTSTAR was introduced in 2003 to provide better support for software installation at remote institutions. Many packages add path information to their configuration (like the infamous .la files) and, when those packages were installed in $OPTSTAR, remote sites had problems loading libraries because of the embedded paths. Hence, unless specified otherwise below, $XOPTSTAR is preferred at BNL for installing the software, so that remote access to the AFS repository, and copies of it, are as transparent as possible.

In 2005, we added an additional tree level reflecting the possibility of multiple compilers and of a possible mismatch between fs sysname setups and operating system versions. Hence, you may see paths like OPTSTAR=/opt/star/sl44_gcc346, but this structure is a detail: if the additional layer does not exist at your site, later logins will nonetheless succeed. This additional level is defined by the STAR login environment variable $STAR_HOST_SYS. In the next section, we explain how to set this up from a "blank" site (i.e. a site without the STAR environment and software installed).

On remote sites where you decide to install the software components locally, you should use $OPTSTAR in the configure or make statements.

Basic starting point

From a blank node on remote site, be sure to have $OPTSTAR defined. You can do this by hand for example like this

% setenv OPTSTAR /usr/local

or

% mkdir -p /opt/star
% setenv OPTSTAR /opt/star

are two possibilities. The second, being the default location of the software layer, will be automatically recognized by the STAR group login scripts. From this point, a few pre-requisites are

  • you have to have a system with "a" compiler - we support gcc but also icc on Linux
  • you should have the STAR group login scripts at hand (it could be from AFS). The STAR login scripts will NOT redefine $OPTSTAR if already defined.

Execute the STAR login. This will define $STAR_HOST_SYS appropriately. Then

% cd $OPTSTAR
% mkdir $STAR_HOST_SYS
% stardev
 

After this, the definition of $OPTSTAR changes to the version dependent structure, adding $STAR_HOST_SYS to the path definition (the mere presence of the layer makes the login script redefine it).
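
The behaviour described here can be summarized in a few lines. This sketch only illustrates the rule as stated in the text; it is not the login script itself, and the function name resolve_optstar is hypothetical.

```python
import os


def resolve_optstar(optstar, star_host_sys):
    """Return the OPTSTAR value expected after login.

    If a directory named after star_host_sys exists below optstar, the
    version dependent path is returned; otherwise optstar is kept as is.
    """
    layered = os.path.join(optstar, star_host_sys)
    return layered if os.path.isdir(layered) else optstar
```

For instance, with OPTSTAR set to /opt/star and a $STAR_HOST_SYS of sl44_gcc346, the result would be /opt/star/sl44_gcc346 once that directory has been created, and /opt/star before.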

 

Changing platform or compiler

32 bits versus 64 bits

If you want to support native 64 bits on a 64 bits system, do not forget to pass/force -m64 -fPIC to the compiler and -m64 to the linker. If however you want to build cross-platform executables and libraries (compatible with both 64 bit and 32 bit kernels), you will on the contrary need to force -m32 (and use -fPIC). Even if you build the packages on a 32 bit kernel node, be aware that many applications and packages save a setup including the compilation flags (which will have to include -m32 if you want a cross-platform package). There are many places below where I do not specify this.

Often, using CFLAGS="-m32 -fPIC" CXXFLAGS="-m32 -fPIC" LDFLAGS="-m32" would do the trick to force 32 bits mode (similarly for -m64). You need to use such options for libraries and packages even if you build on a 32 bits kernel node; otherwise, the package may later build extensions that are not compatible with cross-platform support.

Other GCC versions

As for the 32 bits versus 64 bits, often adding something like CC=`which gcc` and CXX=`which g++` to either the configure or make command would do the trick. If not, you will need to modify the Makefile accordingly. You may also define the environment variable CC for consistency.

Summary

If you do have a 64 bits kernel and intend to compile both 32 bits and 64 bits, you should define the environment variables as shown below. The variables will make configure (and some Makefiles) pick the proper flags and make your life much easier - follow the package-specific instructions below for additional tricks. Note as well that even if you only have a 32 bits kernel, you are encouraged to use the -m32 compilation option (this will make further integration with dual 32/64 bits support smoother, as some of the package configurations include compiler paths and options).

32 bits

% setenv CFLAGS   "-m32 -fPIC"
% setenv CXXFLAGS "-m32 -fPIC"
% setenv FFLAGS   "-m32 -fPIC"
% setenv FCFLAGS  "-m32 -fPIC"
% setenv LDFLAGS  "-m32"
% setenv CC  `which gcc`     # only if you use a different compiler than the system default
% setenv CXX `which g++`     # only if you use a different compiler than the system default

and/or pass to Makefile and/or configure the arguments CFLAGS="-m32 -fPIC" CXXFLAGS="-m32 -fPIC" LDFLAGS="-m32" CC=`which gcc` CXX=`which g++` (it will not hurt to use them in addition to the environment variables)

64 bits

% setenv CFLAGS   "-m64 -fPIC"
% setenv CXXFLAGS "-m64 -fPIC"
% setenv FFLAGS   "-m64 -fPIC"
% setenv FCFLAGS  "-m64 -fPIC"
% setenv LDFLAGS  "-m64"
% setenv CC  `which gcc`     # only if you use a different compiler than the system default
% setenv CXX `which g++`     # only if you use a different compiler than the system default

and/or pass to Makefile and/or configure the arguments CFLAGS="-m64 -fPIC" CXXFLAGS="-m64 -fPIC" LDFLAGS="-m64" CC=`which gcc` CXX=`which g++` (it will not hurt to use them in addition to the environment variables)
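
If you drive many package builds from a script, the two sets of flags above can be generated rather than retyped. The following is a small sketch under that assumption; the helper names build_flags and as_command_line_args are hypothetical, and the values are exactly those listed in the 32 bits and 64 bits tables.

```python
def build_flags(bits):
    """Return the compiler and linker variables for a 32 or 64 bits build."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    mode = "-m%d" % bits
    compile_flags = mode + " -fPIC"
    return {
        "CFLAGS": compile_flags,
        "CXXFLAGS": compile_flags,
        "FFLAGS": compile_flags,
        "FCFLAGS": compile_flags,
        "LDFLAGS": mode,
    }


def as_command_line_args(flags, names=("CFLAGS", "CXXFLAGS", "LDFLAGS")):
    """Format selected variables as NAME=value arguments for configure or make."""
    return ["%s=%s" % (name, flags[name]) for name in names]
```

The dictionary can be merged into the environment of a build process, and the list returned by as_command_line_args(build_flags(32)) can be appended to a configure or make command line, mirroring the two approaches described above.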

 

Software repository directory - starting a build

In the instructions below, greyed instructions are historical instructions and/or package versions which no longer reflect the current official STAR supported platform. However, if you try to install the STAR software under an older OS, refer carefully to those instructions and package versions.

perl

The STAR environment and login scripts rely heavily on perl for string manipulation, compilation management and a number of utility scripts. Building it first is essential. You may rely on your system-wide installed perl version BUT, if so, note that the minimum version indicated below IS required.

In our software repository path, you will find a perl/ sub-directory containing all packages and modules.

The package and minimal version are below
                perl-5.6.1.tar.gz      -- Moved to 5.8.0 starting from RH8
                perl-5.8.0.tar.gz      -- Solaris and True64 upgraded 2003
                perl-5.8.4.tar.gz      -- 2004, Scientific Linux
                perl-5.8.9.tar.gz      -- SL5+
                perl-5.10.1.tar.gz     -- SL6+

When building perl

  • Use all default arguments BUT when you are asked for compilation / linker args, add -m32 or -m64 depending on the platform support you are building. Those questions are (example for the 32 bits version):
    • Any additional cc flags? []  -fPIC -m32
    • Any additional ld flags (NOT including libraries)? [] -m32
    • Any special flags to pass to cc -c to compile shared library modules? []  -fPIC -m32
    • Any special flags to pass to cc to create a dynamically loaded library? [-shared -O2] -shared -O2 -m32
    • If you build 32 bits support on a 64 bit node, you may also answer "no" below, but the default answer SHOULD appear as no if you properly passed -m32 as indicated above.
      Try to use maximal 64-bit support, if available? [y] n  <--- you probably did not pass -m32
      Try to use maximal 64-bit support, if available? [n]    <--- just press return, all is fine
  • when asked for the default prefix for the installation, give the value of $OPTSTAR as answer (or a base path starting with the value of $OPTSTAR wherever appropriate). Questions include
    • Installation prefix to use? (~name ok) [/usr/local]
  • If the build warns you at first that the directory does not exist, proceed anyway - to questions like "Use that name anyway?" answer Yes


After installing perl itself, you will need to install the modules required by STAR.

The modules are installed using a bundle script (install_perlmods, located in /afs/rhic.bnl.gov/star/common/bin/). It needs some work to be generalized, but the idea is that it contains the dependencies and the installation order. To install, you can do the following (we assume install_perlmods is in the path for simplicity and clarity):
 

  1. First choose a work place where you will unpack the needed modules. Let's assume this is /home/xxx/myworkplace
  2. Check things out by running install_perlmods with argument 0 as follows
    % install_perlmods 0 /home/xxx/myworkplace
    It will tell you the list of modules you need to unpack. If they are already unpacked and /home/xxx/myworkplace contains all needed package directories, skip to step 4.
  3. You can unpack manually OR use the command
    % install_perlmods 1 /home/xxx/myworkplace
    to do this automatically. Note that you could have skipped step 2 and done this from the start (if confident enough).
  4. The steps above should have created a file named /home/xxx/myworkplace/perlm-install-XXX.csh where XXX is the OS you are working on. Note that the same install directory may therefore be used for ALL platforms on your cluster. However, versioning is not (yet) supported.
    Execute this script after checking its content. It will run (hopefully smoothly) the perl Makefile.PL and make / make install commands. Note that you could have also used
    % install_perlmods 2 /home/xxx/myworkplace
    and skipped steps 2 & 3. In this mode, it unpacks and proceeds with compilation. Do this only if you have absolute blind faith in the process (I don't, and I wrote those scripts ;-)   ).

Very old note [this used to happen with older perl version]: if typing make, you get the following message

make: *** No rule to make target `<command-line>', needed by `miniperlmain.o'.  Stop.

then you have bumped into an old gcc/perl build issue (which tends to come back periodically depending on the message formats of gcc). You can resolve it by using any available perl version and running the commands:

% make depend
% perl -i~ -nle 'print unless /<(built-in|command.line)>/' makefile x2p/makefile

This will suppress from the makefile the offending lines and will get you back on your feet.
 

After you install perl, and your setup is local (in /opt/star) you may want to do the following

% cd /opt/star
% ln -s $STAR_HOST_SYS/* .
%
% # prepare for later directories packages will create
% ln -s $STAR_HOST_SYS/share .
% ln -s $STAR_HOST_SYS/include .
% ln -s $STAR_HOST_SYS/info .
% ln -s $STAR_HOST_SYS/etc .
% ln -s $STAR_HOST_SYS/libexec .
% ln -s $STAR_HOST_SYS/qt .
% ln -s $STAR_HOST_SYS/jed .
%

While some of those directories will not yet exist, this creates a base set of directories (without the additional compiler / OS version) supporting future upgrades via the "default" set of directories. In other words, a future compiler upgrade leading to a different $STAR_HOST_SYS will still give a functional environment as long as compatibility exists. Whenever compatibility is broken, you will of course need to create a new $STAR_HOST_SYS tree.
At this stage, you should install as many of the libraries as possible in $OPTSTAR and revisit the perl modules later, as some depend on installed libraries necessary for the STAR environment to be functional.
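
The link creation above can also be scripted so that it is safe to run again after an upgrade. This is only a sketch under assumptions: it is written for sh/bash (not csh), the function name link_defaults is hypothetical, and the directory list is simply the one used in the commands above. Existing entries are left alone; missing ones are created as (possibly dangling) links to be filled by later package installs.

```shell
#!/bin/sh
# Hypothetical helper: create the "default" links inside an install root.
#   $1 = install root (for example /opt/star)
#   $2 = platform sub-directory name (the value of $STAR_HOST_SYS)
# The body runs in a subshell so the caller's working directory is unchanged.
link_defaults() (
    cd "$1" || exit 1
    for d in share include info etc libexec qt jed; do
        # Keep anything already present (real directory or earlier link)
        if [ -e "$d" ] || [ -L "$d" ]; then
            continue
        fi
        ln -s "$2/$d" "$d"
    done
)

# Example:
#   link_defaults /opt/star "$STAR_HOST_SYS"
```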

 

Others/ [PLEASE READ, SOME PACKAGES MAY HAVE EXCEPTION NOTES]

        Needed on other platforms (but already there on Linux). Unless specified 
        otherwise, the packages were built with the default values.
                make-3.80
                tar-1.13
                flex-2.5.4   
                xpm-3.4k
                libpng-1.0.9

                mysql-3.23.43 on Solaris
                mysql-3.23.55 starting from True64 days (should be tried as
                              an upgraded version of the API)
                              BEWARE mysql-4.0.17 was tried and is flawed.
                              We also use native distribution MySQL
                mysql-4.1.22  *** IMPORTANT *** Actually this was an upgrade 
                              on SL4.4 (not necessary but the default 4.1.20 
                              has some bugs) 

                <gcc-2.95.2>
                dejagnu-1.4.1	 
                gdb-5.2
                texinfo-4.3
                emacs-20.7 

                findutils-4.1
                fileutils-4.1
                cvs-1.11       -- perl is needed before hand as it folds
                               it in generated scripts
                grep-2.5.1a    Started on Solaris 5.9 in 2005 as ROOT would complain 
                               about too old version of egrep 


This may be needed if not installed on your system. It is part of a needed
autoconf/automake deployment.
                m4-1.4.1		
                autoconf-2.53  
                automake-1.6.3
		
Linux only
                valgrind-2.2.0
                valgrind-3.2.3                      (was for SL 4.4 until 2009)
                valgrind-3.4.1                      SL4.4

General/

        The installed packages/sources for diverse software layers. The order of
        installation was
                ImageMagick-5.4.3-9                 On RedHat 8+, not needed for SL/RHE but see below
                ImageMagick-6.5.3-10                Used on SL5 as default revision is "old" (6.2.8) - TBC
                slang-1.4.5                         On RedHat 8+, ATTENTION: not needed for SL/RHE, install RPM
                lynx2-8-2
                lynx2-8-5                           Starting from SL/RHE
                xv-3.10a-STAR                       Note the post-fix STAR (includes patch and 32/64 bits support Makefile)
                nedit-5.2-src                       ATTENTION: No need on SL/RHE (installed by default)
                [+] pythia5
                pythia6
                text2c
                icalc
                dejagnu-1.4.1                       Optional / Dropped from SL3.0.5
                gdb-5.1.91                          For RH lower versions - Not RedHat , 8+
                gdb-6.2 (patched)                   Done for SL3 only (do not install on others)
                gsl-1.13                            Started from SL5 and back ported to SL4
                gsl-1.16                            Update for SL6
                chtext
                jed-0.99-16
                jed-0.99.18                         Used from SL5+
                jed-0.99.19                         Used in SL6/gcc 4.8.2 (no change in instructions)
                qt-x11-free-3.1.0
                qt-x11-free-3.3.1                   Starting with SL/RHE
                [+] qt-x11-opensource-4.4.3         Deployed from i386_sl4 and i386_sl305 (after dropping SL3.0.2), SL5
                qt-everywhere-opensource-src-4.8.5  Deployed from SL6 onward
                qt-everywhere-opensource-src-4.8.7  Deployed on SL6/gcc 4.8.2 (latest 4.8.x release)
                doxygen-1.3.5
                doxygen-1.3.7                       Starting with SL/RHE
                doxygen-1.5.9                       Use this for SL5+ - this package has a dependence in qt
                                                    Installed native on SL6
                Python 2.7.1                        Started from SL4.4 and onward, provides pyROOT support
                Python 2.7.5                        Started from SL6 onward, provides pyROOT support
                pyparsing V1.5.5                    SL5 Note: "python setup.py install" to install
                pyparsing V1.5.7                    SL6 Note: "python setup.py install" to install
                setuptools 0.6c11                   SL5 Note: sh the .egg file to install
                setuptools 0.9.8                    SL6 Note: "python setup.py install" to install
                MySQL-python-1.2.3                  MySQL 14.x client libs compatible
                virtualenv 1.9                      SL6 Note: "python setup.py install" to install
                Cython-0.24                         SL6 Note: "python setup.py build ; python setup.py install"
                pyflakes / pygments                 {TODO}
                libxml2                             Was used only for RH8.0, installed as part of SL later
                [+] libtool-1.5.8                   This was used for OS not having libtool, Use latest version.
                libtool-2.4                         Needed under SL5 64 bits kernel (32 bits code will not assemble
                                                    otherwise). This was re-packaged with a patch.
                Coin-3.1.1                          Coin 3D and related packages
                Coin-3.1.3                          ... was used for SL6/gcc 4.8.2 + patch (use the package named Coin-3.1.3-star)
                simage-1.7a
                SmallChange-1.0.0a
                SoQt-1.5.0a
                astyle_1.15.3                       Started from SL3/RHE upon user request
                astyle_1.19                         SL4.4 and above
                astyle_1.23                         SL5 and above
                astyle_2.03                         SL6 and above
                unixODBC-2.2.6 (depend on Qt)       Was experimental Linux only for now.
                unixODBC-2.3.0                      SL5+, needed if you intend to use DataManagement tools
                MyODBC-3.51.06                      Was Experimental on Linux at first, ignore this version
                MyODBC-3.51.12                      Version for SL4.4 (needed for mysql 4.1 and above)
                mysql-connector-odbc-3.51.12        <-- Experimental package change - new name starting from 51.12. BEWARE.
                mysql-connector-odbc-5.x            SL5+. As above, only if you intend to use Data Management tools
                boost                               Experimental and introduced in 2010 but not used then
                boost_1_54_0                        SL6+ needed
                log4cxx 0.9.7                       This should be general, initial version
                log4cxx 0.10.0                      Started at SL5 - this is now from Apache
                apr-1.3.5                           and depend on the Apache Portable Runtime (apr) package
                apr-util-1.3.7                      which need to be installed BEFORE log4cxx and in the order
                expat-1.95.7                        showed
                valkyrie-1.4.0                      Added to SL3 as a GUI companion to valgrind (requires Qt3)
                                                    Not installed in SL5 for now (Qt4 only) so ignore
                fastjet-2.4.4                       Started from STAR Library version SL11e, essentially for analysis
                fastjet-3.0.6                       SL6 onward
                unuran-1.8.1                        Requested and installed from SL6+
                LHAPDF-6.1.6                        Added after SL6.4, gcc 4.8.2

In case you have problems
                emacs-24.3                          Installed under SL6 as the default version had font troubles
                vim-7.4                             Update made under SL6.4, please prefer RPM if possible

Not necessary (installed anyway)
                chksum
                pine4.64                            Added at SL4.4 as removed from base install at BNL

Retired
                xemacs-21.5.15                      << Linux only -- This was temporary and removed

Other directories are
                WorkStation/                        contains packages such as OpenAFS or OpenOffice Linux
                WebServer/                          mostly perl modules needed for our WebServer
                Linux/                              Linux specific utilities (does not fit in General) or
                                                    packages tested under Linux only.

Some notes about packages :

Most of them are pretty straight forward to install (like ./configure make ; make install (changing the base path /usr/local to $OPTSTAR). With configure, this is done using either

        ./configure --prefix=$OPTSTAR
        ./configure --prefix=$XOPTSTAR

Specific notes follows and include packages which are NOT yet official but tried out.

- Beware that the Msql-Mysql-modules perl module requires a hack I have not quite
  understood yet how to make automatic (the advertized --blabla do not seem to work)
  on platforms supporting the client in OPTSTAR
        INC           = ... -I$(XOPTSTAR)/include/mysql ...
        H_FILES       = $(XOPTSTAR)/include/mysql/mysql.h
        OTHERLDFLAGS  = -L$(XOPTSTAR)/lib/mysql
        LDLOADLIBS    = -lmysqlclient -lm -lz

- GD-2+
  Do NOT select support for animated GIF. This will fail on standard SL distributions
  (default gd lib has no support for that).


ImageMagick

Really easy to install (usual configure / make / make install). However, the PerlMagick part should be installed separately (the usual perl module way i.e. cd to the directory, perl Makefile.PL and make / make install). I used the distribution's module. Therefore, that perl module is not in perl/Installed/ like the other perl modules. The copy of PerlMagick to /bin/ by default will fail so you may want to additionally do

% make install-info
% make install-data-html

which comes later depending on version.
 

lynx

- lynx2-8-2 / lynx2-8-5 
  Note: First, I tried lynx2-8-4 and the make file / configure
        is a real disaster. For 2-8-2/2-8-5, follow the notes 
        below

  General :
  %  ./configure --prefix=$XOPTSTAR {--with-screen=slang}

  Do not forget to
  % make install-help
  % make install-doc

 caveat 1 -- Linux (lynx 2-8-2 only, fixed at 2-8-5)

  $OPTSTAR/lib/lynx.cfg was modified as follow
96,97c96,97
< #HELPFILE:http://www.crl.com/~subir/lynx/lynx_help/lynx_help_main.html
< HELPFILE:file://localhost/opt/star/lib/lynx_help/lynx_help_main.html
---
> HELPFILE:http://www.crl.com/~subir/lynx/lynx_help/lynx_help_main.html
> #HELPFILE:file://localhost/PATH_TO/lynx_help/lynx_help_main.html

   For using curses (needed under Linux, otherwise, the screen looks funny), 
   one has to do a few manipulations by hand i.e. 
   . start with ./configure --prefix=$XOPTSTAR --with-screen=slang
   . edit the makefile and add -DUSE_SLANG to SITE_DEFS
   . change CPPFLAGS from /usr/local/slang to $OPTSTAR/include [when slang is local]
     Version 2-8-5 has this issue fixed.
   . Change LIBS -lslang to -L$OPTSTAR/lib -lslang
   . You are ready now
   There is probably an easier way but as usual, I gave up after close
   to 15mnts reading, as much struggle and complete flop at the end ..

 caveat 2 -- Solaris/True64 : 
   We did not build with slang but native (slang screws colors up)

 

text2c, chksum, chtext, icalc

Those packages can be assembled simply by using the following command:

% make clean && make install PREFIX=$OPTSTAR

To build 32 bits versions of the executables under a 64 bits kernel, use

  • text2c:             % make CC=`which gcc` CFLAGS="-lfl -m32"
  • icalc:                % make CC=`which gcc` CFLAGS="-lm -m32"
  • chksum:           % make CC=`which gcc` CFLAGS="-m32 -trigraphs"
  • chtext:             % make CC=`which gcc` CFLAGS="-lfl -m32"

 

xv-3.10a

This package is distributed already patched and, in principle, only a few 'make' commands should suffice. Note

  • xv is licensed so the usage has to remain strictly for your users' amusement only. If you use this package for doing any work, you are violating the law. Please, read the license agreement at http://www.trilon.com/xv/pricing.html 

Normal build

Now,  you should be ready to build the main program (I am not sure why some dependencies fail on some platforms and did not bother to fix them).

% cd tiff/
% make clean && make
% cd ../jpeg
% make clean && make
% cd ..
% rm -f *.o  && make
% make -f Makefile.gcc64 install BINDIR=$OPTSTAR/bin

For 32 bits compilation under a 64 bits kernel

% cd tiff/
% make clean && make CC=`which gcc` COPTS="-O -m32"
% cd ../jpeg
% make clean && make CC=`which gcc` CFLAGS="-O -I. -m32" LDFLAGS="-m32"
% cd ..
% rm -f *.o   && make -f Makefile.gcc32
% make -f Makefile.gcc32 install BINDIR=$OPTSTAR/bin

Makefile.gcc32 and Makefile.gcc64 are both provided for convenience.

Building from scratch (good luck)

However, if you need to re-generate the makefile (may be needed for new architectures), use

% xmkmf 

Then, the patches are as follows

% sed "s|/usr/local|$OPTSTAR|" Makefile >Makefile.new
% mv Makefile.new Makefile

and in xv.h, line 119 becomes

# if !defined(__NetBSD__) && ! defined(__USE_BSD) 

After xmkmf, you will need to

% make depend

before typing make. This may generate some warnings. Ignore them.

However, I had to fix diverse caveats depending on situations ...

Caveat 1 - no tiff library found

Go into the tiff/ directory and do

% cd tiff
% make -f Makefile.gcc
% cd ..

to generate the mkg3states (itself creating the g3states.h file) as it did not work.

Caveat 2 - tiff and gcc 4.3.2 in tiff/

With gcc 4.3.2 I created an additional .h file named local_types.h and forced the definition of a few of the u_* types by using define statements (I know, it is bad). The content of that file is as follows

#ifndef _LOCAL_TYPES_
#define _LOCAL_TYPES_

#if !defined(u_long)
# define u_long unsigned long
#endif
#if !defined(u_char)
# define u_char unsigned char
#endif
#if !defined(u_short)
# define u_short unsigned short
#endif
#if !defined(u_int)
# define u_int unsigned int
#endif

#endif

and it needs to be included in tiff/tif_fax3.h and tiff/tiffiop.h .

Caveat 3 -- no jpeg library?

 In case you have a warning about jpeg such as No rule to make target `libjpeg.a', do the following as well:

% cd jpeg
% ./configure
% make
% cd ..

 

Nedit

There is no install provided. I did

% make linux
% cp source/nc source/nedit $OPTSTAR/bin/
% cp doc/nc.man $OPTSTAR/man/man1/nc.1
% cp doc/nedit.man $OPTSTAR/man/man1/nedit.1

Other targets

% make dec
% make solaris

If you need to build for another compiler or another platform, you may want to copy one of the provided makefiles and modify it to create a new target. For example, if you have a 64 bits kernel but want to build a 32 bits nedit (for consistency or otherwise), you could do this:

% cp makefiles/Makefile.linux makefiles/Makefile.linux32

then edit it and add -m32 to both CFLAGS and LIBS. This will add a target "platform" linux32 for a make linux32 command (I tested this and it worked fine). The STAR provided package adds (just in case) both a linux64 and a linux32 reshaped makefile to ensure easy install for all kernels (the gcc compiler should be recent and accept the -m flag).

 

Pythia libraries

The unpacking is "raw". So, go in a working directory where the .tar.gz are, and do the following (for linux)

% test -d Pythia && rm -fr Pythia ; mkdir Pythia && cd Pythia && tar -xzf ../pythia5.tar.gz 
% ./makePythia.linux 
% mv libPythia.so $OPTSTAR/lib/ 
% cd .. 
% 
% test -d Pythia6 && rm -fr Pythia6 ; mkdir Pythia6 && cd Pythia6 && tar -xzf ../pythia6.tar.gz 
% test -e main.c && rm -f main.c 
% ./makePythia6.linux 
% mv libPythia6.so $OPTSTAR/lib 
% 

Substitute linux with solaris for Solaris platform. On Solaris, Pythia6 requires gcc to build/link.

On SL5, 64 bits note

Depending on whether you compile a native 64 bit library support or a cross-platform 32/64, you will need to handle it differently.

For a 64 bits platform, I had to edit makePythia.linux and add -fPIC to the options for a so the binaries main.c . I did not provide a patched package mainly because v5 is not really needed in STAR. Caveat for pythia6: on SL5, 64 bits, use the makePythia6.linuxx8664 file. You will need to chmod +x it first as it was not executable in my version.

On 64 bit platform to actually build a cross-platform version, I had instead to use the normal build but make sure to add -m32 to compilation and linker options and -fPIC to compilation option.

 

True64

% chmod +x ./makePythia.alpha && ./makePythia.alpha Pythia6
% chmod +x ./makePythia6.alpha && ./makePythia6.alpha 

The following script was used to split the code which was too big

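 #!/usr/bin/env perl
 # Split a large source file into pieces of at least 500 lines each,
 # cutting only on a line that contains "subroutine".
 # Output files are named 0.<file>, 1.<file>, ...
 use strict;
 use warnings;

 die "Usage: $0 file\n" unless defined($ARGV[0]);
 my $filin = $ARGV[0];
 open(my $fi, '<', $filin) or die "Cannot open $filin: $!\n";

 my ($idx, $i) = (0, 0);
 my $fo;
 while( defined(my $line = <$fi>) ){
    chomp($line);

    # Start a new piece at the first subroutine found after 500 lines
    if ($i >= 500 && $line =~ /subroutine/){
        $i = 0;
        $idx++;
    }

    # $i is 0 for the very first line and right after each cut
    if ($i == 0){
        close($fo) if defined($fo);
        open($fo, '>', "$idx.$filin") or die "Cannot create $idx.$filin: $!\n";
        print "Opening $idx.$filin\n";
    }
    print $fo "$line\n";
    $i++;
 }
 close($fo) if defined($fo);
 close($fi);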

 

Qt 4

Starts the same way as Qt3, i.e. assuming that SRC=/afs/rhic.bnl.gov/star/common/General/Sources/ and that $x and $y stand for the major and minor versions of Qt. There are multiple flavors of the package name (it was called qt-x11-free* then qt-x11-opensource* and, with more recent packages, qt-everywhere-opensource-src*). For the sake of instructions, I provide a generic example with the most recent naming (please adapt as your case requires). WHEREVER is a location of your choice (not the final target directory).

% cd $WHEREVER
% tar -xzf $SRC/qt-everywhere-opensource-src-4.$x.$y.tar.gz
% cd qt-everywhere-opensource-src-4.$x.$y
% ./configure --prefix=$XOPTSTAR/qt4.$x -qt-sql-mysql -no-exceptions -no-glib -no-rpath 

To build a 32/64 bits version on a 64 bits OS or to force a 32 bits exec (shared mode) on a 32 bits OS, use a configure target like the ones below

% ./configure  -platform linux-g++-32 -mysql_config $OPTSTAR/bin/mysql_config [...] 
% ./configure  -platform linux-g++-64 [...]

Note that the above assumes you have a proper $OPTSTAR/bin/mysql_config. On a mixed 64/32 bits node, the default /usr/bin/mysql_config will return the /usr/lib64/mysql path for the linked library and not /usr/lib/mysql; hence, the Qt make will fail to find the dependencies necessary to link with -m32. The trick we used was to copy mysql_config and replace lib64 by lib, and voila!
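
The copy-and-replace trick can be written as a small function. This is a minimal sketch, assuming sh/bash (not the csh prompt used on this page) and assuming that a plain text substitution of lib64 by lib is all your mysql_config needs; the function name make_mysql_config32 is hypothetical. Check the resulting script by hand before pointing Qt at it.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of mysql_config that refers to the
# 32 bits library directories (lib) instead of the 64 bits ones (lib64).
#   $1 = original script (for example /usr/bin/mysql_config)
#   $2 = destination     (for example $OPTSTAR/bin/mysql_config)
make_mysql_config32() {
    sed 's|lib64|lib|g' "$1" > "$2" && chmod +x "$2"
}

# Example:
#   make_mysql_config32 /usr/bin/mysql_config "$OPTSTAR/bin/mysql_config"
```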


Compiling

  % make
  % make install

  % cd $OPTSTAR
  % ln -s qt4.$x ./qt4

For compiling with a different compiler, note that the variables referenced in this section will be respected by configure. You HAVE TO do this as the project files and other configuration files from Qt will include information on the compiler (inconsistency may arise otherwise).

Misc notes

  • If you use the same directory tree for compiling the 64 bits and the 32 bits version, please note that 'make clean' will not do the proper job. You will need to use a more systematic % find . -name '*.o' -exec rm -f {} \;  command before running  ./configure again.
  • We added mysql support in Qt4 and Qt can now be compiled in a separate directory and installed properly (at last!). If the mysql support gives you trouble on a 64 bit OS attempting to build a 32 bit image, be sure you have used as indicated above the -mysql_config /usr/lib/mysql/mysql_config option as otherwise, the default mysql_config will be picked from /usr/bin and that version will refer to the 64 bits  libraries (the link will then fail). 
  • For SL44, we created a qt3 distribution as then, both 3 and 4 existed. Otherwise, the ./qt4 link as indicated above is sufficient for SL5 and above.
  • On some systems (SL3.0.2 for sure), I also used
    • -no-openssl   as there were include problems with ssl.h and krb5.h
    • -qt-libtiff   as the default system included header did not agree with Qt code
    • -platform linux-icc  could be used for icc based compiler

 

Qt 3

Horribly packaged, the easiest is to unpack in $OPTSTAR, cd to qt-x11-free-3.X.X (where X.X stands for the current sub-version deployed on our node), run the configure script, make the package, then make clean. Then, link

  % cd $OPTSTAR && ln -s qt-x11-free-3.X.X qt

Later releases can be built that way by changing the soft-link without removing the preceding version entirely. Before building, do the following (if you had a previous version of Qt installed). This is not necessary if you install the package for the first time. Please, close windows after compilation to ensure STAR path sanity.

  % cd $OPTSTAR/qt
% setenv QTDIR `pwd`
% setenv LD_LIBRARY_PATH `pwd`/lib:$LD_LIBRARY_PATH
% setenv PATH `pwd`/bin:$PATH

To configure the package, then use one of:

  • Linux gcc: ./configure --prefix=`pwd` -no-xft -thread
  • Linux icc:  ./configure --prefix=`pwd` -no-xft -thread -platform linux-icc
  • True64 :   ./configure --prefix=`pwd` -no-xft -thread
  • Solaris:     ./configure --prefix=`pwd` -no-xft

In case of thread, the regular version is built first then the threaded version (so far, they have different names and no soft links).

You may also want to edit  $QTDIR/mkspecs/default/qmake.conf and replace the line

QMAKE_RPATH		= -Wl,-rpath,

by

QMAKE_RPATH		= 

By doing so, you would disable the rpath shared library loading and rely on LD_LIBRARY_PATH only for loading your Qt related libraries. This has the advantage that you may copy the Qt3 libraries along with your project and isolate it onto a specific machine without the need to see the original installation directory.
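
The qmake.conf edit can be done non-interactively. The function below is a sketch for sh/bash with a made-up name (disable_qmake_rpath); it assumes the setting sits on a single line starting with QMAKE_RPATH, as in the excerpt above, and it keeps a copy of the original file with a .orig suffix.

```shell
#!/bin/sh
# Hypothetical helper: blank the QMAKE_RPATH setting of a qmake.conf file.
#   $1 = path to qmake.conf (for example $QTDIR/mkspecs/default/qmake.conf)
disable_qmake_rpath() {
    conf=$1
    # Keep the original next to the edited file
    cp -p "$conf" "$conf.orig" || return 1
    # "QMAKE_RPATH", optional blanks, "=", then anything up to end of line
    sed 's|^QMAKE_RPATH[[:space:]]*=.*$|QMAKE_RPATH =|' "$conf.orig" > "$conf"
}

# Example:
#   disable_qmake_rpath "$QTDIR/mkspecs/default/qmake.conf"
```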

 

unixODBC

% ./configure --prefix=$XOPTSTAR [CC=icc CXX=icc]
% make clean       # in case you are re-using the same directory for multiple platform 
% make 
% make install

Use the environment variables noted in this section and all will go well.

Note on versions earlier than 2.3.0 (including 2.2.14 previously suggested)

The problem described below DOES NOT exist if you use 32 bits kernel OS and is specific to 64 bits kernel with 32 bits support.

For a 32 bits compilation under a 64 bits kernel, please use % cp -f $OPTSTAR/bin/libtool .  after the ./configure and before the make (see this section for an explanation of why). unixODBC version 2.3.0 does not have this problem.



MyODBC

Older version

Came with sources and one could compile "easily" (and register manually).

- MyODBC
Linux % ./configure --prefix=$XOPTSTAR --with-unixODBC=$XOPTSTAR [CC=icc CXX=icc]
Others % ./configure --prefix=$XOPTSTAR --with-unixODBC=$XOPTSTAR --with-mysql-libs=$XOPTSTAR/lib/mysql
--with-mysql-includes=$XOPTSTAR/include/mysql --with-mysql-path=$XOPTSTAR

Note : Because of an unknown issue, I had to use --disable-gui on True64
as it would complain about not finding the X include ... GUI is
not important for ODBC client anyway but whenever time allows ...

Deploy instructions at
http://www.mysql.com/products/myodbc/faq_toc.html

Version 5.x of the connector

Get the proper package, currently named mysql-connector-odbc-5.x.y-linux-glibc2.3-x86-32bit or  mysql-connector-odbc-5.x.y-linux-glibc2.3-x86-64bit. The packages are available from the MySQL Web site. The install will need to be manual i.e.

% cp -p bin/myodbc-installer $OPTSTAR/bin/
% cp -p lib/*  $OPTSTAR/lib/
% rehash

To register the driver, use the following commands

% myodbc-installer -d -a -n "MySQL ODBC 5.1 Driver" -t "DRIVER=$OPTSTAR/lib/libmyodbc5.so;SETUP=$OPTSTAR/lib/libmyodbc3S.so"
% myodbc-installer -d -a -n "MySQL" -t "DRIVER=$OPTSTAR/lib/libmyodbc5.so;SETUP=$OPTSTAR/lib/libmyodbc3S.so"

This will add a few lines in $OPTSTAR/etc/odbcinst.ini . The  myodbc-installer -d -l command does not seem to list what you installed though (but the proper lines will be added to the configuration).
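
Since the listing option is not reliable, the registration can be checked directly in the configuration file. A minimal sketch for sh/bash, assuming the usual odbcinst.ini layout where each driver has a section header of the form [name] alone on its line; the function name has_odbc_driver is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed when an odbcinst.ini file has a section
# header for the given driver name.
#   $1 = path to odbcinst.ini (for example $OPTSTAR/etc/odbcinst.ini)
#   $2 = driver name as passed to the -n option of myodbc-installer
has_odbc_driver() {
    # -F: fixed string (the brackets are literal), -x: match the whole line
    grep -q -F -x "[$2]" "$1"
}

# Example:
#   has_odbc_driver "$OPTSTAR/etc/odbcinst.ini" "MySQL" && echo registered
```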

 

doxygen

Installation would benefit from some smoothing + note the space between the --prefix and OPTSTAR (non standard option set for configure).

Use one of

% ./configure --prefix $OPTSTAR                       # for general compilation
% ./configure --platform linux-32 --prefix $OPTSTAR   # Linux, gcc 32 bits - this option was added in the STAR package
% ./configure --platform linux-64 --prefix $OPTSTAR   # Linux, gcc 64 bits - this option was fixed in the STAR package

then

% make
% make install

as usual but also

% make docs

which will fail due to a missing eps2pdf program. It will however create the HTML files, which you will need to copy somewhere.

% cp -r html $WhereverTheyShouldGo

and as example

% cp -r html /afs/rhic.bnl.gov/star/doc/www/comp/sofi/doxygen


Note: The linux-32 and linux-64 platforms were packaged in the archive provided for STAR (linux-32 does not exist in the original doxygen distribution while linux-64 is not consistent with the -m64 compilation option).

 

Additional Graphics libraries

Starting from SL5, we also deployed the following: coin, simage, SmallChange, SoQt. Those need to be installed before Qt4 but after doxygen. All options are specified below to install those packages. Please, substitute -m32 by -m64 for a 64 bits native support. After the configure, the usual make and make install are expected.

The problem described below DOES NOT exist if you use 32 bits kernel OS and is specific to 64 bits kernel with 32 bits support.

For the 32 bits version compilation under a 64 bits kernel and for ALL sub-packages below, please be sure you have the STAR version of libtool installed and use the command
% cp -f $OPTSTAR/bin/libtool .
after the ./configure to replace the generated local libtool script. This will correct a link problem which will occur at link time (see the libtool help for more information).
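
Because the libtool replacement has to be repeated for every sub-package, it can be convenient to wrap the two steps together. The function below is only a sketch for sh/bash; its name (configure_with_star_libtool) is made up, it assumes $OPTSTAR is set and that it is called from the top directory of the package, and it passes all its arguments unchanged to ./configure.

```shell
#!/bin/sh
# Hypothetical wrapper: run ./configure with the given options, then replace
# the libtool script it generated with the one installed under $OPTSTAR.
configure_with_star_libtool() {
    ./configure "$@" || return 1
    cp -f "$OPTSTAR/bin/libtool" .
}

# Example (options shortened, see the full option lists for each package):
#   configure_with_star_libtool --prefix=$XOPTSTAR CFLAGS="-m32 -fPIC" LDFLAGS="-m32"
```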

 

Coin:

% ./configure --enable-debug --disable-dependency-tracking --enable-optimization=yes \
--prefix=$XOPTSTAR CFLAGS="-m32 -fPIC -fpermissive" CXXFLAGS="-m32 -fPIC -fpermissive" LDFLAGS="-m32 -L/usr/lib" \
--x-libraries=/usr/lib

or, for the 64 bits version

% ./configure --enable-debug --disable-dependency-tracking --enable-optimization=yes \
--prefix=$XOPTSTAR CFLAGS="-m64 -fPIC -fpermissive" CXXFLAGS="-m64 -fPIC -fpermissive" LDFLAGS="-m64" 

 

simage (needs Qt installed and QTDIR defined prior):

% ./configure --prefix=$XOPTSTAR --enable-threadsafe --enable-debug --disable-dependency-tracking \
--enable-optimization=yes --enable-qimage CFLAGS="-m32 -fPIC" CXXFLAGS="-m32 -fPIC" \
LDFLAGS="-m32" FFLAGS="-m32 -fPIC" --x-libraries=/usr/lib

or, for the 64 bits version

% ./configure --prefix=$XOPTSTAR --enable-threadsafe --enable-debug --disable-dependency-tracking \
--enable-optimization=yes --enable-qimage CFLAGS="-m64 -fPIC" CXXFLAGS="-m64 -fPIC" \
LDFLAGS="-m64" FFLAGS="-m64 -fPIC" 

SmallChange:

% ./configure --prefix=$XOPTSTAR --enable-threadsafe --enable-debug --disable-dependency-tracking \
--enable-optimization=yes CFLAGS="-m32 -fPIC -fpermissive" CXXFLAGS="-m32 -fPIC -fpermissive" \
LDFLAGS="-m32" FFLAGS="-m32 -fPIC"

or, for the 64 bits version

% ./configure --prefix=$XOPTSTAR --enable-threadsafe --enable-debug --disable-dependency-tracking \
--enable-optimization=yes CFLAGS="-m64 -fPIC -fpermissive" CXXFLAGS="-m64 -fPIC -fpermissive" \
LDFLAGS="-m64" FFLAGS="-m64 -fPIC"

SoQt:

% ./configure --prefix=$XOPTSTAR --enable-threadsafe --enable-debug --disable-dependency-tracking \
--enable-optimization=yes --with-qt=true --with-coin CFLAGS="-m32 -fPIC  -fpermissive" CXXFLAGS="-m32 -fPIC  -fpermissive" \
LDFLAGS="-m32" FFLAGS="-m32 -fPIC"

or, for the 64 bits version

% ./configure --prefix=$XOPTSTAR --enable-threadsafe --enable-debug --disable-dependency-tracking \
--enable-optimization=yes --with-qt=true --with-coin CFLAGS="-m64 -fPIC  -fpermissive" CXXFLAGS="-m64 -fPIC -fpermissive" \
LDFLAGS="-m64" FFLAGS="-m64 -fPIC"

 

 

flex

Flex is usually not needed but some OS have pre-GNU flex not adequate so I would recommend to deploy flex-2.5.4  anyway (the latest version since Linux 2001). Do not install under Linux if you have flex already on your system as rpm.

Attention: Under SL5 64 bits, be sure you have flex32libs-2.5.4a-41.fc6 installed as documented on Scientific Linux 5.3 from 4.4. Linkage of 32 bits executable would otherwise dramatically fail.

 

- Xpm (Solaris)
  % xmkmf
  % make Makefiles
  % make includes
  % make 
  I ran the install command by hand changing the path (cut and paste)
  Had to 


  % cd lib
  % installbsd -c -m 0644 libXpm.so $OPTSTAR/lib
  % installbsd -c -m 0644 libXpm.a $OPTSTAR/lib
  % cd ..
  % cd sxpm/
  % installbsd -c sxpm $OPTSTAR/bin
  % cd ../cxpm/
  % installbsd -c cxpm $OPTSTAR/bin
  %
  
  On Solaris, the .a was not there, had to
  % cd lib && ar -q libXpm.a *.o && cp libXpm.a $OPTSTAR/lib
  % cd ..

  Additionally needed
  % if ( ! -e $OPTSTAR/include) mkdir $OPTSTAR/include 
  % cp lib/xpm.h $OPTSTAR/include/

  

- libpng 
  ** Solaris **
  % cat scripts/makefile.solaris | sed "s/-Wall //" > scripts/makefile.solaris2
  % cat scripts/makefile.solaris2 | sed "s/gcc/cc/" > scripts/makefile.solaris3
  % cat scripts/makefile.solaris3 | sed "s/-O3/-O/" > scripts/makefile.solaris2
  % cat scripts/makefile.solaris2 | sed "s/-fPIC/-KPIC/" > scripts/makefile.solaris3
  % 
  % make -f scripts/makefile.solaris3

  will eventually fail related to libucb. No worry, this can be sorted
  out (http://www.unixguide.net/sun/solaris2faq.shtml) by including
  /usr/ucblib in the -L list
  % cc -o pngtest -I/usr/local/include -O pngtest.o -L. -R. -L/usr/local/lib \
    -L/usr/ucblib -R/usr/local/lib -lpng -lz -lm
  % make -f scripts/makefile.solaris3 install prefix=$OPTSTAR
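
The four sed passes above can be collapsed into one command. A minimal sketch for sh/bash, with a hypothetical function name (patch_png_makefile); the substitutions are exactly the ones listed above and are applied in the same order to each line.

```shell
#!/bin/sh
# Hypothetical helper: apply the four makefile substitutions in one pass.
#   $1 = input makefile, $2 = output makefile
patch_png_makefile() {
    sed -e 's/-Wall //' -e 's/gcc/cc/' -e 's/-O3/-O/' -e 's/-fPIC/-KPIC/' "$1" > "$2"
}

# Example:
#   patch_png_makefile scripts/makefile.solaris scripts/makefile.solaris3
```

Called as in the example, it should produce the same scripts/makefile.solaris3 as the sequence above, without the intermediate makefile.solaris2.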


  ** True64 **
  Copy the make file but most likely, a change like
ZLIBINC = $(OPTSTAR)/include
ZLIBLIB = $(OPTSTAR)/lib

  in the makefile is needed.
 
  pngconf.h and png.h needed for installation and either .a or .a + .so

cp pngconf.h png.h $OPTSTAR/include/
cp libpng.* $OPTSTAR/lib



- mysql client (Solaris)
 % ./configure --prefix=$XOPTSTAR --without-server {--enable-thread-safe-client}
 (very smooth)
 The last option is needed to create the libmysqlclient_r needed by some
 applications. While this so is built with early versions of MySQL, version
 4.1+ requires the configure option explicitly.


- dejagnu-1.4.1	[Solaris specific]
the install program was not found.
% cd doc/ && cp ./runtest.1 $OPTSTAR/man/man1/runtest.1
% chmod 644 $OPTSTAR/man/man1/runtest.1

Jed

The basic principle is as usual

% ./configure --prefix=$OPTSTAR
% make
% make xjed
% make install

However, on some platforms (but this was not seen as a problem on SL/RHE), you may need to apply the following tweak before typing make. Edit the configure script and add $OPTSTAR (possibly /opt/star) to it as follows.

JD_Search_Dirs="$JD_Search_Dirs \
                $includedir,$libdir \
                /opt/star/include,/opt/star/lib \
                /usr/local/include,/usr/local/lib \
                /usr/include,/usr/lib \
                /usr/include/slang,/usr/lib \
                /usr/include/slang,/usr/lib/slang" 

32 / 64 bit issue?

The problem described below DOES NOT exist if you use 32 bits kernel OS and is specific to 64 bits kernel with 32 bits support.

The variables described here will make configure pick up the right compiler and compiler options. On our initial system, the 32 bits compilation under the 64 bits kernel Makefile tried to do something along the lines of -L/usr/X11R6/lib64 -lX11 but did not find the X11 libs (since the path is not adequate). To correct for this problem, edit src/Makefile and replace XLIBDIR = -L/usr/lib64 by XLIBDIR = -L/usr/lib . You MUST have the 32 bits compatibility libraries installed on your 64 bits kernel for this to work.
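
The XLIBDIR replacement can be scripted rather than done in an editor. This is a sketch for sh/bash with a hypothetical function name (use_lib32_xlibdir); it assumes the generated src/Makefile contains a line of exactly the form XLIBDIR = -L/usr/lib64 and leaves any other line untouched. The original file is kept with a .orig suffix.

```shell
#!/bin/sh
# Hypothetical helper: point XLIBDIR to the 32 bits X11 library directory.
#   $1 = path to the generated Makefile (for example src/Makefile)
use_lib32_xlibdir() {
    mk=$1
    cp -p "$mk" "$mk.orig" || return 1
    # Only the XLIBDIR assignment whose value is exactly -L/usr/lib64 is changed
    sed 's|^\(XLIBDIR[[:space:]]*=[[:space:]]*\)-L/usr/lib64[[:space:]]*$|\1-L/usr/lib|' "$mk.orig" > "$mk"
}

# Example:
#   use_lib32_xlibdir src/Makefile
```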

AIX

I had to make some hack on AIX (well, who wants to run on AIX in the first place right ?? but since AIX do not have any emacs etc ... jed is great) as follow

  • make a copy of unistd.h and comment the sleep() prototype
  • modify file.c to include the local version (replace <> by "")
  • modify main.c to include sys/io.h (and not io.h) and comment out direct.h

Voila (works like a charm, don't ask).

 
emacs

Version 24.3

In the below options, I recommend the with-x-toolkit=motif as the default GTK will lead to many warnings depending on the user's X11 server version and supported features. Motif may create an "old look and feel" but will work. However, you may have a local fix for GTK (by installing all required modules and dependencies) and not need to go to the Motif UI.

% ./configure --with-x-toolkit=motif --prefix=$OPTSTAR

For the 32 bits version supporting 64/32 bits, use the below

% ./configure --with-crt-dir=/usr/lib --with-x-toolkit=motif --prefix=$OPTSTAR CFLAGS="-m32 -fPIC" CXXFLAGS="-m32 -fPIC" LDFLAGS="-m32"

Then the usual 'make' and 'make install'.

Below are old instructions you should ignore

- emacs
  Was repacked with leim package (instead of keeping both separately)
  in addition to having a patch in src/s/sol2.h for solaris as follows
 #define HAVE_VFORK 1
 #endif
 
+/* Newer versions of Solaris have bcopy etc. as functions, with
+   prototypes in strings.h.  They lose if the defines from usg5-4.h
+   are visible, which happens when X headers are included.  */
+#ifdef HAVE_BCOPY
+#undef bcopy
+#undef bzero
+#undef bcmp
+#ifndef NOT_C_CODE
+#include <strings.h>
+#endif
+#endif
+

  Nothing to do differently here. This is just a note to keep track
  of changes found from community mailing lists.

  % ./configure --prefix=$OPTSTAR --without-gcc
  

- Xemacs (Solaris)
  % ./configure --without-gcc --prefix=$OPTSTAR
  Other solution, forcing Xpm 
  % ./configure --without-gcc --prefix=$OPTSTAR --with-xpm --site-prefixes=$OPTSTAR

  Possible code problem :
  /* #include <X11/xpm.h> */
  #include <xpm.h> 

- gcc-2.95 On Solaris was used as a base compiler
  % ./configure --prefix=$OPTSTAR
  % make bootstrap

  o Additional gcc on Linux
  Had to do it in multiple passes (you do not need to do the first pass
  elsewhere ; this is just because we started without a valid node).

  A gcc version < 2.95.2 had to be used. I used a 6.1 node to assemble
  it and install in a specific AFS tree (cross version)
  % cd /opt/star && ln -s /afs/rhic/i386_linux24/opt/star/alt .
  Move to the gcc source directory
  % ./configure --prefix=/opt/star/alt
  % make bootstrap
  % make install
  install may fail in AFS land. Edit gcc/Makefile and remove "p" option
  to the tar options TAROUTOPTS .
  
  For it to work under 7.2, go on a 7.2 node and
  % cp /opt/star/alt/include/g++-3/streambuf.h /opt/star/alt/include/g++-3/streambuf.h-init
  % cp -f /usr/include/g++-3/streambuf.h /opt/star/alt/include/g++-3/streambuf.h
  ... don't ask ...


  o On Solaris, no problems
  % ./configure --prefix=/opt/star/alt
  etc ...

- Compress-Zlib-1.12 --> zlib-1.1.4
  If installed in $OPTSTAR,
  % setenv ZLIB_LIB $OPTSTAR/lib
  % setenv ZLIB_INCLUDE $OPTSTAR/include
  

- findutil
  Needed a patch in lib/fnmatch.h for True64
  as follows:
  + addition of defined(__GNUC__) on line 49
  + do a chmod +rw lib/fnmatch.h  first

#if !defined (_POSIX_C_SOURCE) || _POSIX_C_SOURCE < 2 || defined (_GNU_SOURCE) || defined(__GNUC__)






* CLHEP1.8                      *** Experimental only ***
printVersion.cc needs a correction #include <string> to <string.h>
for True64 which is a bit strict in terms of compilation.

On Solaris, 2 caveats
o gcc was used (claim that CC is used but do not have the include)
o install failed running a "kdir" command instead of mkdir so do a
% make install MKDIR='mkdir -p'

Using icc was not tried and this package was then removed.

- mysqlcc
  % ./configure --prefix=$OPTSTAR --with-mysql-include=$OPTSTAR/include/mysql --with-mysql-lib=$OPTSTAR/lib/mysql
  The executable does not install itself so, one needs to
  % cp -f mysqlcc $OPTSTAR/bin/

 

libtool

First, please note that the package distributed for STAR contains a patch for support of the 32 / 64 bits environment. If you intend to download from the original site, please apply the patch below as indicated. If you do not use our distributed package and attempt to assemble a 32 bits library under a 64 bits kernel, we found cases where the default libtool will fail.

Why the replacement of libtool? Sometimes, "a" version of libtool is added along with the software packages indicated in this help. However, those do not consider the 32 bits / 64 bits mix and often, their use leads to the wrong linkage (a typical problem is that a 32 bits executable or shared library is linked against the 64 bits stdc++ versions, creating a clash).

This problem does not exist when you assemble 64 bits code under a 64 bits kernel or assemble 32 bits code under a 32 bits kernel.

In all cases, to compile and assemble, use a command line like the below:

% ./configure --prefix=$XOPTSTAR CFLAGS="-m32 -fPIC" CXXFLAGS="-m32 -fPIC" \
FFLAGS="-m32 -fPIC" FCFLAGS="-m32 -fPIC" LDFLAGS="-m32"                        # 32 bits version
% ./configure --prefix=$XOPTSTAR CFLAGS="-m64 -fPIC" CXXFLAGS="-m64 -fPIC" \
FFLAGS="-m64 -fPIC" FCFLAGS="-m64 -fPIC" LDFLAGS="-m64"                        # 64 bits version
% make
% make install
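
The same set of flags comes back for most packages on this page, so it may be convenient to generate it. The below is a minimal sketch in plain sh (not csh), assuming a helper named star_bits_flags which is hypothetical and not part of the STAR tools; it only prints the arguments shown in the two configure lines above.

```shell
#!/bin/sh
# Sketch only: print the configure arguments used on this page for a
# given word size. star_bits_flags is a hypothetical helper name.
star_bits_flags() {
    bits="$1"
    case "$bits" in
        32|64) ;;
        *) echo "usage: star_bits_flags 32|64" >&2; return 1 ;;
    esac
    # Same five variables as in the libtool configure lines above
    printf 'CFLAGS="-m%s -fPIC" CXXFLAGS="-m%s -fPIC" FFLAGS="-m%s -fPIC" FCFLAGS="-m%s -fPIC" LDFLAGS="-m%s"\n' \
        "$bits" "$bits" "$bits" "$bits" "$bits"
}

# Show the full configure line for a 32 bits build (nothing is executed)
echo "./configure --prefix=\$XOPTSTAR $(star_bits_flags 32)"
```

The output contains quotes, so it is meant to be pasted on the command line (or passed through eval) rather than expanded directly as arguments.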

Patches

libtool 2.4

The file ./libltdl/config/ltmain.sh needs the following patch

< 
< 	# JL patch 2010 -->
< 	if [ -z "$m32test" ]; then
< 	    #echo "Defining m32test"
< 	    m32test=$($ECHO "${LTCFLAGS}" | $GREP m32)
<    fi	
< 	if [ "$m32test" != "" ] ; then
< 	  dependency_libs=`$ECHO " $dependency_libs" | $SED 's% \([^ $]*\).ltframework% -framework \1%g' | $SED 's|lib64|lib|g'`
< 	else
< 	  dependency_libs=`$ECHO " $dependency_libs" | $SED 's% \([^ $]*\).ltframework% -framework \1%g'`
< 	fi
< 	# <-- end JL patch
< 
---
> 	dependency_libs=`$ECHO " $dependency_libs" | $SED 's% \([^ $]*\).ltframework% -framework \1%g'`
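
To see what the patch does without touching libtool, its logic can be reproduced in a few lines. This is a stand-alone illustration only, assuming a hypothetical function name fix_dependency_libs; the -framework substitution of the original line is left out since it is unchanged by the patch.

```shell
#!/bin/sh
# Stand-alone illustration of the patch logic (not part of libtool):
# when the compile flags contain m32, rewrite lib64 to lib in the
# dependency list; otherwise leave the list unchanged.
# fix_dependency_libs is a hypothetical name used for this sketch.
fix_dependency_libs() {
    ltcflags="$1"   # what libtool holds in LTCFLAGS
    deps="$2"       # what libtool holds in dependency_libs
    if printf '%s\n' "$ltcflags" | grep m32 >/dev/null; then
        printf '%s\n' "$deps" | sed 's|lib64|lib|g'
    else
        printf '%s\n' "$deps"
    fi
}

# 32 bits flags: the 64 bits library paths are redirected
fix_dependency_libs "-m32 -fPIC" "-L/usr/lib64 /usr/lib64/libstdc++.la"
# 64 bits flags: the list is passed through untouched
fix_dependency_libs "-m64 -fPIC" "-L/usr/lib64 /usr/lib64/libstdc++.la"
```

Note that, like the patch itself, the substitution is purely textual: any path containing the string lib64 is rewritten when -m32 is in use.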

 

 

gdb (patch)

In gdb/linux-nat.c, comment out the following call:

         /*
fprintf_filtered (gdb_stdout,
"Detaching after fork from child process %d.\n",
child_pid);
*/

and go (no, I will not explain).

 

astyle

Version 2.03

% cd astyle_2.03/src
% make -f ../build/gcc/Makefile CXX="g++ -m32 -fPIC"   # for the 64 bits version, use the same command
                                                       # with   CXX="g++ -m64 -fPIC" 
% cp bin/astyle $XOPTSTAR/bin/
% test -d $XOPTSTAR/share/doc/astyle || mkdir -p $XOPTSTAR/share/doc/astyle
% cp ../doc/*.* $XOPTSTAR/share/doc/astyle

The target

% make -f ../build/gcc/Makefile clean

also works fine and is needed between versions.


Version 1.23

Directory structure changes but easier to make the package so use instead

% cd astyle_1.23/src/
% make -f ../buildgcc/Makefile  CXX="$CXX $CFLAGS"
% cp ../bin/astyle $OPTSTAR/bin/
% cd .. 

Note that the compressed command above assumes you have defined the environment variables as described in this section. Between OS (32 / 64 bits) you may need to % rm -f obj/* as the make system will not recognize the change between kernels (you may alternatively use make -f ../buildgcc/Makefile clean but a rm will be faster :-) ).

Documentation

A crummy man page was added (will make it better later if really important). It was generated as follows and is provided for convenience in the packages for STAR (do not overwrite because I will not tell you what to do to make the file a good pod):

% cd doc/
% lynx -dump astyle.html >astyle.pod 

[... some massage beyond the scope of this help - use what I provided ...]

% pod2man astyle.pod >astyle.man 
% cp astyle.man $OPTSTAR/man/man1/astyle.1 

 

Versions < 1.23

Find where the code really unpacks. There is no configure script for this package.

% cd astyle_1.15.3 ! or 
% cd astyle/src
% make
% cp astyle $OPTSTAR/bin/

Version 1.15.3

The package comes as a zip archive. Be aware that unpacking extracts files in the current directory. So, the package was remade for convenience. Although written in C++, this executable will perform as expected under the icc environment. On SL4 with gcc 3.4.3, add -fpermissive to the Makefile CPPFLAGS.

 

valgrind

MUST be installed using $XOPTSTAR because there is an explicit reference to the install path. Copying to a local /opt/star would therefore not work. For icc, use the regular command as this is a self-contained program without C++ crap and can be copied from gcc/icc directory. The command is

% ./configure --prefix=$XOPTSTAR  

Note: valgrind version >= 3.4 may ignore additional compiler options (but will respect the CC and CXX variables) as it will assemble both the 32 bits and 64 bits versions on a dual architecture platform. You can force a 32 bits only build by adding the command line option --enable-only32bit.

Caveats for earlier revisions below:

Version 2.2

A few hacks were made on the package, a go-and-learn experience as problems appeared
 

coregrind/vg_include.h
123c123
< #define VG_N_RWLOCKS 5000
---
> #define VG_N_RWLOCKS 500
coregrind/vg_libpthread.vs
195a196
> __pthread_clock_gettime; __pthread_clock_settime;

to solve problems encountered with large programs and pthread.

 

APR

The problem described below DOES NOT exist if you use a 32 bits kernel OS and is specific to a 64 bits kernel with 32 bits support.

For a 32 bits compilation under a 64 bits kernel, please use % cp -f $OPTSTAR/bin/libtool .  after the ./configure and before the make (see this section for an explanation of why) for both the apr and expat packages.

apr is an (almost) straightforward installation:

% ./configure --prefix=$OPTSTAR

apr-util needs to have one more argument i.e. 

% ./configure --prefix=$OPTSTAR --with-apr=$OPTSTAR

The configure script will respect the environment variables described in this section and, provided you have defined them properly for the intended target (32 or 64 bits executable), the resulting Makefile will be properly generated without further modifications needed.

Note however that the package distributed in STAR has one hack to the configure script as follows (apply it if you download from another source than STAR's distributed packages):

% diff configure.orig configure
4255c4255,4256
<     CFLAGS="-g -O2"
---
>     # CFLAGS="-g -O2"
>     CFLAGS="-g -m32"
4261c4262,4263
<     CFLAGS="-O2"
---
>     # CFLAGS="-O2"
>     CFLAGS="-m32"

This will allow another way to assemble the package (without having to define the env variables) but you will need to substitute -m32 by -m64 as appropriate.

 

expat package is similar

% ./configure --prefix=$OPTSTAR

 

log4cxx

log4cxx 0.10.x

This distribution is part of the Apache project and requires APR library (see above).

The package was taken nearly as-is apart from the following patches:

  • inputstreamreader.cpp, socketoutputstream.cpp, console.cxx    - requires adding #include <string.h> on top as memmove, memcpy are no longer implicit but declared in string.h

After installing APR and using the patches as indicated, use

% ./configure --prefix=$XOPTSTAR --with-apr=$XOPTSTAR CFLAGS="-m32 -fPIC" CXXFLAGS="-fno-inline -g -m32" LDFLAGS="-m32"
or
% ./configure --prefix=$XOPTSTAR --with-apr=$XOPTSTAR CFLAGS="-m64 -fPIC" CXXFLAGS="-fno-inline -g -m64" LDFLAGS="-m64"

% cp -f $OPTSTAR/bin/libtool . 
% make
% make install

Please do NOT forget to use % cp -f $OPTSTAR/bin/libtool .  after the ./configure and before the make (see this section for an explanation of why). This assumes you installed libtool as instructed.

Finally, there is one patch needed if you download the package from other sources than where STAR provides the packages. The patch relates to a problem with atomic operations handling.

Index: src/main/cpp/objectimpl.cpp
===================================================================
--- src/main/cpp/objectimpl.cpp (revision 654826)
+++ src/main/cpp/objectimpl.cpp (working copy)
@@ -36,12 +36,12 @@

void ObjectImpl::addRef() const
{
- apr_atomic_inc32( & ref );
+ ref++;
}

void ObjectImpl::releaseRef() const
{
- if ( apr_atomic_dec32( & ref ) == 0 )
+ if ( --ref == 0 )
{
delete this;
}

 

log4cxx 0.9.5

There is a bug on Linux so, start by commenting out all lines related to HAVE_LINUX_ATOMIC_OPERATIONS in configure.in before the below. Finally, two source files had to be patched and are now repacked

  • filewatchdog.cpp line 21: comment #ifdef WIN32 and the associated #endif
  • socketimpl.cpp line 41: add #include <errno.h>

For ODBC support, one needs

% setenv CPPFLAGS "-I$XOPTSTAR/include"
% setenv LDFLAGS  "-L$XOPTSTAR/lib"

log4cxx 0.9.7

Also need to do the below or it will not even find the libs at configure.

% setenv CPPFLAGS "-I$XOPTSTAR/include"
% setenv LDFLAGS  "-L$XOPTSTAR/lib"


On Scientific Linux 4.4 aka Linux 2.6 replace as follows

#AC_CHECK_FUNCS(gettimeofday ftime setenv)
AC_CHECK_FUNCS(setenv)

Linux 7.3 distributions note

On Version 7.3 of Linux, this is hard to install. You will need to upgrade m4 and autoconf to at least the versions specified for "other platforms".  It won't compile easily with gcc 2.96 though. But it can using

% ./configure --prefix=$OPTSTAR CC=/usr/bin/gcc3 CXX=/usr/bin/g++3

if you have all gcc 3 ready.

Finally, if you install log4cxx from a new Linux version (especially one having a different version of autoconf tools), you better start from a fresh directory and not attempt to use the 'clean' target (it will fail).

Summary then:

  Linux gcc  (general instructions, all log4cxx) 

% ./autogen.sh 
% ./configure --prefix=$OPTSTAR [--with-ODBC=unixODBC]

  Linux icc

% setup icc 
% ./configure --prefix=$XOPTSTAR CC=icc CXX=icc [--with-ODBC=unixODBC]

   If icc is the second target, you should use 'make clean' before the configure.

  Solaris
 
   Does not configure (need the autoconf tools)

  True64
   Not tried yet

 


libxml2


Platform: so far needed to update RH 8.0 only, had to propagate to other platforms in 2006 due to a component dependence issue.

% ./configure --without-python --prefix=$XOPTSTAR


pine


Re-added at BNL since SL4.4 because it was removed from the base installation, this may not be needed for your site (install from RPM, it exists).

Scientific Linux (don't get fooled by the targets)

% ./build lrh 
% cp bin/pine bin/pico $OPTSTAR/bin/

On a mixed 32 / 64 bits architecture and/or with alternate  gcc versions, the command examples below could be used:

% ./build lrh CC=`which gcc` SSLDIR=/usr/include/openssl/ EXTRALDFLAGS="-m32" EXTRACFLAGS="-m32 -fPIC"
[or]
% ./build lrh CC=`which gcc` SSLDIR=/usr/include/openssl/ EXTRALDFLAGS="-m64" EXTRACFLAGS="-m64 -fPIC"

 

GSL - GNU Scientific Library

The install is straightforward with the usual configure but on a 64 bit machine, you will need to add the CFLAGS and LDFLAGS as shown below

% ./configure --prefix=$OPTSTAR                                    ! default bits 
% ./configure --prefix=$OPTSTAR CFLAGS="-m32 -fPIC" LDFLAGS="-m32" ! 32 bits
% ./configure --prefix=$OPTSTAR CFLAGS="-m64 -fPIC" LDFLAGS="-m64" ! 64 bits
% make
% make install

Python

The below were fine with version 2.7.1

% setenv BASECFLAGS "-m32 -fPIC"
% setenv CXXFLAGS "-m32 -fPIC"
% setenv LDFLAGS "-m32"
%  ./configure --prefix=$XOPTSTAR

On a mixed architecture, I had to modify the generated pyconfig.h as the use of VA_LIST_IS_ARRAY would get python to crash.

1069c1069
< //#define VA_LIST_IS_ARRAY 1
---
> #define VA_LIST_IS_ARRAY 1
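
The edit can also be scripted instead of done by hand. A minimal sketch in plain sh follows, assuming a hypothetical helper name disable_va_list_is_array; it comments out the define wherever it appears at the start of a line rather than relying on the line number 1069, which may differ between Python versions.

```shell
#!/bin/sh
# Sketch only: comment out VA_LIST_IS_ARRAY in a generated pyconfig.h.
# disable_va_list_is_array is a hypothetical helper name; run it after
# ./configure and before make.
disable_va_list_is_array() {
    file="$1"
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmp="$file.tmp.$$"
    # Write to a temporary file first so a failed sed leaves the
    # original untouched
    sed 's|^#define VA_LIST_IS_ARRAY 1|//#define VA_LIST_IS_ARRAY 1|' "$file" >"$tmp" &&
        mv "$tmp" "$file"
}
```

Typical use would be disable_va_list_is_array pyconfig.h from the Python build directory. Running it twice is harmless since an already commented line no longer matches.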



For the 64 bits version, please substitute -m32 with -m64  as follows

% setenv BASECFLAGS "-m64 -fPIC"
% setenv CXXFLAGS "-m64 -fPIC"
% setenv LDFLAGS "-m64"
%  ./configure --prefix=$XOPTSTAR

Note: The default compilation (without using the environment variable settings) may succeed but binding with ROOT and other packages will fail and require -fPIC. Additionally, it is best to have -m32/-m64 specified explicitly in all configurations.

 

MySQL-python

Build is straightforward in principle i.e.

% python setup.py build 
% python setup.py install 

but

  • on a mixed 32/64 bits platform, the 32 bits will need to be persuaded by
    • editing site.cfg and setting mysql_config to the 32 bit version (likely /usr/lib/mysql/mysql_config ) - you can be sure of this as the return value for --cflags would have -m32
    • Add the following code in setup_posix.py
      65,68d64
      <     for i in range(len(extra_compile_args)):
      <         if extra_compile_args[i] == '-m32':
      <             extra_link_args += ['-m32']
      <
      
      I am sure there are other more elegant ways but this works fine.
       
  • If your default MySQL distribution resides in $OPTSTAR, you will also need to edit the site.cfg script before running the setup script and modify the mysql_config variable accordingly. This will prevent running into symbol loading conflicts later (usually happening because a third party product would, in this case, use one version of MySQL and STAR code another). 
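
The check on the --cflags return value mentioned in the first bullet can be sketched as below (plain sh). The helper name cflags_are_32bit is hypothetical, and the path is only the "likely" location quoted above; adjust it for your system.

```shell
#!/bin/sh
# Sketch only: decide whether a mysql_config targets a 32 bits build by
# looking for -m32 in its --cflags output.
# cflags_are_32bit is a hypothetical helper name.
cflags_are_32bit() {
    # Pad with spaces so -m32 is matched as a whole word
    case " $1 " in
        *" -m32 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Likely location of the 32 bits mysql_config (an assumption, see above)
cfg=/usr/lib/mysql/mysql_config
if [ -x "$cfg" ] && cflags_are_32bit "$("$cfg" --cflags)"; then
    echo "$cfg is the 32 bits version"
else
    echo "$cfg is missing or is not the 32 bits version"
fi
```

If the check succeeds, that path is the one to set as mysql_config in site.cfg.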

 

fastjet

The basic compilation requires

./configure --prefix=$OPTSTAR CXXFLAGS="-m32 -fPIC -fno-inline" CFLAGS="-m32 -fPIC -fno-inline" LDFLAGS="-m32"

For the 64 bits, replace -m32 by -m64. -fno-inline is still needed to circumvent a gcc bug with inlining.

 

boost

Certainly, the most helpful reference was this boost reference. But those are not immediate instructions. Here is what you will need to do:

% ./bootstrap.sh --prefix=$XOPTSTAR

In any case, this will build a few 64 bits executables on a 32/64 bits machine but don't panic yet ... To build, use one of the below (as appropriate):

% ./bjam cflags="-m64 -fPIC" cxxflags="-m64 -fPIC" linkflags="-m64 -fPIC" address-model=64 threading=multi architecture=x86 stage
or
% ./bjam cflags="-m32 -fPIC" cxxflags="-m32 -fPIC" linkflags="-m32 -fPIC" address-model=32 threading=multi architecture=x86 stage

I am sure you already see the problem  - on AMD processors, you may have a different "architecture" so we cannot give you the exact instruction to use here. Possible architectures are x86, x86_amd64, x86_ia64, amd64 or ia64.

When you are done with compiling, execute nearly the same command but instead of "stage" use

% ... install --prefix=$XOPTSTAR

and you will be hopefully done.

unuran

This package follows a typical install i.e.

% ./configure --prefix=$XOPTSTAR CXXFLAGS="-m32 -fPIC" CFLAGS="-m32 -fPIC" LDFLAGS="-m32"
or
% ./configure --prefix=$XOPTSTAR CXXFLAGS="-m64 -fPIC" CFLAGS="-m64 -fPIC" LDFLAGS="-m64"

% make
% make install

This will allow ROOT to build the TUnuran classes.

LHAPDF-6.1.6

This package is not straightforward to install. Use the usual initial setup i.e.
 

./configure --prefix=$XOPTSTAR CXXFLAGS="-m32 -fPIC" CFLAGS="-m32 -fPIC" LDFLAGS="-m32"

or

./configure --prefix=$XOPTSTAR CXXFLAGS="-m64 -fPIC" CFLAGS="-m64 -fPIC" LDFLAGS="-m64"

then the usual

% make

However, before make install, modify  lhapdf-config and add -m32 -fPIC (-m64 -fPIC for 64 bits platform) to cflags and -m32 (-m64) to the ldflags i.e.
 

40c40
< test -n "$tmp" && OUT="$OUT -m32 -fPIC -I${prefix}/include "
---
> test -n "$tmp" && OUT="$OUT -I${prefix}/include "
46c46
< test -n "$tmp" && OUT="$OUT -m32 -L${exec_prefix}/lib -lLHAPDF"
---
> test -n "$tmp" && OUT="$OUT -L${exec_prefix}/lib -lLHAPDF"

as the configure will not do that and hence, will not generate a config script suitable for a 32/64 bits mix.

 
vim

Configure is standard, use a minimal option set and features=big as below

% ./configure --enable-pythoninterp=yes --enable-perlinterp=yes --enable-cscope --with-features=big \
--prefix=$XOPTSTAR CFLAGS="-m64 -fPIC" CXXFLAGS="-m64 -fPIC" LDFLAGS="-m64"

or

% ./configure --enable-pythoninterp=yes --enable-perlinterp=yes --enable-cscope --with-features=big \
--prefix=$XOPTSTAR CFLAGS="-m32 -fPIC" CXXFLAGS="-m32 -fPIC" LDFLAGS="-m32"

then the usual make and make install.





 

No AFS user


If you do not have AFS available, the repository path is also available via http using the URL http://www.star.bnl.gov/common/ . Note however that the tree is NOT browsable so, you will need to get a listing of files separately (from the RCF for example) and grab the packages of interest as you need, referring to them by name. All relative paths are preserved.

One part is indexed and browsable and allows for quick-and-dirty updates. The area is common/STAR . Beware that we, at present, have no automated ways to update this area but packages would be available. This is provided for, and most convenient to, users already having a STAR environment installed (especially OPTSTAR populated) and wanting to update libraries to the most recent revision (a few STAR releases are available there; this is NOT meant to contain all packages).

ATTENTION:

  • It is essential that you do NOT refer to this area from any Web site as exposure of the packages may be seen as a redistribution mechanism for often unclear redistribution clause in the license agreement.
  • Referring this area, if detected, will lead to the suppression of this convenience as a whole or the setting of a rotating password protection (making it less convenient to download packages).

 

CERN libraries

The STAR simulation framework will require the CERN libraries to be installed. This will likely be the most problematic portion of the STAR software installation as there is little support for the CERNLib nowadays (so, you must rely on existing supported versions).

  • Information about the CERN libraries can be found here with the appropriate download area.
    If you do not find a version of the CERNlib for your platform, well ... you are out of luck.
    If you have the same OS as BNL and do not want to spend too much time, copy from BNL and move on.
     
  • If you experience problems with the 64 bit versions and this was not copied from BNL, consider one of the following:
    • 64bit CERN libs showed to work at first were taken from DESY colleagues from this link.
    • CERNLib are also available in rpm format for Fedora EL4/EL5 (SL4/SL5). See this link for more information.

Note however that (for example) a CERNLib built for a generic Linux distribution, or for a distribution based on an older Linux version, may work on, respectively, a different flavor of Linux or a more recent version of Linux.

 

Building ROOT in STAR


How to build ROOT at BNL and other sites (support documentation)

Some of the above help is similar to what you will find in the PDSF page mentioned above.
Older versions of this help can be found from the revision tab.

ATTENTION

  • BEWARE that the target for 64 bit machine is specific to your platform. We use linuxx8664gcc at BNL.
     
  • Several patches are available for this revision (see at the bottom as usual). All sources were repackaged in Additional software components as root_v5.34.30_2-star.source.tar.gz  (this revision was last made on 2014/06/25). Taking the repacked tar ball is the preferred and recommended approach. Alternatively, you may download the root code from root.cern.ch (the source tar balls are available on their site) but the extra steps/actions noted below are then needed.
     
  • This revision supports Qt4 in its latest incarnation i.e. qt-everywhere-opensource-src-4.8.7 (see Additional software components for more information) - qt 4.4 was left for use for SL5.3
     
  • We substantially dropped several OS support since the previous release and this still applies to 5.34.09 i.e.:
    • cygwin not tested at all
    • Mac OSX not tested
    • Solaris and True64 are no longer available to us as build platform and hence, we cannot conclude
    • Linux revisions prior to 5.3 (Boron) are no longer supported.
       
  • Only Pythia6 is supported in this version (and is default)

The build

The build is in several steps ; we will assume for this example that we are building root version 5.34.30_2 ; the % sign is used for the Unix prompt.

  1. Go to the $ROOT directory, create a directory named after the version i.e. 5.34.30_2,  and unpack the tar ball using

    % cd $ROOT
    % mkdir 5.34.30_2

    % cd 5.34.30_2

    % setenv ROOTB `pwd`

    One note : the ROOTB environment variable will be used throughout this help but is not used by the STAR environment. It only serves the purpose of finding back easily the base tree for the ROOT version you are trying to install.

    If you are a STAR user, you can also find a copy of the package in AFS at the location
    /afs/rhic/star/common/General/Sources/  (again, this is the recommended approach to avoid extra complications).
    % tar -xzf root_v5.34.30_2-star.source.tar.gz

    Remember to remove the archive tar.gz file after unpacking. It is not helpful to keep it around.
    ALL required files (including the STAR specific configurations) are part of our repackaged sources. Starting from ROOT 5.34, no other means are recommended (you may still download the original codes from the ROOT Web site but the STAR specific configs and patches will not be present).
     
  2. The structure is not ready ... yet ... You still have to create the OS dependent tree structure we use to allow concurrent platform support in addition of the two layers (with debugging, without debugging information). This is done in a one-does-it-all script.

    % $STAR/mgr/MakeRootDir.pl

    This command needs to be executed ONCE on EACH supported platform.
    IMPORTANT NOTE: The $STAR environment variable needs to be defined here but in case it is not, the version available in /afs/rhic.bnl.gov/star/packages/dev/mgr is the safest to use (because it is the latest available). You also need to access this script from the AFS path (or make a local copy) if you start building a site from scratch (ROOT needs to be installed before $STAR).
     
  3. The next step will give an example on how to build it under the linux platform. The above assumes that root is installed in AFS (to make it generic) and that .@sys allows to separate the OS/compiler flavors (which is the case, but not valid for off-site NFS resident tree structure where you will have to specifically use .$STAR_HOST_SYS instead of .@sys ).

    % cd $ROOTB/.$STAR_HOST_SYS/rootdeb
    % setenv ROOTSYS `pwd`
    % $STAR/mgr/fixrootmk -unlink
    % ./configure linux --build=debug ... remaining options ...

    % cd $ROOTB/.$STAR_HOST_SYS/root
    % setenv ROOTSYS `pwd`
    % $STAR/mgr/fixrootmk -unlink
    % ./configure linux  ... remaining options ...

    The above examples both build the standard root distribution and options. This IS NOT what we are using in STAR; the table of options below shows the default options currently in use. You MUST of course add the --build=debug flag where appropriate as shown above while using that table (i.e. the main point of the two commands as shown is that the rootdeb tree is built with the configure option --build=debug while the optimized version is not).

    Notes:
    1. You MUST issue a similar command on ALL platforms you are supporting before proceeding to the next step.
    2. You need to run ./configure ONCE only. In case of any further updates, you should NOT repeat this action.
     
  4. Execute the following script before compiling
    % $STAR/mgr/fixrootmk
    This will check the configuration file for special tags (in the old versions, it would check and fix CERNLIB for example).
    Note that the same command, run earlier with -unlink, moved the system.rootrc aside before the configure and made local copies of some of the files which should not be shared between multiple platforms. At the end of compilation, we will ask you to execute the script again to re-install the system.rootrc and similar files (i.e. make use of the global one for all platforms).

  5. Now we are ready to compile. To do this, simply use the below sequence where you have configured the package.
    % gmake


    Notes:
    1. the gmake step may display messages like No such file or directory .
      Please, ignore since this is normal and will trigger the proper directory tree creation.
    2. All is compiled AND INSTALLED. There is NO NEED to use make install.
    3. For those compiling on AFS, if gmake (or make) keeps calling the reconfigure script over and over again, you have two solutions. Either 'sleep 1 && touch Makefile'  and type make again or use 'make --assume-new=Makefile'. The former will reset the Makefile timestamp to a second after the last command executed and the latter tells make to consider the timestamp of the Makefile file as if it had just been modified. You may also try a 'make clean' and try again.
     
  6. As noted in a previous step, at the end, finalize the build by doing
    % $STAR/mgr/fixrootmk -fix

    and at least once

    % cd $ROOT
    % test -e 5.34.30 && rm -f 5.34.30
    % ln -s 5.34.30_2 ./5.34.30

    These last commands link your current build tree to the official 5.34.30. If you had a previous patch level version, it will be replaced at that stage and the new build will become active. There should be NO need to rebuild root4star for incremental patch levels.
    You are now done.
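
If you repeat the final relinking for each patch level, a small wrapper may avoid removing the wrong thing by accident. The below is a minimal sketch in plain sh, assuming a hypothetical helper named relink_version (it is not one of the $STAR/mgr scripts); it refuses to replace anything that is not a symbolic link.

```shell
#!/bin/sh
# Sketch only: point the official version name at a patch-level build
# tree. relink_version is a hypothetical helper name.
relink_version() {
    target="$1"   # existing build tree, e.g. 5.34.30_2
    link="$2"     # official version name to (re)create as a soft-link
    [ -d "$target" ] || { echo "missing build tree: $target" >&2; return 1; }
    # Never delete a real file or directory, only a previous soft-link
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "$link exists and is not a symbolic link" >&2
        return 1
    fi
    rm -f "$link" && ln -s "$target" "$link"
}
```

It would be used from the $ROOT directory, with the patch-level directory as first argument and the official version name as second argument.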

 

Table of options

Platform/OS | 32/64 bits | configure script options
Linux = linux | 32 bits | --enable-table --enable-qt --with-pythia6-libdir=$XOPTSTAR/lib --enable-roofit --enable-mathmore --with-mysql-libdir=/usr/lib/mysql --enable-unuran --enable-xrootd --with-thread-libdir=/lib --enable-vc --enable-cxx11
Linux = linuxx8664gcc | 64 bits | --enable-table --enable-qt --with-pythia6-libdir=$XOPTSTAR/lib --enable-roofit --enable-mathmore --with-mysql-libdir=/usr/lib64/mysql --enable-unuran --enable-xrootd --with-thread-libdir=/lib64 --enable-vc --enable-cxx11

 

Notes:

  1. You should use $OPTSTAR at remote sites having a local /opt/star and $XOPTSTAR if you provide support over AFS.
  2. If you build Python support, please refer to Quick PyROOT tests for quick tests to see if it works
  3. To enable the build with QT4, you MUST define QTROOT to point to the QT4 directory prior to executing the configure script (the build will otherwise be silent on missing it)
  4. For SL5.3, this version of ROOT was built with the additional option --enable-builtin-pcre
  5. If you intend to build ROOT with another compiler revision, add the options --with-cc=`which gcc` --with-cxx=`which g++`.
  6. --disable-xrootd should be used wherever you do not have Xrootd support (or do not need it)
  7. If you change the Python version on your system, note that the PyROOT binding will need to be remade. However, this will not be as easy as typing make as specific version-keyed includes will be needed. You may however try something like
    % modify python2.4 python2.7 ./config/Makefile.config
    in the $ROOTSYS directory and type make afterward (this will work). Ultimately, you may also re-build in your private area and
    % cp -fp lib/libPyROOT.so $ROOTSYS/lib
    % cp -fp lib/ROOT.py* $ROOTSYS/lib

    i.e. no need to compile in-place.
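
The only differences between the 32 and 64 bits rows of the table of options are the configure target and the two library directories. A minimal sketch (plain sh) selecting them from the word size is given below; root_configure_args is a hypothetical helper name, and the options common to both rows still have to be appended.

```shell
#!/bin/sh
# Sketch only: map a word size to the configure target and the
# size-dependent options from the table of options.
# root_configure_args is a hypothetical helper name.
root_configure_args() {
    case "$1" in
        32) echo "linux --with-mysql-libdir=/usr/lib/mysql --with-thread-libdir=/lib" ;;
        64) echo "linuxx8664gcc --with-mysql-libdir=/usr/lib64/mysql --with-thread-libdir=/lib64" ;;
        *)  echo "usage: root_configure_args 32|64" >&2; return 1 ;;
    esac
}

# Show the start of the configure line for a 64 bits build (nothing is executed)
echo "./configure $(root_configure_args 64) ... remaining options ..."
```

As noted in the ATTENTION items, the 64 bits target is platform specific; linuxx8664gcc is the one used at BNL.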

 

List of updated files 

The below list is provided for convenience but you should send a note if you note ANY differences from this list and what was packaged for use by remote sites. In the below,  A=added, P=patched, U=updated:

     
P    root/cint/cint/inc/G__ci.h
P    root/math/vc/Module.mk
P    root/bindings/pyroot/Module.mk

 

Typical patched codes

The following codes are tweaked

 

cint/cint/inc/G__ci.h
#define G__LONGLINE
#define G__ONELINE
#define G__MAXNAME
#define G__ONELINEDICT
Check if appropriate (like at least 1024, 512, 256, 8). These alter the behavior of CINT but generally, the G__LONGBUF setting is fine (usually forced).

 

STAR codes

Basic


The STAR code and libraries follow a structure and policy described in Library release structure and policy. Changes in each version are described in Library release history.

Installing the core STAR software is (should be) as simple as getting a full set of code for a given library, unpacking it into $STAR_PATH (default is $STAR_ROOT/packages as described in Setting up your computing environment) and issuing the following commands (in our example, we use STAR_LEVEL=SL11d).

% cd $STAR_PATH
% mkdir SL11d && cd SL11d
% cvs co -rSL11d asps mgr QtRoot StarVMC StRoot kumacs pams StarDb StDb OnlTools
% starver SL11d 
% cd $STAR
% cons


And wait ... until all is done. This will actually build the non-optimized version of our libraries.

Modifiers

To build the optimized version, use

% setenv NODEBUG yes

before you execute the starver command. If you need both, you will hence have to build twice per library.

To build using alternate compilers, you will need to run the setup command before running cons. For example, for the icc compiler you will need an appropriate version of $OPTSTAR and

% setup icc

and for an alternate version of gcc (and pending the fact you have the specific version installed), you will need to use something similar to

% setup gcc 4.5.1

Note that this syntax assumes specific paths for gcc (installed in either /opt/gcc/$version or $OPTSTAR/alt/) while icc is expected to have a setup program located in $GROUP_DIR (as intelcc.csh) defining the paths.

Finally, on kernels supporting it, you can also switch to an alternate bits environment like this

% setup 64bits

and have the compilation proceed with the 64 bits support.

Tips

Excluding problematic directories

Sometimes, our libraries get packed with the "Pool" (user space) libraries and their support may vary. To be on the safe side, exclude several of them from compilation by setting the environment variable SKIP_DIRS before executing cons.

% setenv SKIP_DIRS "StEbyePool StHighptPool StAngleCorrMaker StSpinMaker StEbyeScaTagsMaker StEbye2ptMaker StDaqClfMaker StFtpcV0Maker StStrangePool GeoTestMaker"

Special levels

The levels pro, new and dev are special levels as described in Library release structure and policy. pro is especially relevant as if no level is specified, the STAR login will revert to whatever pro is set to be. You may then do something like the below (again, our example assumes the default library is SL11d - please adjust accordingly).

% cd $STAR_PATH
% test -e pro && rm -f pro
% ln -s SL11d ./pro

Your default STAR library is then set for your site.

Post installation

Soft-links for compatibility

In /opt/star (or equivalent), $STAR and $ROOT/$ROOT_LEVEL, run the script $STAR/mgr/CreateLinks. This will create a few compatibility links to support additional (tested and proven to work) OS / sysname version.


Other

The list below is not exhaustive. Note that most items do not need to be done; we separate the action items into two categories.

To be checked or modified

  • Consult the script site_pre_setup.csh and site_post_setup.csh . ANY site specific envrionment variables (such as STAR_ROOT, OPTSTAR and so on) can be set there. See the Setting up your computing environment used in STAR for more information on defaults (you should have read it before reaching this section though ;-)  ).
     
  • Please modify your local template login files in ${GROUP_DIR}/templates/ and make sure the GROUP_DIR directory is properly reflected. Note that ANY other variables set for your site should be set in site_pre_setup.csh .
     

Optional - for large sites and Tier centers wanting network independence.

  • Database connection setup - a local database setup can be used; its configuration is expected to be located in ${STAR_PATH}/conf/dbLoadBalancerLocalConfig_${SITE}.xml . Modify this file (based on the existing template) to point to the local database service.
    To set up a local database service, consult Set up replication slave for Offline and FC databases.
    Note: if you run on the Cloud, those instructions may also be useful (a service can be created per VM).
     
  • Setting up SUMS - only for sites intending to submit jobs locally
    SUMS will need a local batch system and a local configuration (ask on the scheduler Hypernews).
    We support LSF, PBS, Condor and SGE.
      
  • FileCatalog - not needed unless you are a Tier1 center holding files in our distributed data management system.
    A copy of the FileCatalog can be set up at the local site and/or a multi-master setup can be done.
    To first order, a local FileCatalog replica would minimally allow querying the FC without the load we see at BNL (but is not that useful). A multi-master setup would allow a full implementation of our data-management system and scheme, including file transfers between sites and multi-site transfers.
    For the API to know about the local setup, you need to modify the unique local configuration file $STAR_PATH/conf/Catalog.xml and add your configuration to it. The key is SITE name="XXXX" (add a new block for your site and please pass the information back to the Catalog maintainer). Make sure this value matches the value of $SITE set in group_env.csh .
     
  • [*** data management / multi-site transfer instructions needed ***]
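
For sites that do configure a local database and FileCatalog, the two per-site files named in the list above can be sanity-checked as follows. This is a minimal sketch in Bourne shell; the function name check_site_conf is hypothetical, its first argument stands for $STAR_PATH/conf, and nothing is assumed about Catalog.xml beyond the SITE name="..." key quoted above:

```shell
# Hypothetical helper: check the optional per-site files for site $2
# under the configuration directory $1 (normally $STAR_PATH/conf).
check_site_conf() {
    conf_dir="$1"
    site="$2"
    status=0
    db_conf="$conf_dir/dbLoadBalancerLocalConfig_${site}.xml"
    if [ ! -f "$db_conf" ]; then
        echo "missing: $db_conf"
        status=1
    fi
    # The Catalog API finds the site block through SITE name="...".
    if ! grep -q "SITE name=\"$site\"" "$conf_dir/Catalog.xml" 2>/dev/null; then
        echo "no SITE name=\"$site\" block in $conf_dir/Catalog.xml"
        status=1
    fi
    return $status
}

# Example use:
# check_site_conf "$STAR_PATH/conf" "$SITE"
```

The function returns non-zero and prints one line per problem, so it can be dropped into a site installation checklist.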

Provision CVMFS and mount BNL/STAR repo

These instructions explain how to install the CVMFS client and mount the STAR CVMFS repo, star.sdcc.bnl.gov. The STAR software has been installed there and may be used as a replacement for AFS.

 

1. Get CVMFS yum repository    

# wget -O /etc/yum.repos.d/cernvm.repo http://cvmrepo.web.cern.ch/cvmrepo/yum/cernvm.repo

2. Get CVMFS RPM GPG KEY

# wget -O /etc/pki/rpm-gpg/RPM-GPG-KEY-CernVM http://cvmrepo.web.cern.ch/cvmrepo/yum/RPM-GPG-KEY-CernVM

3. Install cvmfs (At this time the version we are installing is cvmfs-2.5.1-1.el7.x86_64)

# yum install cvmfs cvmfs-config-default

4. Run the command below to automate the creation of the cvmfs user and create an entry in autofs

# cvmfs_config setup

Note: At this time, with these instructions (config files provided) and/or due to other undetermined factors, CVMFS pointing to the star.sdcc.bnl.gov repo via autofs is not stable. Further instructions have been provided (i.e., a hard mount).

5. Add the file /etc/cvmfs/default.local

# wget -O /etc/cvmfs/default.local http://www.star.bnl.gov/~mpoat/cvmfs/default.local