Online Linux pool

Under:

March 15, 2012:

THIS PAGE IS OBSOLETE!  It was written as a guide in 2008 for documenting improvements in the online Linux pool, but has not been updated to reflect additional changes to the state of the pool, so not all details are up to date. 

One particular detail to be aware of:  the name of the pool nodes is now onlNN.starp.bnl.gov, where 01<=NN<=14.  The "onllinuxN" names were retired several years ago.

 

Historical page (circa 2008/9):

Online Linux pool for general experiment support needs

 

GOAL: 

Provide a Linux environment for general computing needs in support of the experiemental operations.

HISTORY (as of approximately June 2008):

A pool of 14 nodes, consisting of four different hardware classes (all circa 2001) has been in existence for several years.  For the last three (or more?) years, they have had Scientific Linux 3.x with support for the STAR software environment, along with access to various DAQ and Trigger data sources.  The number of significant users has probably been less than 20, with the heaviest usage related to L2.  User authentication was originally based on an antique NIS server, to which we had imported the RCF accounts and passwords.  Though still alive, we have not kept this NIS information maintained over time.  Over time, local accounts on each node became the norm, though of course this is rather tedious.  Home directories come in three categories:  AFS, NFS on onllinux5, and local home directories on individual nodes.  Again, this gets rather tedious to maintain over time.

There are several "special" nodes to be aware of:

  1. Three of the nodes (onllinux1, 2 and 3) are in the Control Room for direct console login as needed.  (The rest are in the DAQ room.)
  2. onllinux5 has the NFS shared home directories (in /online/users).  (NB.  /online/users is being backed up by the ITD Networker backup system.)
  3. onllinux6 is (was?) used for many online database maintenance scripts (check with Mike DePhillps about this -- we had planned to move these scripts to onldb).
  4. onllinux1 was configured as an NIS slave server, in case the NIS master (starnis01) fails.

 

PLAN:

For the run starting in 2008 (2009?), we are replacing all of these nodes with newer hardware.

The basic hardware specs for the replacement nodes are:

Dual 2.4 GHZ Intel Xeon processors

1GB RAM

2 x 120 GB IDE disks

 

These nodes should be configured with Scientific Linux 4.5 (or 4.6 if we can ensure compatibility with STAR software) and support the STAR software environment.

They should have access to various DAQ and Trigger NFS shares.  Here is a starter list of mounts:

 

Shared DAQ and Trigger resources

SERVER DIRECTORY on SERVER LOCAL MOUNT PONT MOUNT OPTIONS
 evp.starp  /a  /evp/a  ro
 evb01.starp  /a  /evb01/a  ro
 evb01  /b  /evb01/b  ro
 evb01  /c  /evb01/c  ro
 evb01  /d  /evb01/d  ro
 evb02.starp  /a  /evb02/a  ro
 evb02  /b  /evb02/b  ro
 evb02  /c  /evb02/c  ro
 evb02  /d  /evb02/d  ro
 daqman.starp  /RTS  /daq/RTS  ro
 daqman  /data  /daq/data  rw
 daqman  /log  /daq/log  ro
 trgscratch.starp  /data/trgdata  /trg/trgdata  ro
 trgscratch.starp  /data/scalerdata  /trg/scalerdata  ro
 startrg2.starp  /home/startrg/trg/monitor/run9/scalers  /trg/scalermonitor  ro
 online.star  /export  /onlineweb/www  rw

 

 

WISHLIST Items with good progress:

  • <Uniform and easy to maintain user authentication system to replace the current NIS and local account mess.  Either a local LDAP, or a glom onto RCF LDAP seems most feasible> -- An ldap server (onlldap.starp.bnl.gov) has been set-up and the 15 onllinux nodes are authenticating to it *BUT* it is using NIS!
  • <Shared home directories across the nodes with backups> -- onlldap is also hosting the home directories and sharing them via NFS.  EMC Networker is backing up the home directories and Matt A. is recieving the email notifications.
  • <Integration into SSH key management system (mechanism depends upon user authentication method(s) selected).> --  The ldap server has been added to the STAR SSH key management system, and users are able to login to the new onlXX nodes with keys now.
  • <Common configuration management system> -- Webmin is in use.
  • <Ganglia monitoring of the nodes> -- I think this is done...
  • <Osiris monitoring of the nodes> -- I think this is done - Matt A. and Wayne B. are receiveing the notices...

WISHLIST Items still needing significant work:

  • None?