Online Linux Pool

This page provides an overview of the Online Linux Pool (OLP).  The OLP is a cluster of computers made available to STAR collaborators, primarily to support real-time and near-real-time run support activities; general use and various computing development and testing projects are also envisioned as resources permit.

The OLP currently consists of 60 Penguin Altus 1300 rack-mount computers physically located in the DAQ Room, plus two servers that provide home directories (over NFS), user authentication (NIS), and Condor pool management.  The "worker" nodes are named onl01, onl02, ..., onl60.starp.bnl.gov, and run 64-bit Scientific Linux 5.8 (with 32-bit libraries).  Any user with access to the stargw.starp.bnl.gov SSH gateways also has access to these 60 nodes.  Users of the RACF will recognize the "rterm" command, which, when executed on a stargw host, attempts to connect to one of the nodes with a relatively low load.
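
For example, a typical login sequence might look like the following sketch (the username is a placeholder):

    ssh myuser@stargw.starp.bnl.gov   # log in to an SSH gateway
    rterm                             # from the gateway, open a session on a lightly loaded OLP node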


Remote filesystems:

All nodes have access to several remote filesystems that may be useful to online computing:

  • /evp/a (read-only access to the DAQ Event Pool)
  • /daq/RTS (read-only access to daqman's /RTS export)
  • /daq/data (read-write(!) access to daqman's /data export)
  • /daq/log (read-only access to daqman's /log export)
  • /onlineweb/www (read-write access to the online web server's space for content to be shared over the web)
  • /afs (the standard AFS tree)
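
As a sketch of how these exports might be used (directory and file names below are hypothetical; writing to /onlineweb/www requires the onlweb group membership described under Cron below):

    ls /daq/RTS                            # browse daqman's /RTS export (read-only)
    cp qa_summary.png /onlineweb/www/qa/   # publish content to the online web server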

Additionally, onl01-onl06 are configured to access trigger data at:

  • /trg/trgdata (trgscratch's trgdata export)
  • /trg/scalerdata (startrg2's scalerdata export). 


Condor

A Condor pool is set up on these nodes.  Currently onl01-onl30 are in the pool (modulo a few specialized nodes that do not accept jobs), serving as execute hosts.
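
A minimal vanilla-universe submit description is sketched below (file and script names are hypothetical):

    # myjob.sub -- minimal Condor submit description (hypothetical names)
    universe   = vanilla
    executable = myjob.sh
    output     = myjob.$(Cluster).$(Process).out
    error      = myjob.$(Cluster).$(Process).err
    log        = myjob.$(Cluster).log
    queue 1

Submit it with condor_submit myjob.sub and monitor it with condor_q.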

rterm is available on the hosts described in "Accessing The STAR Protected Network" and selects the least-loaded system for login.  Only a subset of nodes is tagged as interactive for rterm; that list is currently onl01-onl10.

Cron

Cron jobs are accepted and can run only on onl11, onl12, and onl13. To access the exported web directories in write mode, you need to be a member of the onlweb group. Every year before the run, a list of points of contact is compiled and used to determine who should be granted this access (it is not granted by default).
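
As a sketch, a crontab entry on one of those nodes (the script path is hypothetical) could periodically refresh content in the web-exported area:

    # install with `crontab -e` on onl11, onl12, or onl13
    */15 * * * * $HOME/bin/update_run_status.sh >> $HOME/cronlogs/update_run_status.log 2>&1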


General system details (hardware, OS, etc.):

The Penguin nodes have 64-bit Scientific Linux 5.8 installations (with 32-bit libraries), with these basic hardware specs:

  • 2 x Dual Core AMD Opteron Processor 265, 1800 MHz (4 cores per system, no HT)
  • 8 GB RAM (PC3200 DDR 400 MHz ECC)
  • 4 SATA disk bays:
      • onl01-onl30: 4 x 500 GB disks (7200 RPM) in a RAID configuration providing 1.3 TB of scratch space (mounted at /scratch)
      • onl31-onl60: 4 x 1 TB disks (7200 RPM) in a RAID configuration providing 2.6 TB of scratch space
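
To check the scratch space actually available on the node you are logged into:

    df -h /scratch    # report the size and free space of the local scratch RAID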

Usage suggestions and miscellaneous notes for users:

To reduce the burden on the network and on the home-directory NFS file server, heavy users of distributed jobs (i.e., Condor jobs) should avoid unnecessary access to their home directories.  As much as possible, please consolidate access to your home directory and use the local disks for working storage.  Small, short-term needs (up to the order of 100 MB or so) can use subdirectories under /tmp, while larger demands should use directories under /scratch on each individual node.  We expect at some point in the future to provide a shared file system (other than the home directories) of significant size, but are not there yet.
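
A sketch of this pattern for a job script (all paths are hypothetical): read inputs from the NFS home directory once, work in node-local scratch, and copy results back once at the end.

    #!/bin/sh
    # work under node-local /scratch to avoid hammering the NFS home directories
    WORKDIR=/scratch/$USER/job_$$
    mkdir -p "$WORKDIR" && cd "$WORKDIR"
    cp "$HOME/jobs/input.cfg" .            # single read from NFS
    # ... run the actual workload here, writing only to $WORKDIR ...
    cp output.root "$HOME/jobs/results/"   # single write back to NFS
    cd / && rm -rf "$WORKDIR"              # clean up local scratch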

The OLP nodes allow access based only on SSH keys.  If you have access to the stargw SSH gateways, you automatically have access to the OLP as well.  For maximum convenience, it is suggested that you familiarize yourself with SSH key agents and SSH agent forwarding, which can nearly eliminate the need to type passwords or passphrases.
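
A sketch of the agent workflow (the key file name is a placeholder for whichever key the gateways accept):

    eval "$(ssh-agent)"                  # start an agent, if one is not already running
    ssh-add ~/.ssh/id_rsa                # type the passphrase once to load the key
    ssh -A myuser@stargw.starp.bnl.gov   # -A forwards the agent through the gateway
    rterm                                # hop to an OLP node with no further prompts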