Online Computing

General

The online Web server front page is available here. This Drupal section will hold complementary information.
A list of all operation manuals (beyond detector sub-systems) is available on a separate, access-restricted page.
Please use it as a startup page.

Detector sub-systems operation procedures - Updated 2008, requested confirmation for 2009

 

Online computing run preparation plans

This page will list, by year, action items, run plans and open questions. It will serve as a repository for the documents used as a basis for drawing up the requirements. To see documents in this tree, you must belong to the Software and Computing OG (the pages are not public).

Run 19

Feedback from software coordinators

Active feedback

Sub-system Coordinator Calibration POC Online monitoring POC
MTD Rongrong Ma - same - - same -
EMC

Raghav Kunnawalkam Elayavalli

Nick Lukow

- same -

Note: L2algo, bemc and  bsmdstatus

EPD Prashanth Shanmuganathan N/A - same -
BTOF Frank Geurts - same - Frank Geurts
Zaochen Ye
ETOF Florian Seck - same - Florian Seck
Philipp Weidenkaff
HLT Hongwei Ke - same - - same -

Other software coordinators

sub-system Coordinator
iTPC (TPC?) Irakli Chakaberia
Trigger Akio Ogawa
DAQ Jeff Landgraf
...  

Run 20

Status of calibration timeline initialization

In RUN: EEMC, EMC, EPD, ETOF, GMT, TPC, MTD, TOF
Test: FST, FCS, STGC (no tables)
Desired init dates were announced to all software coordinators:

- Geometry tag has a timestamp of 20191120
- Simulation timeline [20191115,20191120[
- DB initialization for real data [20191125,...]

     Please initialize your table content appropriately, i.e.
sim flavor initial values are entered at 20191115 up to 20191119
(please exclude the edge), ofl initial values at 20191125
(with the run starting on the 1st of December, even tomorrow's cosmic
and commissioning runs would pick the proper values).
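
As an illustration of the interval convention above, the following minimal Python sketch treats the simulation timeline as a half-open interval (end date excluded) and the real-data initialization as open-ended. The function name and the plain YYYYMMDD integer encoding are assumptions made for this example; they are not part of the STAR database tools.

```python
# Hypothetical helper; dates are plain YYYYMMDD integers as quoted above.
SIM_START = 20191115   # first sim-flavor entry date
SIM_END = 20191120     # geometry tag timestamp, excluded from the sim window
OFL_START = 20191125   # first ofl-flavor (real data) entry date


def timeline_for(date: int) -> str:
    """Return which timeline a YYYYMMDD date falls into."""
    if SIM_START <= date < SIM_END:   # half-open: [20191115, 20191120[
        return "sim"
    if date >= OFL_START:             # open-ended: [20191125, ...]
        return "ofl"
    return "none"                     # outside both windows
```

With these values, 20191119 is the last date that still falls in the sim window, while 20191120 (the edge) does not.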

 

 

Status - 2019/12/10

EMC  = ready
ETOF = ready - initialized at 2019-11-25, no sim (confirming)
TPC  = NOT ready [look at year 19 for comparison]
MTD  = ready
TOF  = Partially ready? INL correction, T0, TDC, status and alignment tables initialized
EPD  = gain initialized at 2019-12-15 (!?), status not initialized, no sim

EEMC = ready? (*last init at 2017-12-20)
GMT  = ready (*no db tables)



Status - 2019/12/09

EMC  = ready
ETOF = ready? initialized at 2019-11-25, no sim
TPC  = NOT ready
MTD  = ready
TOF  = NOT ready
EPD  = gain initialized at 2019-12-15 (!?), status not initialized, no sim

EEMC = ready? (*last init at 2017-12-20)
GMT  = ready (*no db tables)

 

 

Software coordinator feedback for Run 20 - Points of Contact

Sub-system Coordinator Calibration POC Online monitoring POC
MTD Rongrong Ma - same - - same -
EMC
EEMC

Raghav Kunnawalkam Elayavalli

Nick Lukow

- same -

Note: L2algo, bemc and  bsmdstatus

EPD [ TBC] - same - - same -
BTOF Frank Geurts - same - Frank Geurts
Zaochen Ye
ETOF Florian Seck - same - Florian Seck
Philipp Weidenkaff
HLT Hongwei Ke - same - - same -
TPC Irakli Chakaberia - same -
Flemming Videbaek
Trigger detectors Akio Ogawa - same - - same -
DAQ Jeff Landgraf N/A  


---




Run 21

Status of calibration timeline initialization

- Geometry tag has a timestamp of 20201215
- Simulation timeline [20201210, 20201215[
- DB initialization for real data [20201220,...]

Status - 2020/12/10

 

Software coordinator feedback for Run 21 - Points of Contact

Sub-system Coordinator Calibration POC Online monitoring POC
MTD Rongrong Ma - same - - same -
EMC
EEMC

Raghav Kunnawalkam Elayavalli

Nick Lukow

- same -

Note: L2algo, bemc and  bsmdstatus

EPD Prashanth Shanmuganathan (TBC) Skipper Kagamaster - same -
BTOF Zaochen - same - Frank Geurts
Zaochen Ye
ETOF Philipp Weidenkaff - same - Philipp Weidenkaff
HLT Hongwei Ke - same - - same -
TPC Yuri Fisyak - same - Flemming Videbaek
Trigger detectors Akio Ogawa - same - - same -
DAQ Jeff Landgraf N/A  
Forward Upgrade Daniel Brandenburg - same - FCS - Akio Ogawa
sTGC - Daniel Brandenburg
FST - Shenghui Zhang/Zhenyu Ye
       

---

Run 22

 

Status of calibration timeline initialization

- Geometry tag has a timestamp of 20211015
- Simulation timeline [20211015, 20211020[
- DB initialization for real data [20211025,...]

Status - 2021/10/13

 

Software coordinator feedback for Run 22 - Points of Contact (TBC)

Sub-system Coordinator Calibration POC Online monitoring POC
MTD Rongrong Ma - same - - same -
EMC
EEMC

Raghav Kunnawalkam Elayavalli
Navagyan Ghimire

- same -

Note: L2algo, bemc and  bsmdstatus

EPD Prashanth Shanmuganathan (TBC) Skipper Kagamaster - same -
BTOF Zaochen - same - Frank Geurts
Zaochen Ye
ETOF Philipp Weidenkaff - same - Philipp Weidenkaff
HLT Hongwei Ke - same - - same -
TPC Yuri Fisyak - same - Flemming Videbaek
Trigger detectors Akio Ogawa - same - - same -
DAQ Jeff Landgraf N/A  
Forward Upgrade Daniel Brandenburg - same - FCS - Akio Ogawa
sTGC - Daniel Brandenburg
FST - Shenghui Zhang/Zhenyu Ye
       

---

Run XIII

Preparation meeting minutes

Database initialization check list

TPC Software  – Richard Witt          NO
GMT Software  – Richard Witt          NO
EMC2 Software - Alice Ohlson          Yes
FGT Software  - Anselm Vossen         Yes
FMS Software  - Thomas Burton         Yes
TOF Software  - Frank Geurts          Yes
Trigger Detectors  - Akio Ogawa       ??
HFT Software  - Spyridon Margetis     NO (no DB interface, hard-coded values in preview codes)

 

Calibration Points of Contact per sub-system

If a name is missing, the POC role falls onto the coordinator.
                Coordinator           Possible POC
                ------------          ---------------
TPC Software  – Richard Witt          
GMT Software  – Richard Witt          
EMC2 Software - Alice Ohlson          Alice Ohlson  
FGT Software  - Anselm Vossen         
FMS Software  - Thomas Burton         Thomas Burton    
TOF Software  - Frank Geurts          
Trigger Detectors  - Akio Ogawa       
HFT Software  - Spyridon Margetis     Hao Qiu

Online Monitoring POC

The final list from the Spin PWGC can be found on a separate, access-restricted page. The table below merges the Spin PWGC feedback with other feedback.

  Directories we inferred are being used (as reported in the RTS Hypernews)
  scaler Len Eun and Ernst Sichtermann (LBL) This directory usage was indirectly reported
  SlowControl James F Ross (Creighton)  
  HLT Qi-Ye Shou The 2012 directory had a recent timestamp but was owned by mnaglis. Aihong Tang contacted 2013/02/12
Answer from  Qi-Ye Shou 2013/02/12 - will be POC.

  fmsStatus Yuxi Pan (UCLA) This was not requested but the 2011 directory is being overwritten by user=yuxip
FMS software coordinator contacted for confirmation 2013/02/12
Yuxi Pan confirmed 2013/02/13 as POC for this directory

     
Spin PWG monitoring related directories follow
  L0trg Pibero Djawotho (TAMU)  
  L2algo Maxence Vandenbroucke (Temple)  
  cdev Kevin Adkins (UKY)  
  zdc Len Eun and Ernst Sichtermann (LBL)  
  bsmdStatus Keith Landry (UCLA)  
  emcStatus Keith Landry (UCLA)  
  fgtStatus Xuan Li (Temple) This directory is also being written by user=akio causing protection access and possible clash problems.
POC contacted on 2013/02/08, both Akio and POC contacted again 2013/02/12 -> confirmed as OK.

  bbc Prashanth (KSU)  



Run XIV


Preparation meetings, links

  • You do not have access to view this node

Notes

  • 2013/11/15
    • Info gathering begins (directories/areas and Points of Contact)
      Status:
      2013/11/22, directory structure, 2 people provided feedback, Renee coordinated the rest
      2013/11/25, calibration POC, 3 coordinators provided feedback - Closed 2013/12/04
      2013/12/04, geometry for Run 14,
       
    • Basic check: CERT for online is old if coming from the Wireless
      Status: fixed at ITD level, 2013/11/18 - the reverse proxy did not have the proper CERT
  • 2013/11/25

Database initialization check list

The actions suggested by this section have not started yet.

Sub-system Coordinator Check done
DAQ
Jeff Landgraf  
TPC Richard Witt  
GMT Richard Witt  
EMC2 Mike Skoby
Kevin Adkins
 
FMS Thomas Burton  
TOF Daniel Brandenburg  
MTD Rongrong Ma  
HFT Spiros Margetis (not known)
Trigger Akio Ogawa  
FGT Xuan Li  


Calibration Points of Contact per sub-system

"-" indicates no feedback was provided. But if a name is missing, the POC role falls onto the coordinator.

Sub-system Coordinator Calibration POC
DAQ Jeff Landgraf -
TPC Richard Witt -
GMT Richard Witt -
EMC2 Mike Skoby
Kevin Adkins
-
FMS Thomas Burton -
TOF Daniel Brandenburg -
MTD Rongrong Ma Bingchu Huang
HFT Spiros Margetis Jonathan Bouchet
Trigger Akio Ogawa -
FGT Xuan Li N/A


Online Monitoring POC


scaler   Not needed 2013/11/25
SlowControl Chanaka DeSilva OKed on second Run preparation meeting
HLT Zhengqiao Zhang  Learned incidentally on 2014/01/28
HFT Shusu Shi Learned about it on 2014/02/26
fmsStatus   Not needed 2013/11/25
L0trg Zilong Chang
Mike Skoby
 
Informed 2013/11/10 and created 2013/11/15
L2algo  Nihar Sahoo Informed 2013/11/25
cdev   Not needed 2013/11/25
zdc   may not be used (TBC)
bsmdStatus  Janusz Oleniacz Info will be passed from Keith Landry 2014/01/20
Possible backup, Leszek Kosarzewski 2014/03/26
emcStatus  Janusz Oleniacz Info will be passed from Keith Landry 2014/01/20
Possible backup, Leszek Kosarzewski 2014/03/26
fgtStatus   Not needed 2013/11/25
bbc
 Akio Ogawa Informed 2013/11/15, created same day


Run XV

Run 15 was prepared essentially by discussing with individuals, and a comprehensive page was not maintained.

Run XVI


This page will contain feedback related to the preparation of the online setup.

 

Notes



 

Online Monitoring POC

scaler    
SlowControl    
HLT Zhengqiao Feedback 2015/11/24
HFT Guannan Xie Spiros: Feedback 2015/11/24
fmsStatus   Akio: Possibly not needed (TBC). 2016/01/13 noted this was not used in Run 15 and will probably never be used again.
fmsTrg   Confirmed needed 2016/01/13
fps   Akio: Not needed in Run 16? Perhaps later.
L0trg Zilong Chang Zilong: Feedback 2015/11/24
L2algo Kolja Kauder Kolja: will be POC - 2015/11/24
cdev Chanaka DeSilva  
zdc    
bsmdStatus Kolja Kauder Kolja: will be POC - 2015/11/24
bemcTrgDb Kolja Kauder Kolja: will be POC - 2015/11/24
emcStatus Kolja Kauder Kolja: will be POC - 2015/11/24
fgtStatus   Not needed since Run 14 ... May drop from the list
bbc
Akio Ogawa Feedback 2015/11/24, needed
rp    

 

Calibration Points of Contact per sub-system

Sub-system Coordinator Calibration POC
DAQ Jeff Landgraf -
TPC Richard Witt
Yuri Fisyak
-
GMT Richard Witt -
EMC2 Kolja Kauder
Ting Lin
-
FMS Oleg Eysser -
TOF Daniel Brandenburg -
MTD Rongrong Ma (same confirmed 2015/11/24)
HFT Spiros Margetis Xin Dong
HLT Hongwei Ke (same confirmed 2015/11/24)
Trigger Akio Ogawa -
RP Kin Yip -

 

Database initialization check list



 

Shift Accounting

This page will now hold the shift accounting pages. They complement the Shift Sign-up process by documenting it.

Run 18 shift dues


Run 18 Shift Dues & Notes


Period coordinators

As usual, period coordinators are pre-assigned, as arranged by the Spokespersons.

Special arrangements and requests

  1. Under the family-related policy, the following 6 weeks of offline QA shifts were pre-assigned:
    MAR 27 Kevin Adkins (Kentucky)
    APR 03 Kevin Adkins
    APR 10 Sevil Salur (Rutgers)
    APR 17 Richard Witt (USNA/Yale)
    MAY 22 Juan Romero (UC Davis)
    JUN 12 Terry Tarnowsky (Michigan State)
     
  2. Lanny Ray (UT Austin), as QA coordinator, always is pre-assigned the first QA week.
     
  3. FIAS remains in “catch-up mode” and is taking extra shifts above their dues. Pre-assigned shifts can be requested in this scenario. FIAS has been pre-assigned 4 Detector Op shifts.
     
  4. Bob Tribble (TAMU) requests the evening Shift leader slot during Apr 10-17.

Run 19 special requests

The following pre-assigned slot requests were made.
    9 WEEKS PRE-ASSIGNED QA AS FOLLOWS
    ==================================
    Lanny Ray (UT Austin) QA Mar 5
    Richard Witt (USNA/Yale) QA Mar 19
    Sevil Salur (Rutgers) QA Apr 16
    Wei Li (Rice) QA Apr 23
    Kevin Adkins (Kentucky) QA May 14
    Juan Romero (UC Davis) QA May 21
    Jana Bielcikova (NPI, Czech Acad of Sci) QA May 28  
    Yanfang Liu (TAMU) QA June 25 
    Yanfang Liu (TAMU) QA July 02
    
    8 WEEKS PRE-ASSIGNED REGULAR SHIFTS AS FOLLOWS
    ==================================
    Bob Tribble (BNL) Feb 05 SL evening 
    Daniel Kincses (Eotvos) Mar 12  DO Trainee Day
    Daniel Kincses (Eotvos) Mar 19  DO Day
    Mate Csanad (Eotvos) Mar 12 SC Day
    Ronald Pinter (Eotvos) Mar 19 SC Day
    Carl Gagliardi (TAMU)  May 14  SL day
    Carl Gagliardi (TAMU)  May 21 SL day 
    Grazyna Odyniec (LBNL) July 02 SL evening
    
    

Shift Dues and Special Requests Run 20

For the calculation of shift dues, there are two considerations.
1) The length of time of the various shift configurations (2 person, 4 person no trainees, 4 person with trainees, plus period coordinators/QA shifts)
2) The percent occupancy of the training shifts

For many years, 2) has hovered about 45%, which is what we used to calculate the dues.  Since STAR gives credit for training shifts (as we should) this needs to be factored in or we would not have enough shifts.

The total number of shifts needed is then divided by the total number of authors, minus the authors from Russian institutions who cannot come to BNL.

date           weeks   crew   training   PC   OFFLINE
11/26-12/10    2       2      0          0    0
12/10-12/24    2       4      2          1    0
12/24-6/30     27      4      2          1    1
7/02-7/16      2       4      0          1    1

Adding these together (for each row, weeks x (3 x crew + 3 x 45% x training + PC + offline)) gives a total of 522 shifts.
The total number of shifters is 303 - 30 Russian collaborators = 273 people
Giving a total due of 1.9 per author.

For a given institution, their load is calculated as # of authors - # of expert credits x due -> Set to an integer value as cutting collaborators into pieces is non-collegial behavior.

However, this year, this should have been:
date           weeks   crew   training   PC   OFFLINE
11/26-12/10    2       2      0          0    0
12/10-12/24    2       4      2          1    0
12/24-6/02     23      4      2          1    1
6/02-6/16      2       4      0          1    1

Adding these together (3x a shift for crew, 3x45% for training, plus pc plus offline) gives a total of 456 shifts for a total due of 1.7 per author.
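
The two totals above can be reproduced with a short calculation. This is a minimal sketch, assuming each table row contributes weeks x (3 x crew + 3 x 45% x training + PC + offline) shifts; the function and variable names are made up for this example, and reading "3x" as three shift periods per day is an assumption.

```python
from fractions import Fraction

TRAINING_OCCUPANCY = Fraction(45, 100)  # long-term average occupancy of training slots
SHIFTS_PER_DAY = 3                      # assumed meaning of "3x": three shift periods per day


def total_shifts(schedule):
    """Sum shift slots over rows of (weeks, crew, training, pc, offline)."""
    total = Fraction(0)
    for weeks, crew, training, pc, offline in schedule:
        per_week = (SHIFTS_PER_DAY * crew
                    + SHIFTS_PER_DAY * TRAINING_OCCUPANCY * training
                    + pc + offline)
        total += weeks * per_week
    return total


# Rows copied from the two tables above.
planned = [(2, 2, 0, 0, 0), (2, 4, 2, 1, 0), (27, 4, 2, 1, 1), (2, 4, 0, 1, 1)]
revised = [(2, 2, 0, 0, 0), (2, 4, 2, 1, 0), (23, 4, 2, 1, 1), (2, 4, 0, 1, 1)]
authors = 303 - 30  # shifters minus Russian collaborators

for label, schedule in (("planned", planned), ("revised", revised)):
    shifts = total_shifts(schedule)
    # Print the rounded total and the resulting due per author.
    print(label, round(shifts), round(float(shifts / authors), 1))
```

Exact fractions are used instead of floats so that the rounding of the totals does not depend on floating-point representation. Under these assumptions the sketch gives 522 shifts (1.9 per author) for the first table and 456 shifts (1.7 per author) for the second, matching the figures quoted in the text.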

We allowed some people to pre-sign up, for a few different reasons.

Family reasons, hence offline QA:
James Kevin Adkins
Jana Bielčíková
Sevil Salur
Md. Nasim
Yanfang Liu

Additionally, Lanny Ray is given the first QA shift of the year as our experienced QA shifter.

This year, to add an incentive to train for shift leader, we allowed people who were doing shift leader training to sign up for both their training shift and their "real" shift early:
Justin Ewigleben
Hanna Zbroszczyk
Jan Vanek
Maria Zurek
Mathew Kelsey
Kun Jiang
Yue-Hang Leung

Both Bob Tribble and Grazyna Odyniec signed up early for a shift leader position in recognition of their schedules and contributions.

This year because of the date of Quark Matter and the STAR pre-QM meeting, several people were traveling on Tuesday during the sign up.  These people I signed up early as I did not want to punish some of our most active colleagues for the QM timing:
James Daniel  Brandenburg
Sooraj Radhakrishnan

3 other cases that were allowed to pre-sign up:
Panjab University had a single person who had the visa to enter the US, and had to take all of their shifts prior to the end of their contract in March.  So that the shifter could have some spaces in his shifts for sanity, I signed up:
Jagbir Singh
Eotvos Lorand University stated that travel is complicated for their group, and so it would be good if they could ensure that they were all on shift at the same time.  Given that they are coming from Europe I signed up:
Mate Csanad
Daniel Kincses
Roland Pinter
Srikanta Tripathy
Frankfurt Institute for Advanced Studies (FIAS) wanted to be able to bring Masters students to do shift, but given the training requirements and timing with school and travel for Europe, this leaves little availability for shift.  So I signed up:
Iouri Vassiliev
Artemiy Belousov
Grigory Kozlov

Tools

This is to serve as a repository of information about various STAR tools used in experimental operations.

Implementing SSL (https) in Tomcat using CA generated certificates

A certificate from a CA is used instead of a self-signed certificate because, with a self-signed certificate, the browser shows a warning screen and asks you to accept the certificate. With a CA-issued certificate this step is not needed, as the browser already ships with a list of trusted CAs.
 
The following list of certificates and a key are needed:

/etc/pki/tls/certs/wildcard.star.bnl.gov.Nov.2012.cert – host cert.
/etc/pki/tls/private/wildcard.star.bnl.gov.Nov.2012.key – host key (don’t give this one out)
/etc/pki/tls/certs/GlobalSignIntermediate.crt – intermediate cert.
/etc/pki/tls/certs/GlobalSignRootCA_ExtendedSSL.crt – root cert.
/etc/pki/tls/certs/ca-bundle.crt – a large bundle of many certs.

Concatenate the following certs into one file; in this example it is called Global_plus_Intermediate.crt:
cat /etc/pki/tls/certs/GlobalSignIntermediate.crt > Global_plus_Intermediate.crt
cat /etc/pki/tls/certs/GlobalSignRootCA_ExtendedSSL.crt >> Global_plus_Intermediate.crt
cat /etc/pki/tls/certs/ca-bundle.crt >> Global_plus_Intermediate.crt

Run this command. Note that "-name tomcat" and "-caname root" should not be changed to any other value: the command will still succeed, but the resulting keystore will fail under Tomcat. If it works you will be asked for a password, which should be set to "changeit".

 openssl pkcs12 -export -in wildcard.star.bnl.gov.Nov.2012.cert -inkey wildcard.star.bnl.gov.Nov.2012.key -out mycert.p12 -name tomcat -CAfile Global_plus_Intermediate.crt -caname root -chain

Test the new p12 output file with this command:

keytool -list -v -storetype pkcs12 -keystore mycert.p12

Note it should say: "Certificate chain length: 3"


In Tomcat's server.xml file, add a connector that looks like this:
 

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="150" scheme="https" secure="true"
           keystoreFile="/home/lbhajdu/certs/mycert.p12" keystorePass="changeit"
           keystoreType="PKCS12" clientAuth="false" sslProtocol="TLS"/>


Note that keystoreFile should be set to the actual path of the p12 file, and that the p12 file should be readable only by the Tomcat account because it holds the host key.
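
The permission requirement above can be checked with a few lines of Python. This is only a sketch: the helper name is hypothetical, and it merely tests that no group or world permission bits are set on the file. It does not verify that the file is owned by the Tomcat account.

```python
import os
import stat


def is_owner_only(path: str) -> bool:
    """Return True if the file grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Any bit in the group or other triplets means someone else can access it.
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

For the connector shown above, one would call is_owner_only("/home/lbhajdu/certs/mycert.p12") and expect True once the file mode has been restricted (for example to 600).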

Online Linux pool

March 15, 2012:

THIS PAGE IS OBSOLETE!  It was written as a guide in 2008 for documenting improvements in the online Linux pool, but has not been updated to reflect additional changes to the state of the pool, so not all details are up to date. 

One particular detail to be aware of:  the name of the pool nodes is now onlNN.starp.bnl.gov, where 01<=NN<=14.  The "onllinuxN" names were retired several years ago.

 

Historical page (circa 2008/9):

Online Linux pool for general experiment support needs

 

GOAL: 

Provide a Linux environment for general computing needs in support of the experimental operations.

HISTORY (as of approximately June 2008):

A pool of 14 nodes, consisting of four different hardware classes (all circa 2001) has been in existence for several years.  For the last three (or more?) years, they have had Scientific Linux 3.x with support for the STAR software environment, along with access to various DAQ and Trigger data sources.  The number of significant users has probably been less than 20, with the heaviest usage related to L2.  User authentication was originally based on an antique NIS server, to which we had imported the RCF accounts and passwords.  Though still alive, we have not kept this NIS information maintained over time.  Over time, local accounts on each node became the norm, though of course this is rather tedious.  Home directories come in three categories:  AFS, NFS on onllinux5, and local home directories on individual nodes.  Again, this gets rather tedious to maintain over time.

There are several "special" nodes to be aware of:

  1. Three of the nodes (onllinux1, 2 and 3) are in the Control Room for direct console login as needed.  (The rest are in the DAQ room.)
  2. onllinux5 has the NFS shared home directories (in /online/users).  (NB.  /online/users is being backed up by the ITD Networker backup system.)
  3. onllinux6 is (was?) used for many online database maintenance scripts (check with Mike DePhillips about this -- we had planned to move these scripts to onldb).
  4. onllinux1 was configured as an NIS slave server, in case the NIS master (starnis01) fails.

 

PLAN:

For the run starting in 2008 (2009?), we are replacing all of these nodes with newer hardware.

The basic hardware specs for the replacement nodes are:

Dual 2.4 GHz Intel Xeon processors

1GB RAM

2 x 120 GB IDE disks

 

These nodes should be configured with Scientific Linux 4.5 (or 4.6 if we can ensure compatibility with STAR software) and support the STAR software environment.

They should have access to various DAQ and Trigger NFS shares.  Here is a starter list of mounts:

 

Shared DAQ and Trigger resources

SERVER            DIRECTORY on SERVER                     LOCAL MOUNT POINT   MOUNT OPTIONS
evp.starp         /a                                      /evp/a              ro
evb01.starp       /a                                      /evb01/a            ro
evb01             /b                                      /evb01/b            ro
evb01             /c                                      /evb01/c            ro
evb01             /d                                      /evb01/d            ro
evb02.starp       /a                                      /evb02/a            ro
evb02             /b                                      /evb02/b            ro
evb02             /c                                      /evb02/c            ro
evb02             /d                                      /evb02/d            ro
daqman.starp      /RTS                                    /daq/RTS            ro
daqman            /data                                   /daq/data           rw
daqman            /log                                    /daq/log            ro
trgscratch.starp  /data/trgdata                           /trg/trgdata        ro
trgscratch.starp  /data/scalerdata                        /trg/scalerdata     ro
startrg2.starp    /home/startrg/trg/monitor/run9/scalers  /trg/scalermonitor  ro
online.star       /export                                 /onlineweb/www      rw

 

 

WISHLIST Items with good progress:

  • <Uniform and easy to maintain user authentication system to replace the current NIS and local account mess.  Either a local LDAP, or a glom onto RCF LDAP seems most feasible> -- An ldap server (onlldap.starp.bnl.gov) has been set-up and the 15 onllinux nodes are authenticating to it *BUT* it is using NIS!
  • <Shared home directories across the nodes with backups> -- onlldap is also hosting the home directories and sharing them via NFS.  EMC Networker is backing up the home directories and Matt A. is receiving the email notifications.
  • <Integration into SSH key management system (mechanism depends upon user authentication method(s) selected).> --  The ldap server has been added to the STAR SSH key management system, and users are able to login to the new onlXX nodes with keys now.
  • <Common configuration management system> -- Webmin is in use.
  • <Ganglia monitoring of the nodes> -- I think this is done...
  • <Osiris monitoring of the nodes> -- I think this is done - Matt A. and Wayne B. are receiving the notices...

WISHLIST Items still needing significant work:

  • None?

 

SSH Key Management

Overview 

An SSH public key management system has been developed for STAR (see D. Arkhipkin et al 2008 J. Phys.: Conf. Ser. 119 072005), with two primary goals stemming from the heightened cyber-security scrutiny at BNL:

  • Use of two-factor authentication for remote logins
  • Identification and management of remote users accessing our nodes (in particular, the users of "group" accounts which are not tied to one individual), to achieve accountability

A benefit for users also can be seen in the reduction in the number of passwords to remember and type.

 

In purpose, this system is similar to the RCF's key management system, but is somewhat more powerful because of its flexibility in the association of hosts (client systems), user accounts on those clients, and self-service key installation requests.

Here is a typical scenario of the system usage: 

  1. A sysadmin of a machine named FOO creates a user account named "JDOE" and, if not done already, installs the key_services client.
  2. A user account 'JDOE' on host 'FOO' is configured in the Key Management system by a key management administrator.
  3. John Doe uploads (via the web) his or her public ssh key (in openssh format).
  4. John Doe requests (via the web) that his key be added to JDOE's authorized_keys file on FOO.
  5. A key management administrator approves the request, and the key_services client places the key in ~JDOE/.ssh/authorized_keys.

At this point, John Doe has key-based access to JDOE@FOO.  Simple enough?  But wait, there's more!  Now John Doe realizes that he also needs access to the group account named "operator" on host BAR.  Since his key is already in the key management system he has only to request that his key be added to operator@BAR, and voila (subject to administrator approval), he can now login with his key to both JDOE@FOO and operator@BAR.  And if Mr. Doe should leave STAR, then an administrator simply removes him from the system and his keys are removed from both hosts.

Slightly Deeper...

There are three things to keep track of here -- people (and their SSH keys of course), host (client) systems, and user accounts on those hosts:

People want access to specific user accounts at specific hosts.

So the system maintains a list of user accounts for each host system, and a list of people associated with each user account at each host.
(To be clear -- the system does not have any automatic user account detection mechanism at this time -- each desired "user account@host" association has to be added "by hand" by an administrator.)

This Key Management system, as seen by the users (and admins), consists simply of users' web browsers (with https for encryption) and some PHP code on a web server (which we'll call "starkeyw") which inserts uploaded keys and user requests (and administrators' commands) into a backend database (which could be on a different node from the web server if desired). 

Behind the scenes, each host that is participating in the system has a keyservices client installed that runs as a system service.  The keyservices_client periodically (at five minute intervals by default) interacts with a different web server (serving different PHP code that we'll call starkeyd).  The backend database is consulted for the list of approved associations and the appropriate keys are downloaded by the client and added to the authorized_keys files accordingly.
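
To make the bookkeeping concrete, here is a minimal Python sketch of the data model described above: people with uploaded public keys, and approved (person, account, host) associations from which the content of each authorized_keys file follows. All names, the in-memory containers and the sample key string are illustrative assumptions; the real system keeps this information in its backend database, and the client is a separate system service.

```python
# Illustrative in-memory model; not the actual SKM schema or client code.
keys = {
    "John Doe": ["ssh-rsa EXAMPLEKEY jdoe@laptop"],  # placeholder key text
}

# Approved associations: (person, user account, host).
approved = {
    ("John Doe", "JDOE", "FOO"),
    ("John Doe", "operator", "BAR"),
}


def authorized_keys(account, host):
    """Return the sorted key lines that belong in ~account/.ssh/authorized_keys on host."""
    lines = []
    for person, acct, h in approved:
        if acct == account and h == host:
            lines.extend(keys.get(person, []))
    return sorted(lines)


def remove_person(person):
    """Drop a person's keys and associations, e.g. when they leave the collaboration."""
    keys.pop(person, None)
    for assoc in [a for a in approved if a[0] == person]:
        approved.discard(assoc)
```

In this model a single upload serves both JDOE@FOO and operator@BAR, and calling remove_person("John Doe") empties the key list for both, mirroring the scenario described earlier.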

In our case, our primary web server at www.star.bnl.gov hosts all the STAR Key Manager (SKM) services (starkeyw and starkeyd via Apache, and a MySQL database), but they could each be on separate servers if desired.

Perhaps a picture will help.  See below for a link to an image labelled "SKMS in pictures".

Deployment Status and Future Plans

We have begun using the Key Management system with several nodes and are seeking to add more (currently on a voluntary basis).  Only RHEL 3/4/5 and Scientific Linux 3/4/5 with i386 and x86_64 kernels have been tested, but there is no reason to believe that the client couldn't be built on other Linux distributions or even Solaris.  We do not anticipate "forcing" this tool onto any detector sub-systems during the 2007 RHIC run, but we do expect it (or something similar) to become mandatory before any future runs.  Please contact one of the admins (Wayne Betts, Jerome Lauret or Mike Dephillips) if you'd like to volunteer or have any questions.

User access is currently based on RCF Kerberos authentication, but may be extended to additional authentication methods (e.g., BNL LDAP) if the need arises.

Client RPMs (for some configurations) and SRPM's are available, and some installation details are available here: 

http://www.star.bnl.gov/~dmitry/skd_setup/

An additional related project is the possible implementation of a STAR ssh gateway system (while disallowing direct login to any of our nodes online) - in effect acting much like the current ssh gateway systems in the SDCC.  Though we have an intended gateway node online (stargw1.starp.bnl.gov, with a spare on hand as well), its use is not currently required.

 

Anxious to get started? 

Here you go: https://www.star.bnl.gov/starkeyw/ 

You can use your RCF username and Kerberos password to enter.

When uploading keys, use your SSH public keys - they need to be in OpenSSH format. If not, please consult SSH Keys and login to the SDCC.