Run VII

Background

With DOE CyberSecurity regulations being introduced into our infrastructure, several action items were presented at the 2006 run critique meeting. The presentation is attached as STAR-Critique-06.pdf (see below). The urgent and immediate items, some of which required deep restructuring, were:

  • We MUST establish an internal controlled perimeter to the unroutable network. This network will be accessible via a gatekeeper model. Vulnerable devices should be isolated to the internal network layer
  • All network and communication layers must be documented
  • Physical access to the consoles was described as part of the shift procedure and shift rotation. Access to the online computing infrastructure MUST be controlled
  • All systems MUST be remediated and brought up to the proper OS version and safety level
    • should exceptions be needed, the device should have the proper control and monitoring
    • isolating nodes we cannot upgrade due to operational needs in the private network is the other solution
  • OS flavor reduction – We propose to reduce the OS flavors to enhance and optimize support and maintenance
  • Group account access should be regulated via keys (ssh keys) and tied to individuals (no floating password without a clear understanding of who has it)
  • root access shall be restricted
    • A list of users having root access MUST exist at any point in time. In other words, only a few (documented) users should have root access privileges.
    • We must make a best effort to implement a configuration management strategy, i.e. changes to our infrastructure shall follow a procedure and lead to updated documentation.
  • Maintenance of computing equipment will be the responsibility of the S&C, DAQ and Slow Control groups as appropriate under general guidance of the S&C group.

 

The run preparation will be established within the following guidelines

  • General
    • Assess hardware replacement and cost (display, printer, UPS, switches, ...)
    • Assess sub-system needs for resources (disk space, bandwidth, database access, ...)
  • Networking 
    • Understand and reshape the current online network spaghetti into a two-layer model with a gatekeeper
    • Isolate vulnerable devices on a private network
    • Provide either a routing or a gatekeeper model; reduce dual- or tri-NIC connections
    • Patch all vulnerable machines and bring all equipment to the appropriate level
  • Organizational needs – root access and password 
    • Establish an in-principle layer of responsibility and accountability
    • Determine root access and generic account access and usage
    • Provide infrastructure to manage keys per node
    • Document procedure and equipment, establish principles for configuration management
    • Require new equipment to comply with baseline control
      • New equipment shall not be brought in ad hoc but integrated as part of the online infrastructure documentation
  • Software
    • Deploy a new Web server
    • Revisit all online common tools and needs – RunLog, ShiftLog, Web interfaces ...
    • Introduce a technology and paradigm change to replace the poor-man's HTML-refresh approach
      • the technique has spread and creates heavy load
    • Review Pplots needs and coverage
    • Introduce Scaler monitoring tool
    • Revisit Ganglia monitoring with special care on broadcast/multi-cast
  • Establish a first testbed of database consolidation for high-luminosity regime 
    • With help from Slow Control – IRMIS project

Understanding our online Network

The following documents are a first cut at understanding the interconnections between online hardware.

  • ch2connect.xls shows the NFS mounts between machines
  • Network-top level.pdf is a rough first cut of the network schematic

Patching and OS version-ing

  • July 28th 2006 
    • The matrix Old_Linux.pdf displays the list of nodes requiring attention
    • Two Windows machines (Alexei Lebedev's responsibility) require immediate attention.

 


 

New online web server (dean.star.bnl.gov)

New web server notes for content providers and users


There is a new web server (dean.star.bnl.gov) online to replace ch2linux.star.bnl.gov.  The "online.star.bnl.gov" alias was switched to dean.star.bnl.gov at about 2pm on Tuesday, Feb. 29, 2007.  There is perhaps as much as 24 hours of DNS propagation time for the alias change to make it around the world, during which time, there could be confusion about which system (dean or ch2linux) is actually being accessed.

We plan to keep ch2linux online for 1-2 weeks to help in debugging, and as a fallback for broken content until it is fixed.

A gotcha to watch out for is the hard-coding of the "ch2linux" name in any links.  Use of the "online.star.bnl.gov" alias is generally preferable.
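
One way to hunt for such hard-coded names is sketched below. It is only an illustration: the function names `find_hardcoded_hosts` and `use_alias` are hypothetical, and the sketch assumes your content is text (HTML, php, ...) that you can read into a string. Review the matches before replacing anything, since a bare "ch2linux" may appear in places that are not links.

```python
import re

ALIAS = "online.star.bnl.gov"

# Matches the fully qualified name or the bare short name "ch2linux"
_OLD_HOST = re.compile(r"\bch2linux(?:\.star\.bnl\.gov)?\b")


def find_hardcoded_hosts(text):
    """Return (line_number, line) pairs that mention the old host name."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if _OLD_HOST.search(line)
    ]


def use_alias(text):
    """Return text with every mention of the old host replaced by the alias."""
    return _OLD_HOST.sub(ALIAS, text)
```

Running `find_hardcoded_hosts` first and `use_alias` only on files you have checked keeps the change reviewable.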

For those of you with individual accounts on ch2linux, the accounts have been duplicated on the new server. If you have an account, you can immediately use the key management system ( https://www.star.bnl.gov/starkeyw ) to install OpenSSH public keys on both the current (ch2linux) and new (dean) web servers, if desired.

Some hints and suggestions for content maintainers:


Some of the configuration changes between ch2linux and dean (particularly to php) may require modifications to existing content for it to work properly on the new server.  With php, the change that seems most likely to bite us is "register_globals = Off".  On ch2linux, this is set to On, allowing php automatic access to variables passed in POST or GET requests.  Here is a quick primer on the effect of turning this off, taken from the php.ini file:

;     Global variables are no longer registered for input data (POST, GET, cookies,
;     environment and other server variables).  Instead of using $foo,
;     you can use $_REQUEST["foo"] (includes any variable that arrives through the
;     request, namely, POST, GET and cookie variables), or use one of the specific
;     $_GET["foo"], $_POST["foo"], $_COOKIE["foo"] or $_FILES["foo"], depending
;     on where the input originates.  Also, you can look at the
;     import_request_variables() function.
;     Note that register_globals is going to be deprecated (i.e., turned off by
;     default) in the next version of PHP, because it often leads to security bugs.
;     Read http://php.net/manual/en/security.registerglobals.php for further
;     information.

A second php issue is that we'd like to keep the default setting of "display_errors = Off" in php, as a security precaution.  However, since having it turned on is often useful for debugging, we can leave it on for a week or two in the initial stages, then turn it back off.  With it turned on, you might notice mostly harmless "Notice" messages from php, commonly about uninitialized variables (we all know to always initialize our variables, right?).

If your php code (or perl, or whatever) is encountering file access errors, the problem may stem from SELinux.  I have fixed several file contexts and the local SE policy to fix problems with the RICH Scaler plots, the RunLog Browser and tomcat.  Unfortunately, content owners may have a difficult time diagnosing such problems.  One way is to log in to the server, "cause" the error, and then look at the output of "dmesg |tail -n 30" (30, 40, whatever it takes) for audit messages with "avc:  denied" lines that might be related to your content.  If you see such errors, inform Wayne Betts, who can look into it further.  As a quick test, we can temporarily disable SELinux to see if it clears up any problems.
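
The dmesg check can also be scripted. The sketch below is a convenience under stated assumptions, not a server tool: `find_avc_denials` and `recent_kernel_messages` are hypothetical names, the optional keyword (a process or file name you expect to see) is just a substring filter, and reading dmesg may require privileges on some systems.

```python
import subprocess


def find_avc_denials(log_text, keyword=None):
    """Return log lines holding an SELinux AVC denial, optionally narrowed by keyword."""
    hits = []
    for line in log_text.splitlines():
        # Collapse runs of whitespace so "avc:  denied" and "avc: denied" both match
        normalized = " ".join(line.split())
        if "avc: denied" not in normalized:
            continue
        if keyword is None or keyword in line:
            hits.append(line)
    return hits


def recent_kernel_messages():
    """Return the full output of the dmesg command as text."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `find_avc_denials(recent_kernel_messages(), "httpd")` would list only the denials whose log line mentions httpd.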



Another common issue has been database access controls.  Many of our databases have fairly granular access controls, and dean may not be configured for access to everything it needs.  If that is the suspected source of any problems, Mike DePhillips can look into it.

STAR's SSH Public Key Management System

SSH Public Key Management Tool

Overview

The main front-end Web interface is at https://www.star.bnl.gov/starkeyw/  (see step-by-step instructions in the next section). This SSH public key management system has been designed in STAR to address the following requirements:

  • Use of two-factor authentication for remote logins
  • Allow a one-to-many association for remote users: a remote user may associate his/her keys with a local domain user account and with one or more local so-called "group" accounts which are not tied to one individual (such an account is for example an "operator" account or even the "root" account)
  • Provide a simple Web front end to users to request, view and manage their own key associations (hence easily managing access to a domain)
  • Allow a set of system administrators to easily manage key association for a domain (globally disabling users having left STAR for example)
  • Using the SSH key fingerprint, make it possible to identify which user is logging in to which account (a security requirement)
  • Be able to provide on demand, in one click, a list of who had access to which account on what machine and when (historical records, easy access to access grant lists)
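
To illustrate the fingerprint requirement, here is a minimal sketch of how a fingerprint is derived from an OpenSSH public key line of the form "type base64-blob [comment]". The function names are hypothetical, and the source does not say which fingerprint format SKM records; the sketch shows the two formats OpenSSH tools print (colon-separated MD5, and "SHA256:" followed by unpadded base64).

```python
import base64
import hashlib


def key_blob(public_key_line):
    """Decode the base64 blob of an OpenSSH public key line ('type blob [comment]')."""
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected 'type base64-blob [comment]'")
    return base64.b64decode(fields[1], validate=True)


def md5_fingerprint(public_key_line):
    """Colon-separated hex MD5 of the key blob (the older OpenSSH display format)."""
    digest = hashlib.md5(key_blob(public_key_line)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def sha256_fingerprint(public_key_line):
    """'SHA256:' plus the unpadded base64 SHA-256 of the key blob (the newer format)."""
    digest = hashlib.sha256(key_blob(public_key_line)).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Because the fingerprint depends only on the key blob, the same key installed on several accounts yields the same fingerprint, which is what lets a login be traced back to one person.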

Such a system was developed for STAR and named the "SSH Key Management system", aka SKM. More information can be found in this publication. A side benefit for users is a reduction in the number of passwords to remember and type.

Notes

  • In purpose, this system is similar to the RCF's key management system (full instructions here), but is more powerful because of its flexibility in the association of hosts (client systems), user accounts on those clients, and self-service key installation requests.
  • The STAR SKM system was initially used for managing online computer access and has since expanded to manage all nodes in STAR running a specialized service (offline database, web server and so on), streamlining the security model by making it consistent across nodes.
  • The system was designed to be as secure as possible (central repository of keys, information is only pulled by clients and NEVER pushed, to avoid multiple points of corruption). In other words, each client has a lightweight daemon polling and pulling its own SSH key association information out of a central DB and handling key installation. Clients are not allowed to manage keys (only the Web interface does). The client daemon creates negligible load.

 

Where do we start? What is a typical use example?

You should use your RCF username and Kerberos password (credentials) to enter this interface.

Here is a typical scenario of the system usage: 

  1. A sysadmin of a machine named FOO creates a user account named "JDOE" and, if not done already, installs the key_services client.
  2. A user account 'JDOE' on host 'FOO' is configured in the Key Management system by a key management administrator*.
  3. John Doe uploads (via the web) his or her public ssh key (in openssh format).
  4. John Doe requests (via the web) that his key be added to JDOE's authorized_keys file on FOO.
  5. A key management administrator approves the request, and the key_services client places the key in ~JDOE/.ssh/authorized_keys.

* Current admins are Wayne Betts and Jerome Lauret.

At this point, John Doe has key-based access to JDOE@FOO.  Simple enough?  But wait, there's more!  Now John Doe realizes that he also needs access to the group account named "operator" on host BAR.  Since his key is already in the key management system, he has only to request that his key be added to operator@BAR, and voila (subject to administrator approval), he can now log in with his key to both JDOE@FOO and operator@BAR.  And if Mr. Doe should leave STAR, then an administrator simply removes (disables) him from the system and his keys are removed from both hosts.
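
The scenario above can be modeled in a few lines. This is a toy in-memory sketch for illustration only: the class `KeyRegistry` and its method names are hypothetical and say nothing about how the real SKM database or Web interface is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class KeyRegistry:
    keys: dict = field(default_factory=dict)     # person -> public key line
    accounts: set = field(default_factory=set)   # (account, host) pairs added by an admin
    requests: set = field(default_factory=set)   # pending (person, account, host)
    approved: set = field(default_factory=set)   # approved (person, account, host)
    disabled: set = field(default_factory=set)   # people removed from the system

    def add_account(self, account, host):
        """Admin step: make an account on a host available for association."""
        self.accounts.add((account, host))

    def upload_key(self, person, public_key):
        self.keys[person] = public_key

    def request(self, person, account, host):
        if person not in self.keys:
            raise ValueError("upload a public key first")
        if (account, host) not in self.accounts:
            raise ValueError("account is not configured for this host")
        self.requests.add((person, account, host))

    def approve(self, person, account, host):
        """Admin step: raises KeyError if no such request is pending."""
        self.requests.remove((person, account, host))
        self.approved.add((person, account, host))

    def disable(self, person):
        self.disabled.add(person)

    def authorized_keys(self, account, host):
        """Keys a client would install for this account on this host."""
        return sorted(
            self.keys[p]
            for (p, a, h) in self.approved
            if a == account and h == host and p not in self.disabled
        )
```

Disabling a person is a single operation that empties every `authorized_keys` result that included them, which mirrors the "removed from both hosts" behavior described above.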

 

More details

Slightly Deeper...

There are three things to keep track of -- people (and their SSH keys of course), host (client) systems, and user accounts on those hosts:

People want access to specific user accounts at specific hosts.

The system maintains a list of user accounts for each host system, and a list of people associated with each user account at each host. To be clear: while the Web interface allows any user to log in, the system does not have any automatic user account detection mechanism at this time. Each "{user-}account" has to be added by hand by an administrator for that account to be listed as a possible association for node FOO or BAR.

Behind the scenes, each host that is participating in the system has a keyservices client installed that runs as a system service.  The keyservices_client periodically (at five minute intervals by default) polls a central service for its information.  In other words, the back-end database is consulted for the list of approved associations and the appropriate keys are downloaded by the client and added to the related account's authorized_keys files accordingly.
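
A single polling cycle of such a pull-only client might look like the sketch below. Everything here is an assumption made for illustration: `poll_once`, `install_keys` and the `fetch_approved` callable are hypothetical, the real client talks to the central service rather than to a Python function, and this sketch rewrites the whole authorized_keys file, whereas the real client may well preserve keys it does not manage.

```python
import os
import tempfile


def install_keys(authorized_keys_path, approved_keys):
    """Rewrite an authorized_keys file so it holds exactly the approved keys."""
    directory = os.path.dirname(authorized_keys_path) or "."
    os.makedirs(directory, mode=0o700, exist_ok=True)
    body = "".join(key.strip() + "\n" for key in approved_keys)
    # Write to a temporary file, then rename, so a half-written file is never visible
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(body)
        os.chmod(tmp_path, 0o600)
        os.replace(tmp_path, authorized_keys_path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def poll_once(fetch_approved, home_dirs):
    """One polling cycle: pull the approved keys for each account and install them."""
    for account, home in home_dirs.items():
        keys = fetch_approved(account)
        install_keys(os.path.join(home, ".ssh", "authorized_keys"), keys)
```

A daemon would call `poll_once` in a loop with a sleep between cycles (five minutes by default, per the text above). Since the client only pulls, a revoked association disappears from the host at the next cycle without the server ever connecting to it.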

In our case, orion.star.bnl.gov hosts all the server services (starkeyw and starkeyd via Apache, and a MySQL database), but they could all be on separate servers if desired.

Deployment Status and Future Plans

Only RHEL and Scientific Linux with i386 and x86_64 kernels have been tested, but there is no reason to believe that the client couldn't be built on other Linux distributions or Solaris. Please contact one of the admins (Wayne Betts, Jerome Lauret) if you'd like to volunteer and add your sub-system node to SKM or if you have any questions.

User access to the Web interface is currently based on the RCF Kerberos authentication. You will hence need a valid BNL/RCF account to access the Web interface and manage key associations for your account.

In 2012, SKM was extended to implement volatile key associations (a lifetime and expiration may be set for each key association). This feature allows granting a given user access to a privileged account on a temporary basis, for a debugging need as one example. It is also used for group accounts of an operational nature whose teams rotate and change at each new run (in such cases, the list of who is associated with the account needs to be re-assessed yearly, and the associations would be set, for example, to expire after one year). Expiration is optional: by default, associations have no expiration.
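
The expiration rule can be stated compactly in code. This is a sketch under assumptions: the dictionary layout and the field name "expires" are hypothetical, with `None` standing for the default of no expiration.

```python
from datetime import datetime, timedelta


def active_associations(associations, now):
    """Keep associations with no expiry (None) or an expiry still in the future."""
    return [
        assoc
        for assoc in associations
        if assoc["expires"] is None or assoc["expires"] > now
    ]


def one_year_from(start):
    """Expiry for a yearly re-assessed group account (365 days after start)."""
    return start + timedelta(days=365)
```

For a temporary debugging grant, one would store a short expiry, for example `datetime.now() + timedelta(days=7)`, and the association simply drops out of the active list once that time has passed.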