Recommended settings for key nodes

The following recommendations apply to mission-critical systems in use in STAR. Mission-critical systems are defined here as nodes hosting a service critical to the operation of STAR analysis, data production and/or run support. This definition includes services and nodes such as Web servers, database servers, gatekeepers under STAR's control and/or nodes hosting special services used in STAR.

The rationale behind a base set of rules is several-fold:

  • Systems are easier to re-deploy and back up if standard directories are assumed and known.
  • Detection and tracing of changes is made easier if files and local scripts are kept in standard places (in fact, any creation of files and/or directories in different trees would then become suspicious).
  • Sharing the burden of management is made easier if all administrators comply with the base rules and hence know "what should be where" (this minimizes the time wasted working out what a secondary administrator did and changed).
  • Risks are minimized and layers of responsibility are created.

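The recommendations are broken into the following sub-sections:

  • Access to root account and basic structure & rules
  • Process and services and access to non-root accounts
  • History settings
  • Accounting and change detection mechanisms
  • Backup and data safety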

Access to root account and basic structure & rules

Responsible personnel, password versus key based access

The node registration should clearly display a primary and a secondary contact person for a mission-critical node. Both persons may have access to the root account using password access. There shall be no other password-based access to the root account: all other access should be SSH key based and managed by the SSH Key Management system.

Each key should supply a key owner field (user@node or user) for easier visual identification of key provenance. SSH tracing should be enabled to allow session based login [Wayne, more precision needed here].
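The key owner field requirement can be checked mechanically. The following is a minimal sketch, not part of the SSH Key Management system: the function name keys_missing_owner is hypothetical, and it assumes the usual authorized_keys layout (optional options, key type, base64 key, owner comment) with no quoted whitespace inside the options.

```shell
#!/bin/sh
# keys_missing_owner FILE   (hypothetical helper name)
# Print the line number of every key entry in FILE that lacks a trailing
# owner field (user@node or user). Blank lines and '#' comments are skipped.
keys_missing_owner() {
    awk '
        /^[ \t]*(#|$)/ { next }
        {
            # Find the key-type field; any options come before it.
            # Assumption: options contain no quoted whitespace.
            t = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(ssh-|ecdsa-|sk-)/) { t = i; break }
            }
            # Expected layout: [options] type base64-key owner
            if (t == 0 || NF < t + 2) print NR
        }
    ' "$1"
}
```

For example, keys_missing_owner /root/.ssh/authorized_keys prints nothing when every key carries an owner field.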

Administrators accessing a mission critical node are expected to

  • keep the primary administrators aware of any changes they perform, and document those changes
  • refrain from adding components or altering settings of services under the control or responsibility of a dedicated service administrator without notice to that service administrator. For example, an upgrade of the Web server service (httpd) should be preceded by a notice to the Web master and administrator.
  • The courtesy rule above may be relaxed in emergency situations, in which case changes should be documented and sent to the system's responsible personnel upon completion of the emergency procedure.

Directory locations

The root account home directory should be set to /root. The following directories should be used

  • /root/bin or /root/scripts - should be used for any additional scripts developed for either command line administrative or cron based tasks
  • /root/Software - should be used for keeping installation files and/or packages installed from source. The following sub-structure is a base example:
    • rpm/ - a sub-directory containing all downloaded RPMs
    • underway/ - a directory where packages are being unpacked, compiled, tested and assembled. This sub-directory could contain package archives until the package is installed and operational.
    • Installed/ - a directory containing copies of the archives (.tar.gz, .zip etc...) of installed packages
    • perl/ - a sub-directory for perl related modules and code. This tree would have an additional level following the convention above. A sub-directory modules/ would exist, containing the pre-compiled (and installed) perl modules in case a quick re-install is needed from source.
    • Self contained packages (such as perl) may follow a similar rule.
      • A sub-directory under Software/ must clearly and non-ambiguously indicate the package or component name
      • The structure underneath should reflect the structure outline above as for the perl/ example

      For example, if there is a need to confine all Drupal related components to their own source tree, simply create /root/Software/Drupal and place underneath it a structure such as rpm/ modules/ etc ...

  • /root/backup/ - a sub-directory where miscellaneous backups of the node's contents are placed.
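The layout above can be created in one step. This is only a sketch: the function name make_root_layout is hypothetical, /root/bin is chosen over /root/scripts arbitrarily, and the perl/ sub-tree (underway/, Installed/, modules/) is one reading of "following the convention above".

```shell
#!/bin/sh
# make_root_layout [BASE]   (hypothetical helper name)
# Create the recommended directory skeleton under BASE (default: /root).
make_root_layout() {
    base=${1:-/root}
    for d in \
        bin \
        Software/rpm \
        Software/underway \
        Software/Installed \
        Software/perl/underway \
        Software/perl/Installed \
        Software/perl/modules \
        backup
    do
        # mkdir -p is harmless when the directory already exists
        mkdir -p "$base/$d" || return 1
    done
}
```

Passing a scratch directory as BASE makes it possible to try the layout without touching /root.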

Other directory rules

  • There should be no logs, reports, notes or other documents directly under slash "/"
  • No directories other than those described below (/opt, /home) shall be added under "/"
  • /home/users should be used for holding local user's directories if any
    • number of local users should be kept to the strict minimum on mission critical nodes (ideally, none should be used)
  • /opt may be created under slash and contain optional packages (rt2, tivoli and vendor-specific optional components are examples); it may also hold the cluster-wide OPTSTAR code and packages if applicable.
  • If a script run by the root account needs temporary space, /tmp should NOT be used, for security reasons
    • Instead, /usr/local/tmp should be used (and created if needed)
    • Its ownership/protection should be strictly root:root and u:rwx (no access for others)
  • /usr/local/etc, /usr/local/share and other standard sub-directories should be used for home made tools in need of local configuration.
  • /etc/smrsh/ should contain Email pipe related scripts - those should NOT reside under /root/bin/ .
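The temporary-space rule above (a root-only /usr/local/tmp instead of /tmp) could be applied in a root script roughly as follows. This is a sketch under assumptions: the function name private_tmpdir and the job.XXXXXX template are hypothetical, and "u:rwx, no access for others" is read here as mode 0700.

```shell
#!/bin/sh
# private_tmpdir [BASE]   (hypothetical helper name)
# Ensure BASE (default: /usr/local/tmp) exists with owner-only access,
# then create a unique working directory inside it and print its path.
private_tmpdir() {
    base=${1:-/usr/local/tmp}
    mkdir -p "$base" || return 1
    chmod 700 "$base" || return 1
    # Ownership can only be set by root; skip it when run unprivileged
    if [ "$(id -u)" -eq 0 ]; then
        chown root:root "$base" || return 1
    fi
    mktemp -d "$base/job.XXXXXX"
}
```

A script would typically call work=$(private_tmpdir) and remove "$work" on exit.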

 

Process and services and access to non-root accounts

Access to non-root accounts should follow these rules:

  • Services should be run under non-root local accounts. Secondary groups could be used to create a privilege separation layer.
    For example:

    • mysql service should be started under the mysql account
    • A Web server would start under the starweb account
      • starweb would belong to the primary group rhstar
      • a secondary group webadmin could be assigned
    • A web administrator would log in under a webman account belonging to the secondary group webadmin
      • ... and share files with the httpd process running under starweb with group access granularity.
      • All work and modifications would be done through webman; group access would not include write (w) permission on most files.
  • Only SSH key access should be used for process/service related accounts
  • Control of access should be managed by the SSH Key Management system. This makes it possible to keep track of who is able to access the service account.
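The group access granularity in the starweb/webman example can be sketched with plain permission bits. The function name share_readonly is hypothetical; the sketch assumes the tree is owned by the administrator account (webman) and that the service account (starweb) reaches it only through the shared group (webadmin).

```shell
#!/bin/sh
# share_readonly TREE GROUP   (hypothetical helper name)
# Owner keeps read/write, GROUP gets read-only access (plus directory
# search), everyone else gets nothing.
share_readonly() {
    tree=$1
    group=$2
    chgrp -R "$group" "$tree" || return 1
    # X adds execute/search only on directories and already-executable files
    chmod -R u=rwX,g=rX,o= "$tree"
}
```

In the example above this would be run by webman as share_readonly /var/www/html webadmin, where the path is an assumption about where the Web content lives.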

Non-root accounts used for running services should follow these rules of thumb:

  • There will be one primary responsible owner of a non-root service account (for example, only one Web master). Others requesting access would need explicit approval from the primary owner of the account. Access granted to others would be revoked by the primary owner upon request.
  • Site admins would inform the primary owner of a non-root account of any changes and/or upgrades that are needed and directly affect that owner's responsibility and service

 

History settings

The following scheme is suggested:

  • History listing should be set to at least 10k entries
  • History should display timestamps
  • A mechanism should be in place to identify history by session (PID)

To satisfy those requirements, the following scripts should be placed in /etc/profile.d/

history.sh

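#
# Set history format and location
#
HISTSIZE=10000
# strftime format shown by the history command; any non-empty value also
# makes bash record a timestamp with each entry in the history file
HISTTIMEFORMAT="%F %T "

# One history file per session, identified by the shell PID
if [ ! -d "$HOME/.history" ]; then
    /bin/mkdir -p "$HOME/.history"
fi
HISTFILE="$HOME/.history/history.$$"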

history.csh

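#
# Set history format and location (tcsh uses shell variables for this,
# not environment variables)
#
# 10000 entries, listed as: event number, date, time, command
set history = ( 10000 "%h %Y-%W-%D %P %R\n" )
set savehist = 10000

# One history file per session, identified by the shell PID
if ( ! -d "$HOME/.history" ) then
    /bin/mkdir -p "$HOME/.history"
endif
set histfile = "$HOME/.history/history.$$"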
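When HISTTIMEFORMAT is set, bash stores each timestamp in the history file as a "#epoch" line preceding the command. A per-session file can therefore be rendered in readable form after the fact. The sketch below assumes GNU date (for the -d @epoch syntax) and uses the hypothetical function name show_session_history; it handles bash history files only.

```shell
#!/bin/sh
# show_session_history FILE   (hypothetical helper name)
# Print each command of a bash history file prefixed by its UTC timestamp.
show_session_history() {
    stamp=""
    while IFS= read -r line; do
        # A timestamp line is '#' followed by digits only
        case $line in
            '#'*[!0-9]*|'#') is_stamp=0 ;;
            '#'*)            is_stamp=1 ;;
            *)               is_stamp=0 ;;
        esac
        if [ "$is_stamp" -eq 1 ]; then
            # GNU date: convert seconds since the epoch to a readable time
            stamp=$(date -u -d "@${line#?}" '+%F %T')
        else
            printf '%s  %s\n' "${stamp:-unknown-time}" "$line"
        fi
    done < "$1"
}
```

Since every file name ends with the PID of its shell, grep -l 'pattern' "$HOME"/.history/history.* identifies the session in which a given command was typed.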

Accounting and change detection mechanisms

Accounting

To help diagnose problems, accounting should be enabled using the standard Unix process accounting service (psacct). This allows the use of the lastcomm command in addition to the last command (the latter is available without accounting).
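As an illustration of how the two commands complement each other, the sketch below prints the most recent recorded commands and login sessions for one account. The function name accounting_report is hypothetical, and the limits of 20 and 10 entries are arbitrary.

```shell
#!/bin/sh
# accounting_report [USER]   (hypothetical helper name)
# Show recent commands (needs process accounting) and recent logins
# (works without it) for USER, default root.
accounting_report() {
    user=${1:-root}
    if ! command -v lastcomm >/dev/null 2>&1; then
        echo "lastcomm not found: process accounting tools are not installed"
        return 1
    fi
    # Commands recorded by process accounting for this user
    lastcomm --user "$user" | head -n 20
    # Login sessions from the wtmp records
    last -n 10 "$user"
}
```

How the accounting service itself is enabled depends on the distribution; on systemd-based Red Hat style systems it is typically systemctl enable --now psacct, but that unit name is an assumption to be checked on the node.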

Host integrity and change detection

A single tool for detecting changes made to the core components of the system should be installed (WatchFrog or Osiris are currently in use). The area of control must

  • include /etc/, /usr/sbin/, /bin, /usr/bin/, /usr/local, /usr/local/bin/, /usr/local/etc/, /boot/ and /root/bin/, where path/ denotes the full content found recursively and path denotes detection of changes within the specified directory level only.
  • exclude frequently modified files so that reports remain meaningful - for example, /etc/ntp.drift would be excluded and so would a backup log file
  • include the service components and content the node hosts - for example, a Web server would have the change detection monitor /var/www/ (as a side note, database content cannot be monitored as new entries would trigger the detection of a change).
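The path/ versus path convention and the exclusion rule can be illustrated with a checksum snapshot. This is a stand-in sketch only and does not reflect how WatchFrog or Osiris work or are configured; the function name snapshot is hypothetical and GNU find and sha256sum are assumed.

```shell
#!/bin/sh
# snapshot SPEC [EXCLUDE_GLOB]   (hypothetical helper name)
# Print sorted SHA-256 checksums for the files selected by SPEC:
#   "dir/" covers the whole tree recursively,
#   "dir"  covers only files directly inside dir.
# Files whose full path matches EXCLUDE_GLOB are left out.
snapshot() {
    spec=$1
    exclude=${2:-}
    case $spec in
        */) depth="" ;;
        *)  depth="-maxdepth 1" ;;
    esac
    # Build the find arguments; $depth is left unquoted on purpose so that
    # it splits into two words (or vanishes when empty)
    set -- "${spec%/}" $depth -type f
    [ -z "$exclude" ] || set -- "$@" ! -path "$exclude"
    find "$@" -exec sha256sum {} + | sort -k 2
}
```

Saving the output of snapshot /etc/ '*/ntp.drift' and comparing it with a later run using diff reports any file added, removed or modified in between.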

Host integrity reports should be sent to the primary root account responsible personnel.

Backup and data safety

There should be at least one backup service running on mission critical systems. Key content should be backed up in a minimal yet complete manner. Examples include:

  • A Web server would have the /var/www/html tree backed up along with key configuration files.
  • A node hosting Hypernews should save the Email content and archive.
  • A database server content backup procedure may involve a two-stage process in which one step performs a mysqldump and the backup process then collects the dump snapshots.
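The first stage of the two-stage database backup could look like the sketch below, which writes a timestamped, compressed dump into a spool directory that the backup service then collects. The function name dump_stage, the spool directory and the default mysqldump options are assumptions for illustration.

```shell
#!/bin/sh
# dump_stage SPOOL_DIR [DUMP_COMMAND...]   (hypothetical helper name)
# Run the dump command and store its output as
# SPOOL_DIR/dump.<timestamp>.sql.gz. A failed dump leaves no file behind.
dump_stage() {
    spool=$1
    shift
    # Default dump command; the options are an assumption to adapt per server
    [ $# -gt 0 ] || set -- mysqldump --all-databases --single-transaction
    mkdir -p "$spool" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    partial="$spool/.dump.$stamp.partial"
    # Write to a hidden partial file first so the backup service never
    # collects an incomplete snapshot
    if "$@" > "$partial" && gzip "$partial" \
        && mv "$partial.gz" "$spool/dump.$stamp.sql.gz"; then
        return 0
    fi
    rm -f "$partial" "$partial.gz"
    return 1
}
```

Run from cron, for instance as dump_stage /root/backup/mysql, this leaves the second stage with nothing to do but include the spool directory in the regular backup.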

Backups should avoid saving generic system files such as those in /bin or /usr/bin.