Performance measurement comparisons, C-States kernel tweaks

Introduction

To lower the fan speed on the farm nodes (and so reduce the vibration level), Turbo Boost and C-States were adjusted at the kernel level, and performance tests (as well as energy consumption estimates) were made.

The results below and the initial testing are by Costin Caramarcu (RCF):

  • Setup: Changes done in rack 22-9 (rcrs6114 – rcrs6145):
    • Turned Turbo Boost off
    • Turned C States on
    • Switched the power profile from MAXIMUM to Active Power Control
  • Intervention took ~55h with minimum down time and minimum intervention

The main observations of the initial tests:

  • The ambient temperature increased by ~2 °C
  • Fan speed dropped by ~5000 RPM
  • Power consumption in rack dropped by ~2000 W
  • Power consumption per node dropped by ~60 W
  • Turbo Boost ON and C States ON result in performance gain and higher power usage
  • Active Power Control produces minimal loss in performance with some power savings
  • Fan speed drops ~12-25% with Active power control
  • It's better to have Turbo Boost OFF and C States ON than the other way around
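As a rough consistency check on the figures above, the per-node and per-rack power savings can be compared. The sketch below assumes that the host names rcrs6114 to rcrs6145 are numbered consecutively with one node per name (32 nodes); the text does not state this explicitly, and both power figures are approximate.

```python
# Assumption: rcrs6114..rcrs6145 are consecutively numbered, one node per name.
first_node, last_node = 6114, 6145
node_count = last_node - first_node + 1

watts_saved_per_node = 60    # approximate per-node figure quoted above
watts_saved_per_rack = 2000  # approximate per-rack figure quoted above

# Scale the per-node saving to the whole rack and compare with the rack figure.
estimated_rack_saving = node_count * watts_saved_per_node
relative_gap = abs(estimated_rack_saving - watts_saved_per_rack) / watts_saved_per_rack

print(f"nodes in rack: {node_count}")
print(f"estimated rack saving: {estimated_rack_saving} W")
print(f"gap to quoted rack figure: {relative_gap:.0%}")
```

Under that assumption, the two quoted numbers agree to within a few percent, which is about as much as the "~" in the original figures allows.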

It was believed, based on HEP SPEC, that the CPU performance was not significantly affected.

A rack of nodes was turned over to the STAR experiment for testing. Other nodes were configured as follows:

  • rcrs6160: C-States ON, TurboBoost ON  - MAXIMUM
  • rcrs6134: C-States ON, TurboBoost OFF - Active Power Control
  • rcrs6133: C-States OFF, TurboBoost ON - Active Power Control
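When comparing nodes configured like this, it can help to confirm what the running kernel actually reports. The following is a minimal sketch, not the procedure used for these tests: it assumes a Linux kernel that exposes the `intel_pstate` or `acpi-cpufreq` boost file and the `cpuidle` directory under sysfs, which may not be the case on the kernel these nodes ran. The function name `read_cpu_power_settings` is hypothetical.

```python
from pathlib import Path

def read_cpu_power_settings(sysfs_cpu=Path("/sys/devices/system/cpu")):
    """Return Turbo Boost and C-state information; values stay empty if files are absent."""
    settings = {"turbo_enabled": None, "cstates": []}

    # intel_pstate driver: no_turbo reads "1" when Turbo Boost is disabled.
    no_turbo = sysfs_cpu / "intel_pstate" / "no_turbo"
    # acpi-cpufreq driver: boost reads "1" when boosting is enabled.
    boost = sysfs_cpu / "cpufreq" / "boost"
    if no_turbo.is_file():
        settings["turbo_enabled"] = no_turbo.read_text().strip() == "0"
    elif boost.is_file():
        settings["turbo_enabled"] = boost.read_text().strip() == "1"

    # cpuidle lists the idle states (C-states) the kernel exposes for cpu0.
    cpuidle = sysfs_cpu / "cpu0" / "cpuidle"
    if cpuidle.is_dir():
        for state_dir in sorted(cpuidle.glob("state*")):
            name_file = state_dir / "name"
            if name_file.is_file():
                settings["cstates"].append(name_file.read_text().strip())
    return settings

print(read_cpu_power_settings())
```

A `turbo_enabled` value of `None` means neither file was found, so the setting has to be checked another way (for example in the BIOS). The power profile (MAXIMUM versus Active Power Control) is not visible through these files.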

 

Spec used

SciMark2

One of the test suites I used is SciMark2, a benchmark suite created by NIST. SciMark2 is composed of multiple tests as follows:

  • Fast Fourier Transform - this test performs a one-dimensional forward transform of 4K complex numbers. The routine has two parts: the first performs the bit reversal (no flops) and the second performs the actual N log(N) computational steps. This exercises complex arithmetic, shuffling, non-constant memory references and trigonometric functions.
  • Successive Over-relaxation - a Jacobi Successive Over-relaxation (SOR) test. The test exercises typical access patterns in finite difference applications, for example, solving Laplace's equation in 2D with Dirichlet boundary conditions. The algorithm is tailored to measure basic "grid averaging" memory patterns, where each A(i,j) is assigned an average weighting of its four nearest neighbors.
  • Monte-Carlo - is actually a Monte-Carlo integration test. It approximates the value of Pi by computing the integral of the quarter circle. The algorithm exercises random-number generators, synchronized function calls, and function inlining.
  • Sparse matmult - the Sparse Matrix multiplication test uses an unstructured sparse matrix stored in compressed-row format with a prescribed sparsity structure. A 1,000 x 1,000 sparse matrix with 5,000 nonzeros is used. This exercises indirect addressing and non-regular memory references.
  • Dense LU matrix - the Dense LU matrix factorization test computes the LU factorization of a dense 100x100 matrix using partial pivoting. This exercises linear algebra kernels (BLAS) and dense matrix operations.
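To make the access patterns described above more concrete, here is a small illustrative sketch of three of the kernels in plain Python. It is not the SciMark2 code: the function names, the seed, and the choice to update the grid in place are assumptions made for this example, and the sizes are far smaller than those used by the benchmark.

```python
import random

def monte_carlo_pi(num_samples, seed=113):
    """Estimate Pi from the fraction of random points inside the unit quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

def sor_sweep(grid, omega):
    """One in-place over-relaxation sweep over the interior points of a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # "Grid averaging": mean of the four nearest neighbors of A(i,j).
            neighbor_avg = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                   + grid[i][j - 1] + grid[i][j + 1])
            grid[i][j] = omega * neighbor_avg + (1.0 - omega) * grid[i][j]
    return grid

def csr_matvec(values, col_index, row_ptr, x):
    """Multiply a compressed-row (CSR) sparse matrix by a dense vector x."""
    y = []
    for row in range(len(row_ptr) - 1):
        total = 0.0
        # x is read through col_index, i.e. via indirect addressing.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            total += values[k] * x[col_index[k]]
        y.append(total)
    return y

# Small usage example (sizes chosen for readability, not benchmarking).
print(monte_carlo_pi(100_000))
boundary_grid = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
print(sor_sweep(boundary_grid, omega=1.0))
print(csr_matvec([2.0, 3.0], [0, 1], [0, 1, 2], [1.0, 1.0]))
```

The Monte-Carlo loop is dominated by random-number generation and function calls, the SOR sweep by regular nearest-neighbor memory access, and the sparse product by irregular reads of `x`, which is why the three tests can respond differently to the same CPU setting.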

ROOT Marks

ROOT marks are part of the ROOT test suite. This benchmark is heavily biased toward ROOT operations and hence may not accurately represent the performance of (for example) a pure Monte-Carlo program (the SciMark MC test would then be more accurate). However, the root4star STAR framework would be expected to perform close to the ROOT marks (beyond its phase of intense calculations).

UnixBench

UnixBench provides a basic indicator of the performance of a Unix-like system. Only a few tests were selected, namely:

  • int: integer operations
  • float: floating point operations
  • Dhrystone: This benchmark is used to measure and compare the performance of computers. The test focuses on string handling, as there are no floating point operations. It is heavily influenced by hardware and software design, compiler and linker options, code optimization, cache memory, wait states, and integer data types.
  • Whetstone: This test measures the speed and efficiency of floating-point operations. It contains several modules that are meant to represent a mix of operations typically performed in scientific applications. A wide variety of C functions including sin, cos, sqrt, exp, and log are used, as well as integer and floating-point math operations, array accesses, conditional branches, and procedure calls. This test measures both integer and floating-point arithmetic.

 

Results