L2 Testbed

Test the feasibility of copying, pedestal subtracting, and gain correcting (and eventually energy sorting) the data before it gets passed to the L2 algorithms.  We would like to do this for the following reasons:

1. The calorimeter data goes through L2 on its way into DAQ.  Data integrity requires us to work from a copy.
2. All L2 algorithms will need to work with energy or ET rather than raw ADC data.  Doing the conversion once, up front, has its advantages.

The following code snippet is used to time the operations.  mCorrect, mPedestal, mZeroSupress and mGainFactor
are arrays of 4800 unsigned shorts.

  ulong tcorrect0,tcorrect1;
  rdtscl_macro(tcorrect0);
  for ( ushort rdo=0;rdo<mNumRdo;rdo++ )
    {
      // Get ADC and check for zero suppression
      ushort raw=*(data+rdo);
      // Zero suppress
      if ( raw < *(mZeroSupress+rdo) )
        {
          *(mCorrect+rdo)=0;
          continue;
        }
      // Subtract pedestal and gain correct
      *(mCorrect+rdo) = ((raw - *(mPedestal+rdo)) * (*(mGainFactor+rdo)));
    }
  rdtscl_macro(tcorrect1);

First I want to make sure the timing tests are reliable.  The machine I am running on is:

$ uname -a
Linux vu019624 2.6.9-55.0.2.EL #1 Tue Jun 26 10:46:04 CDT 2007 i686 i686 i386 GNU/Linux

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 1
model name      : Intel(R) Pentium(R) 4 CPU 1.80GHz
stepping        : 2
cpu MHz         : 1795.158
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3592.08

The first figure shows the distribution of (tcorrect1-tcorrect0) over all 4800 BEMC towers, run on 15k events from run 7100052.  Most events take ~40,000 CPU ticks (40 kTicks), but there is a secondary peak.

The second figure shows the same test, this time letting the program send I/O to stdout (as opposed to a log file).  There are no I/O statements between the start and stop of the stopwatch in the loop.  Nevertheless, there is a significant increase in the time it takes to process many events.


Smaller, but still noticeable, changes were observed when the machine was loaded (e.g. by compiling StEvent) during the test.


I read a little about real-time OSes over the weekend.  Any I/O during the execution of the code, whether it occurs within the timed loop or not, can cause hardware interrupts.  Buffers fill up and get flushed when the kernel decides to do it, which may be at a bad time.  So I stripped out all I/O after initialization and reran the tests.

The left and right panels show timing tests for three conditions:

  1. Black: no I/O, xwindows running
  2. Blue: no I/O, terminal only
  3. Red: no I/O, xwindows running, and I'm jiggling the mouse while typing the lyrics of a Gilbert and Sullivan tune on the keyboard.

Conclusions:

The secondary peak can be enhanced by:
1. running an X server on the machine
2. any user interaction with the computer (e.g. reading email, looking at QA plots, online chat discussing HMS Pinafore)
3. and, from the previous tests, output to either a text file or (much worse) the terminal.

I'm guessing these are not behaviors we want the algo to have.

The secondary peak integrates about 10-20% of the events.  We get a similar shape using
trigger-sorted events (same proportion of primary and secondary peaks).