Performance Benchmarks

I ran a couple of TStopwatch tests on the Run 5 common trees.  Here are the specs:

Hardware:  Core Duo laptop, 2.16 GHz

Trees:  805 runs, 26.2M events, 4.4 GB on disk

Languages:  CINT, Python, compiled C++

I also tested the impact of using a TEventList to select the ~11M JP1 and JP2 events needed to plot deta and dphi for pions and jets.  Here's a table of the results.  Each time is listed as CPU seconds / real seconds:

              Chain init + TEventList generation   Process TEventList
CINT          156 / 247                            1664 / 1909
Python        156 / 257                            1255 / 1565
Compiled C++  154 / 249                            877 / 1209
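
For readers who want to try a similar measurement, here is a minimal PyROOT sketch of how the two timed stages could be set up. It is not the code used for these benchmarks: the file pattern, tree name, and cut string are placeholders, the list name "selected" is arbitrary, and it assumes the selection can be written as a TTree::Draw cut string. The helper process_selected only relies on GetN() and GetEntry(), so it can be exercised without ROOT installed.

```python
def process_selected(chain, elist, handle_event):
    """Read only the entries stored in a TEventList-like object."""
    n_selected = elist.GetN()
    for i in range(n_selected):
        # TEventList.GetEntry(i) returns the entry number of the i-th selected event.
        chain.GetEntry(elist.GetEntry(i))
        handle_event(chain)
    return n_selected


def run_benchmark(file_pattern, tree_name, cut):
    """Time list generation and list processing separately.

    All three arguments are placeholders for illustration.
    """
    import ROOT  # imported here so process_selected works without ROOT

    watch = ROOT.TStopwatch()

    # Stage 1: chain initialization + TEventList generation.
    watch.Start()
    chain = ROOT.TChain(tree_name)
    chain.Add(file_pattern)
    chain.Draw(">>selected", cut)  # fills a TEventList named "selected"
    elist = ROOT.gDirectory.Get("selected")
    watch.Stop()
    print("init + list: %.0f / %.0f" % (watch.CpuTime(), watch.RealTime()))

    # Stage 2: loop over the selected entries only.
    watch.Start()  # Start() resets the counters by default
    n_selected = process_selected(chain, elist, lambda event: None)
    watch.Stop()
    print("process %d entries: %.0f / %.0f"
          % (n_selected, watch.CpuTime(), watch.RealTime()))
```

The lambda passed to process_selected is where the histogram filling would go; it does nothing here so that the sketch times only the I/O.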

I tried the Python code without using a TEventList.  The chain initialization dropped down to 50/70 seconds, but reading in all 26M events took me 1889/2183 seconds.  In the end the TEventList was definitely worth it, even though it took 3 minutes to construct one.
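
The comparison can be checked with a few lines of arithmetic. The sketch below uses only the Python timings quoted above; the variable and function names are made up for illustration.

```python
# (CPU seconds, real seconds) for the Python runs quoted above
with_list = {"init": (156, 257), "loop": (1255, 1565)}
without_list = {"init": (50, 70), "loop": (1889, 2183)}


def total(run):
    """Sum initialization and event-loop time, for CPU and real separately."""
    return tuple(a + b for a, b in zip(run["init"], run["loop"]))


def saving(before, after):
    """Fractional time saved going from `before` to `after`."""
    return tuple(1.0 - a / b for a, b in zip(after, before))


total_with = total(with_list)        # (1411, 1822)
total_without = total(without_list)  # (1939, 2253)

loop_saving = saving(without_list["loop"], with_list["loop"])
total_saving = saving(total_without, total_with)

print("event loop saving (CPU, real): %.1f%%, %.1f%%"
      % (100 * loop_saving[0], 100 * loop_saving[1]))
print("end-to-end saving (CPU, real): %.1f%%, %.1f%%"
      % (100 * total_saving[0], 100 * total_saving[1]))
```

With these numbers the event loop alone is about 34% faster in CPU time and 28% faster in real time, and the end-to-end saving, including the cost of building the list, is about 27% CPU and 19% real.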

Conclusions:
  1. Use a TEventList.  My selection criteria weren't very restrictive (event fired JP1 or JP2), but I still cut the CPU time of my event loop by > 30%.
  2. I had already compiled the dictionaries for the various classes and the reader in every case, but this small macro still got a strong performance boost from compilation.  I was surprised to see that the Python code came closer to compiled C++ in performance than CINT did.