Comparison of CPU time per event for the gcc 4.8.2, 4.9.2, 5.2.1 and icc 17.0.0 compilers on the 2013 pp510 W sample on the HLT farm

After merging the online and offline CA codes and switching to Vc version 1.2,
I measured the relative performance of the gcc 4.8.2, 4.9.2, 5.2.1 and icc 17.0.0 compilers.
The results are summarized in the table below.

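Compiler     Arch   Options   CPU (sec)   Ratio
gcc 4.8.2    -m32   -g        37.70       2.27
gcc 4.8.2    -m32   -g -O2    16.61       1.00
gcc 4.8.2    -m64   -g        34.18       2.06
gcc 4.8.2    -m64   -g -O2    13.18       0.79
gcc 4.9.2    -m32   -g        39.30       2.37
gcc 4.9.2    -m32   -g -O2    18.46       1.11
gcc 4.9.2    -m64   -g        32.91       1.98
gcc 4.9.2    -m64   -g -O2    15.10       0.91
gcc 5.2.1    -m32   -g        40.16       2.42
gcc 5.2.1    -m32   -g -O2    19.02       1.14
gcc 5.2.1    -m64   -g        33.12       1.99
gcc 5.2.1    -m64   -g -O2    13.85       0.83
icc 17.0.0   -m32   -g        17.84       1.07
icc 17.0.0   -m32   -g -O2    17.89       1.08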

As the reference for the performance comparison I took the optimized gcc 4.8.2 build for the ia32 architecture (-m32 with -g -O2), which is presently used for STAR production.
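
The Ratio values in the table are each CPU time divided by the reference time. A minimal Python sketch of that arithmetic follows. The names TIMINGS, REFERENCE and ratios_to_reference are illustrative only and not part of any STAR tool, and the -m32 label on the icc rows is an assumption based on comment 2 below. Recomputed values can differ from the table in the last digit, because the CPU times used here are already rounded.

```python
# CPU seconds per event, copied from the table in this post.
# Tuple layout: (compiler, arch, options, cpu_sec).
# Assumption: the icc runs are -m32, since -m64 does not work with icc.
TIMINGS = [
    ("gcc 4.8.2", "-m32", "-g", 37.70),
    ("gcc 4.8.2", "-m32", "-g -O2", 16.61),
    ("gcc 4.8.2", "-m64", "-g", 34.18),
    ("gcc 4.8.2", "-m64", "-g -O2", 13.18),
    ("gcc 4.9.2", "-m32", "-g", 39.30),
    ("gcc 4.9.2", "-m32", "-g -O2", 18.46),
    ("gcc 4.9.2", "-m64", "-g", 32.91),
    ("gcc 4.9.2", "-m64", "-g -O2", 15.10),
    ("gcc 5.2.1", "-m32", "-g", 40.16),
    ("gcc 5.2.1", "-m32", "-g -O2", 19.02),
    ("gcc 5.2.1", "-m64", "-g", 33.12),
    ("gcc 5.2.1", "-m64", "-g -O2", 13.85),
    ("icc 17.0.0", "-m32", "-g", 17.84),
    ("icc 17.0.0", "-m32", "-g -O2", 17.89),
]

# Reference configuration: optimized gcc 4.8.2 for ia32.
REFERENCE = ("gcc 4.8.2", "-m32", "-g -O2")


def ratios_to_reference(timings, reference):
    """Return {(compiler, arch, options): cpu / cpu_reference}."""
    by_config = {(c, a, o): cpu for c, a, o, cpu in timings}
    ref_cpu = by_config[reference]
    return {config: cpu / ref_cpu for config, cpu in by_config.items()}


RATIOS = ratios_to_reference(TIMINGS, REFERENCE)

for (compiler, arch, options), ratio in RATIOS.items():
    print(f"{compiler:<11} {arch:<5} {options:<7} {ratio:5.2f}")
```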

Comments and conclusions from the table above:

1. The default optimization level for icc is -O2 (instead of -O0 for gcc). This is why -g and -g -O2 give the same result with icc.
2. The -m64 option does not work with icc because it requires rebuilding cernlib.
3. gcc 5.2.1 from the standard Software Collections does not support the ia32 architecture (-m32).
4. A CPU gain of ~20% is observed for the optimized version on x86_64 (intel64, -m64) over ia32 (-m32).
5. There is no CPU gain with icc with respect to gcc 4.9.2 (-m32), and there is a degradation of ~8% with respect to gcc 4.8.2 (-m32).
6. It does not look like we can gain anything essential from icc.
I think the icc line of work can be considered unprofitable for the near future.
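
The percentages quoted in comments 4 and 5 can be checked directly from the optimized (-g -O2) timings. The sketch below uses only the gcc 4.8.2, gcc 4.9.2 and icc values from the table. The helper name percent_change and the dictionary OPT are illustrative, and the icc entry is assumed to be a -m32 build.

```python
def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent.

    Negative values mean `new` uses less CPU time (a gain),
    positive values mean it uses more (a degradation).
    """
    return 100.0 * (new - old) / old


# Optimized (-g -O2) CPU seconds per event from the table.
# Assumption: the icc value is for -m32.
OPT = {
    ("gcc 4.8.2", "-m32"): 16.61,
    ("gcc 4.8.2", "-m64"): 13.18,
    ("gcc 4.9.2", "-m32"): 18.46,
    ("gcc 4.9.2", "-m64"): 15.10,
    ("icc 17.0.0", "-m32"): 17.89,
}

# Comment 4: -m64 versus -m32 for the same gcc version.
for compiler in ("gcc 4.8.2", "gcc 4.9.2"):
    change = percent_change(OPT[(compiler, "-m64")], OPT[(compiler, "-m32")])
    print(f"{compiler}: -m64 vs -m32 {change:+.1f}%")

# Comment 5: icc versus each gcc -m32 build.
icc = OPT[("icc 17.0.0", "-m32")]
for compiler in ("gcc 4.8.2", "gcc 4.9.2"):
    change = percent_change(icc, OPT[(compiler, "-m32")])
    print(f"icc 17.0.0 vs {compiler} (-m32): {change:+.1f}%")
```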