Investigation: CPUTime > RunTime
Problem
Lidia reported on April 1st that some jobs seem to have CPUTime > RunTime, throwing our efficiency calculation off balance.
Upon checking, all jobs displaying this characteristic ran on rcrs6133.rcf.bnl.gov (a single node). An example follows:
Job name | CPU/event | RealTime/event | Time run
AuAu39_production_ReversedFullField_P10ik_st_mtd_11110011_raw_4170001 | 22.34 | 14.17 | 2011-04-02 02:50:00
AuAu39_production_ReversedFullField_P10ik_st_physics_11110028_raw_4040001 | 11.06 | 7.31 | 2011-04-03 12:21:00
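Whatever the exact units, taking the ratio of the two per-event numbers gives 22.34 / 14.17 ≈ 1.58 for the first job and 11.06 / 7.31 ≈ 1.51 for the second, i.e. an apparent CPU efficiency of roughly 150-160% where at most 100% is expected. This is what throws the efficiency accounting off.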
Measurements and notes
Benchmark
The table below shows the benchmark measurements for the node under investigation (rcrs6133) and a normally behaving node (rcrs6134). Both are Intel(R) Xeon(R) CPU X5560 @ 2.80 GHz. The benchmarks were based on 10 passes over a suite similar to the one used for the gcc 4.3.2 versus gcc 4.5 comparison, and the results are averaged below.
Node | Composite | FFT | SOR | MC | SparseMult | LU
scimark.rcrs6134.rcf.bnl.gov (average) | 692.63 | 632.01 | 778.91 | 215.98 | 746.08 | 1090.19
scimark.rcrs6133.rcf.bnl.gov (average) | 457.73 | 422.05 | 506.47 | 152.02 | 486.61 | 721.52
Differences | 33.91% | 33.22% | 34.98% | 29.61% | 34.78% | 33.82%
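The Differences row is relative to the normal node; for the Composite column, for example, (692.63 - 457.73) / 692.63 ≈ 33.9%.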
The defective node is 33-34% less efficient than a normal node, but this may be due to a clock problem (if CPUTime is systematically greater than RunTime, the marks will be skewed toward lower values).
In the above, our benchmark program was NOT multi-threaded.
Note on the executable
To the extent that our application contains ANY multi-threading, it is possible for CPUTime (which usually includes the sum of the times used across all CPUs) to be greater than RealTime. In the extreme case where the workload spreads evenly over two CPUs, for example, CPUTime would be exactly RealTime x 2.
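For illustration only (the file name, compile line and loop counts are arbitrary, and this is not our production code), a minimal C sketch of that extreme case: two CPU-bound threads running in parallel, for which 'time' should report close to 200% CPU without any clock problem.

/* two_threads.c -- illustration only: two CPU-bound threads, no IO.
   Compile with: gcc -O2 -o two_threads two_threads.c -lpthread
   Under 'time', CPUTime should come out close to 2 x RealTime. */
#include <pthread.h>
#include <stdio.h>

static void *burn(void *arg)
{
    (void)arg;
    volatile double x = 0.0;                 /* volatile so the loop is not optimized away */
    for (long i = 0; i < 1000000000L; i++)
        x += (double)i * 1e-9;
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, burn, NULL);   /* both threads do the same busy work */
    pthread_create(&t2, NULL, burn, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    puts("done");
    return 0;
}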
Checking which shared libraries root4star pulls in (note that libThread.so and libpthread.so.0 are both present in the list):
ldd /afs/rhic.bnl.gov/star/packages/SL10k/.sl53_gcc432/bin/root4star
  linux-gate.so.1 => (0xffffe000)
  libCore.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libCore.so (0xf776f000)
  libCint.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libCint.so (0xf6fca000)
  libRIO.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libRIO.so (0xf6ddf000)
  libNet.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libNet.so (0xf6ccb000)
  libHist.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libHist.so (0xf68c6000)
  libGraf.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libGraf.so (0xf66ce000)
  libGraf3d.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libGraf3d.so (0xf65bb000)
  libGpad.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libGpad.so (0xf64e1000)
  libTree.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libTree.so (0xf62e1000)
  libRint.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libRint.so (0xf62b2000)
  libPostscript.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libPostscript.so (0xf6262000)
  libMatrix.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libMatrix.so (0xf6020000)
  libPhysics.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libPhysics.so (0xf5f93000)
  libMathCore.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libMathCore.so (0xf5dc1000)
  libThread.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libThread.so (0xf5d76000)
  libGeom.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libGeom.so (0xf5a35000)
  libTable.so => /afs/rhic.bnl.gov/star/ROOT/5.22.00/.sl53_gcc432/rootdeb/lib/libTable.so (0xf58ef000)
  libX11.so.6 => /usr/lib/libX11.so.6 (0xf57d6000)
  libnsl.so.1 => /lib/libnsl.so.1 (0xf57bd000)
  libcrypt.so.1 => /lib/libcrypt.so.1 (0xf578a000)
  libdl.so.2 => /lib/libdl.so.2 (0xf5785000)
  libgfortran.so.3 => /usr/lib/libgfortran.so.3 (0xf56cb000)
  libXt.so.6 => /usr/lib/libXt.so.6 (0xf5674000)
  libXpm.so.4 => /usr/lib/libXpm.so.4 (0xf5663000)
  librt.so.1 => /lib/librt.so.1 (0xf565a000)
  libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xf5570000)
  libm.so.6 => /lib/libm.so.6 (0xf5547000)
  libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf553b000)
  libc.so.6 => /lib/libc.so.6 (0xf53e2000)
  libpthread.so.0 => /lib/libpthread.so.0 (0xf53c9000)
  libXau.so.6 => /usr/lib/libXau.so.6 (0xf53c5000)
  libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0xf53bf000)
  /lib/ld-linux.so.2 (0xf7f01000)
  libSM.so.6 => /usr/lib/libSM.so.6 (0xf53b6000)
  libICE.so.6 => /usr/lib/libICE.so.6 (0xf539c000)
How the times are calculated is described in this article for example (where an explanation of this issue also appears). See also the CPU time article on Wikipedia (while not academic, its simple and comprehensive explanation is a good place to start).
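As a sketch of where those numbers come from (assuming Linux; the program name mytime.c is made up), the user, system and real times that 'time' prints can be reproduced with getrusage() and gettimeofday():

/* mytime.c -- illustration: report user/system/real time of a command,
   in the same spirit as the 'time' built-in. A percentage above 100%
   means more CPU seconds were consumed than wall-clock seconds elapsed,
   which is only possible if more than one CPU was used (or a clock is wrong). */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    struct timeval start, end;
    gettimeofday(&start, NULL);              /* wall-clock start */

    pid_t pid = fork();
    if (pid == 0) {                          /* child: run the command */
        execvp(argv[1], &argv[1]);
        perror("execvp");
        _exit(127);
    }

    int status;
    waitpid(pid, &status, 0);                /* wait for the child to finish */
    gettimeofday(&end, NULL);                /* wall-clock end */

    struct rusage ru;
    getrusage(RUSAGE_CHILDREN, &ru);         /* CPU usage of terminated children */

    double real = (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
    double user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
    double sys  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;

    printf("%.3fu %.3fs %.2f real %.1f%%\n", user, sys, real,
           real > 0 ? 100.0 * (user + sys) / real : 0.0);
    return 0;
}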
Clock timing
Using the 'time' command on a sample executable (one that spends its time searching for prime numbers greater than an initial value, with no IO involved) that is a single-threaded application, I observed the following (a sketch of such a test program is included further below):
- rcrs6134 (reference): always returned percentages of 100%
- rcrs6133 returned percentages > 100% twice in 10 tests (once 200.0% and once 198.4%)
- I could reproduce the > 100% results at least half of the time if I ran a 'sync' just before the 'time' (statistics over 10 tries only / perhaps a coincidence)
A kernel and/or IO problem has to exist on this node (results > 100% are not possible for single-threaded processes).
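The exact test code is not reproduced here, but in spirit it was something like the following minimal single-threaded sketch (the starting value and the number of primes are arbitrary):

/* primes.c -- illustration: single-threaded, CPU-bound, no IO in the loop.
   Counts primes greater than an arbitrary starting value by trial division.
   With a single thread, 'time' should never report more than 100% CPU
   for this program on a healthy node. */
#include <stdio.h>

static int is_prime(unsigned long n)
{
    if (n < 2) return 0;
    if (n % 2 == 0) return n == 2;
    for (unsigned long d = 3; d * d <= n; d += 2)
        if (n % d == 0) return 0;
    return 1;
}

int main(void)
{
    unsigned long n = 1000000001UL;          /* arbitrary starting value */
    int found = 0;

    while (found < 20000) {                  /* enough work to run for a while */
        if (is_prime(n)) found++;
        n++;
    }
    printf("last prime found: %lu\n", n - 1);
    return 0;
}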
Sample results from rcrs6133:
0.132u 0.002s 0:00.13 100.0% 0+0k 0+0io 0pf+0w
27.983u 0.045s 0:14.01 200.0% 0+0k 0+0io 0pf+0w
1.877u 0.002s 0:01.91 97.9% 0+0k 0+0io 0pf+0w
19.045u 0.005s 0:19.07 99.8% 0+0k 0+0io 0pf+0w
20.267u 0.000s 0:20.28 99.9% 0+0k 0+0io 0pf+0w
20.425u 0.000s 0:20.45 99.8% 0+0k 0+0io 0pf+0w
20.401u 0.000s 0:20.41 99.9% 0+0k 0+0io 0pf+0w
etc ...
Using "sync" in between:
40.580u 0.028s 0:20.46 198.4% 0+0k 0+0io 0pf+0w
19.884u 0.000s 0:19.90 99.8% 0+0k 0+0io 0pf+0w
19.002u 0.000s 0:19.01 99.9% 0+0k 0+0io 0pf+0w
36.679u 0.929s 0:20.10 187.0% 0+0k 0+0io 0pf+0w
19.709u 0.000s 0:19.71 99.9% 0+0k 0+0io 0pf+0w
39.912u 0.090s 0:20.05 199.4% 0+0k 0+0io 0pf+0w
etc ...