This tutorial is very specific to the cases where you have a process running (batch typically) and one of two conditions below is happening and you do not know why (but eventually, would like too :-) ):
Slow programs are still programs in execution mode. The best way to see what is happening is to use "strace".
[An example will be provided soon]
If your program is really stuck, you will not be able to strace it as explained in the previous section. In fact, here is an example of what you would see under such conditions:
% ps -ef | grep root4star starreco 11209 10955 97 May20 ? 1-07:30:59 root4star -b -q bfc.C(100000,"DbV20080512 ry2007 in tpc_daq tpc fcf svt_daq SvtD Physics Cdst Kalman l0 tags Tree evout l3onl emcDY2 fpd ftpc trgd ZDCvtx -dstout CMuDst hitfilt Corr4 OSpaceZ2 OGridLeak3D","st_upc_8097121_raw_1150008.daq") starreco 9178 8953 0 19:39 pts/0 00:00:00 grep root4star % strace -p 11209 Process 11209 attached - interrupt to quit ...
strace would not show any unrolling call stack but simply wait there forever. You know you have a "stuck" process for sure and need to revert to the use of gdb. For the same process, this is what to do
% gdb GNU gdb Red Hat Linux (6.3.0.0-1.143.el4rh) Copyright 2004 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details. This GDB was configured as "i386-redhat-linux-gnu". (gdb) attach 11209 Attaching to process 11209 Reading symbols from /afs/rhic.bnl.gov/star/packages/release10/SL08b/.sl44_gcc346/OBJ/asps/rexe/root4star.3...done. Using host libthread_db library "/lib/tls/libthread_db.so.1". Reading symbols from /afs/rhic.bnl.gov/star/ROOT/20/5.12.00/.sl44_gcc346/root/lib/libCore.so...done. Loaded symbols for /afs/rhic.bnl.gov/star/ROOT/5.12.00/.sl44_gcc346/root/lib/libCore.so ...
there will be a lot of shared library loaded and finally
Reading symbols from /lib/libnss_files.so.2...done. Loaded symbols for /lib/libnss_files.so.2 0x43f00fa1 in StarMagField::Search (this=0x54e16008, N=37, Xarray=0x54ec64c4, x=-nan(0x400000), low=@0x43f0a4c4) at .sl44_gcc346/OBJ/StRoot/StarMagField/StarMagField.cxx:1084 1084 if ( (Int_t)( x >= Xarray[middle] ) == ascend ) #0 0x43f00fa1 in StarMagField::Search (this=0x54e16008, N=37, Xarray=0x54ec64c4, x=-nan(0x400000), low=@0x43f0a4c4) at .sl44_gcc346/OBJ/StRoot/StarMagField/StarMagField.cxx:1084 #1 0x43f0071b in StarMagField::Interpolate3DBfield (this=0x54e16008, r=-nan(0x400000), z=-2.99321151e+33, phi=-nan(0x400000), Br_value=@0x45, Bz_value=@0x45, Bphi_value=@0x45) at .sl44_gcc346/OBJ/StRoot/StarMagField/StarMagField.cxx:844 #2 0x43efeafb in StarMagField::B3DField (this=0x54e16008, x=0xbffc8e70, B=0xbffc8e60) at .sl44_gcc346/OBJ/StRoot/StarMagField/StarMagField.cxx:482 #3 0x4d00c203 in StFtpcTrack::MomentumFit (this=0x1b5d72d8, vertex=0xbffc9610) at .sl44_gcc346/OBJ/StRoot/StFtpcTrackMaker/StFtpcTrackingParams.hh:344 #4 0x4d00b391 in StFtpcTrack::Fit (this=0x1b5d72d8, vertex=0xbffc9610, max_Dca=100, primary_fit=true) at .sl44_gcc346/OBJ/StRoot/StFtpcTrackMaker/StFtpcTrack.cc:575 #5 0x4d0257cc in StFtpcTracker::Fit (this=0xbffc93f0, primary_fit=Variable "primary_fit" is not available. ) at .sl44_gcc346/OBJ/StRoot/StFtpcTrackMaker/StFtpcTracker.cc:751 #6 0x4d01ee7e in StFtpcTrackMaker::Make (this=0xc9cf538) at .sl44_gcc346/OBJ/StRoot/StFtpcTrackMaker/StFtpcTracker.hh:64 #7 0x4285b0c0 in StMaker::Make (this=0xc9c58c0) at .sl44_gcc346/OBJ/StRoot/StChain/StMaker.cxx:965 #8 0x4285b0c0 in StMaker::Make (this=0xaf3e4c0) at .sl44_gcc346/OBJ/StRoot/StChain/StMaker.cxx:965 #9 0x4285426b in StChain::Make (this=0xaf3e4c0) at .sl44_gcc346/OBJ/StRoot/StChain/StChain.cxx:105 #10 0x4296022d in StBFChain::Make (this=0xaf3e4c0) at .sl44_gcc346/OBJ/StRoot/StBFChain/StBFChain.h:78 #11 0x4285522d in StMaker::IMake (this=0xaf3e4c0, number=275) at .sl44_gcc346/OBJ/StRoot/StChain/StMaker.h:110 #12 0x4295fd89 in StBFChain::Make (this=0xaf3e4c0, number=275) at .sl44_gcc346/include/StChain.h:49 #13 0x428545b0 in StChain::EventLoop (this=0xaf3e4c0, jBeg=1, jEnd=100000, outMk=0x0) at .sl44_gcc346/OBJ/StRoot/StChain/StChain.cxx:165 #14 0x4286d4a2 in G__StChain_Cint_541_9_0 () at /afs/rhic.bnl.gov/star/ROOT/5.12.00/.sl44_gcc346/root/include/TString.h:248 #15 0x40855c0c in G__ExceptionWrapper () from /afs/rhic.bnl.gov/star/ROOT/5.12.00/.sl44_gcc346/root/lib/libCint.so #16 0x40911cda in G__call_cppfunc () from /afs/rhic.bnl.gov/star/ROOT/5.12.00/.sl44_gcc346/root/lib/libCint.so #17 0x408ffd40 in G__interpret_func () from /afs/rhic.bnl.gov/star/ROOT/5.12.00/.sl44_gcc346/root/lib/libCint.so #18 0x408ebb95 in G__getfunction () from /afs/rhic.bnl.gov/star/ROOT/5.12.00/.sl44_gcc346/root/lib/libCint.so ---Type <return> to continue, or q <return> to quit---q Quit (gdb)
Withing this stack, you see that the reason for this particular process to be stuck is
0x43f00fa1 in StarMagField::Search (this=0x54e16008, N=37, Xarray=0x54ec64c4, x=-nan(0x400000), low=@0x43f0a4c4) at .sl44_gcc346/OBJ/StRoot/StarMagField/StarMagField.cxx:1084 1084 if ( (Int_t)( x >= Xarray[middle] ) == ascend )
Your bug reporting could begin OR if this is your code, now starts the debugging process, editing code, setting breakpoints and stepping in the call stack inspecting the variables and why/when they are assigned to NaN and so on. Your logic may simply be obviously flaw so your first step is definitely editing the code and understanding its logic.