FastOffline crashes on 2013-03-21
Just documenting the lifetimes of jobs that crashed in FastOffline on March 21, 2013.
Using the production logs, I took a quick look at the end time (vertical axis) versus the start time (horizontal axis) of jobs that ended on March 21. Red/magenta indicates that the job crashed, and blue/cyan that it did not. Files are either st_physics (blue/red) or st_physics_adc (cyan/magenta). Negative values on the horizontal axis are hours before midnight (e.g. -0.5 = 11:30pm the previous evening); otherwise the numbers are hours in Eastern time (e.g. 6 = 6am, and 5.2 = 5:12am). The second plot zooms in on the jobs that started just after 5am but ended quickly.
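For reference, the horizontal-axis value in the plotting code is `sh+(sm/60.)-24*(sh>13)`: minutes become a fraction of an hour, and start hours of 14 or more (2pm or later) are moved to the previous day by subtracting 24. A minimal standalone C++ sketch of that mapping, outside ROOT; the function name `plotHour` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper mirroring the plot expression sh+(sm/60.)-24*(sh>13).
// Converts a 24-hour Eastern clock time to the decimal hour used on the
// horizontal axis: 5:12am -> 5.2, and any hour of 14 or more (2pm or later)
// is shifted back one day, so 11:30pm -> -0.5.
double plotHour(int hour, int minute) {
    assert(hour >= 0 && hour < 24);
    assert(minute >= 0 && minute < 60);
    double h = hour + minute / 60.0;
    if (hour > 13) {
        h -= 24.0;  // job started the previous evening
    }
    return h;
}
```

For example, `plotHour(23, 30)` gives -0.5 and `plotHour(6, 0)` gives 6.0.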
Observations:
- All jobs that started before 2am and finished after ~5:12am crashed.
- Many jobs tried to start around 12:30am, and they all crashed, but took a long time to do so (longer than such non-adc or adc jobs typically take to run).
- There are no log files (yet?) for jobs that started between ~12:30am and ~5:12am.
- Many jobs that started just after ~5:12am crashed, in two classes: those that crashed quickly, and those that (again) ran longer than such jobs typically take.
- Jobs that started after ~5:18am and have log files (i.e. have finished) all finished successfully. So far these are only adc jobs, because they finish more quickly than the non-adc jobs.
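The first observation and the run-time comparisons come down to simple arithmetic on the two plot axes. A minimal C++ sketch, assuming each job has already been reduced to a start value on the horizontal axis and an end hour on March 21; the `Job` struct and both function names are hypothetical, and "crashed" stands for a log containing "break", as counted by the time-stamp script.

```cpp
#include <cassert>
#include <vector>

// Hypothetical job record on the same decimal-hour axes as the plots.
struct Job {
    double start;  // horizontal axis: negative means before midnight (e.g. -0.5 = 11:30pm)
    double end;    // vertical axis: hour on March 21 (e.g. 5.2 = 5:12am)
    bool crashed;  // log contained "break"
    bool adc;      // st_physics_adc rather than st_physics
};

// Run time in hours. Because start is already negative for jobs that began
// the previous evening, a plain difference works across midnight.
double durationHours(const Job& j) {
    assert(j.end >= j.start);
    return j.end - j.start;
}

// True if every job that started before startCut and ended after endCut
// crashed, e.g. allCrashedInWindow(jobs, 2.0, 5.2) for the first observation.
bool allCrashedInWindow(const std::vector<Job>& jobs, double startCut, double endCut) {
    for (const Job& j : jobs) {
        if (j.start < startCut && j.end > endCut && !j.crashed) {
            return false;
        }
    }
    return true;
}
```

A job that started at 11:30pm and ended at 6am has a run time of 6.5 hours, which is the kind of number compared against typical non-adc and adc run times above.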
![](/STAR/files/userfiles/17/image/Computing/Production/FastOffline/20130321_FOcrashes.png)
![](/STAR/files/userfiles/17/image/Computing/Production/FastOffline/20130321_FOcrashes_zoom.png)
_____________________
Code (there are probably better ways to do this, but this is what I was able to do quickly):
Obtain time stamps:
```csh
set ffs = `/bin/ls -l /star/rcf/prodlog/dev/log/daq/st_ph*.log.gz | grep "Mar 21" | colrm 1 50`
touch res
touch files
foreach ff ($ffs)
  set brks = `zgrep -c -i break $ff`
  echo $ff >> files
  set btime = `zgrep "Mar 2" $ff | head -1 | awk '{print $8}'`
  set etime = `/bin/ls -l $ff | awk '{print $8}'`
  set adc = `echo $ff | grep -c adc`
  echo $brks $adc $btime $etime >> res
end
sed -i 's/\:/ /g' res
```
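The script leaves one line per log file in `res`; after the `sed` step turns every colon into a space, each line should hold seven integers in the order the TNtuple expects (break count, adc flag, start hh mm ss, end hh mm). This assumes the 8th field of the first log line matching "Mar 2" is an hh:mm:ss time and that `ls -l` prints hh:mm for the end time, which is what the TNtuple column list implies. A minimal C++ sketch for reading one such line outside ROOT; `ResRow` and `parseResLine` are hypothetical names.

```cpp
#include <sstream>
#include <string>

// One line of the "res" file after sed has replaced every ':' with a space:
// "<breaks> <adc> <sh> <sm> <ss> <eh> <em>"
struct ResRow {
    int brks = 0;                // count of lines matching "break" (zgrep -c -i)
    int adc = 0;                 // 1 if the file name contains "adc", else 0
    int sh = 0, sm = 0, ss = 0;  // start time hh mm ss, from the log
    int eh = 0, em = 0;          // end time hh mm, from ls -l
};

// Hypothetical helper: returns true only if all seven integers were read.
// On failure, row may be partially filled.
bool parseResLine(const std::string& line, ResRow& row) {
    std::istringstream in(line);
    return static_cast<bool>(in >> row.brks >> row.adc >> row.sh >> row.sm
                                >> row.ss >> row.eh >> row.em);
}
```

For example, the line `3 1 5 14 07 5 20` describes an adc job whose log matched "break" three times, started at 5:14:07 and ended at 5:20.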
Make plots:
```cpp
// Columns follow the order written to "res" by the script above
TNtuple tt("tt","tt","brk:adc:sh:sm:ss:eh:em");
tt.ReadFile("res");
tt.SetMarkerStyle(8);
gStyle->SetGridColor(kGray);

// End time (vertical) vs. start time (horizontal); start hours after 13 are shifted to before midnight
const char* xy = "eh+(em/60.):sh+(sm/60.)-24*(sh>13)";
TCut late = "abs(sh+(sm/60.)-24*(sh>13)-5)<1&&eh+(em/60.)<6"; // started within 1 hour of 5am, ended before 6am
TCut crashed = "brk>0"; // "break" is a C++ keyword, so it cannot name a variable
TCut adc = "adc>0";

// First plot: all jobs (blue, cyan = adc, red = crashed, magenta = crashed adc)
tt.SetMarkerColor(4); tt.Draw(xy);
tt.SetMarkerColor(7); tt.Draw(xy,adc,"same");
tt.SetMarkerColor(2); tt.Draw(xy,crashed&&!adc,"same");
tt.SetMarkerColor(6); tt.Draw(xy,crashed&&adc,"same");

// Second plot (replaces the first in the current pad): zoom on the "late" jobs
tt.SetMarkerColor(4); tt.Draw(xy,late);
tt.SetMarkerColor(7); tt.Draw(xy,late&&adc,"same");
tt.SetMarkerColor(2); tt.Draw(xy,late&&crashed&&!adc,"same");
tt.SetMarkerColor(6); tt.Draw(xy,late&&crashed&&adc,"same");
```
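Because each `Draw(..., "same")` call paints over the earlier ones, a job's visible marker color is set by the last selection it passes, which depends only on two flags. A minimal C++ sketch of that mapping; `markerColor` is a hypothetical name, and the integers are ROOT's standard color indices (kRed = 2, kBlue = 4, kMagenta = 6, kCyan = 7).

```cpp
// Hypothetical helper: the marker color a job ends up with in the plots.
// Values are ROOT color indices: kRed = 2, kBlue = 4, kMagenta = 6, kCyan = 7.
int markerColor(bool crashed, bool adc) {
    if (crashed) {
        return adc ? 6 : 2;  // magenta: crashed adc job, red: crashed non-adc job
    }
    return adc ? 7 : 4;      // cyan: adc job without crash, blue: non-adc job without crash
}
```

This matches the legend in the text: red/magenta for crashes, cyan/magenta for st_physics_adc files.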
-Gene