FastOffline crashes on 2013-03-21

This documents the lifetimes of jobs that crashed in FastOffline on March 21, 2013.

Using the production logs, I took a quick look at the end time (vertical axis) of jobs that ended on March 21 versus their start time (horizontal axis). Red/magenta indicates that the job crashed, and blue/cyan that it did not. Files are either st_physics (blue/red) or st_physics_adc (cyan/magenta). Negative values on the horizontal axis are hours before midnight; otherwise the numbers are the hour in Eastern time (e.g. 6 = 6am and 5.2 = 5:12am). The second plot zooms in on the jobs that started just after 5am but ended quickly.
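
To make the axis convention concrete, here is a minimal C++ sketch of the conversion from a clock time to the plotted decimal hour. The function name plotHour and the wrapAfter parameter are illustrative only; the value 13 mirrors the "sh>13" term in the plotting expressions further down.

```cpp
// Convert a clock time (hour, minute) to the decimal-hour coordinate used on the plot axes.
// Hours later than wrapAfter are assigned to the previous day and come out negative,
// mirroring the "-24*(sh>13)" term in the plotting code.
// plotHour and wrapAfter are hypothetical names used only for this illustration.
double plotHour(int hour, int minute, int wrapAfter = 13) {
    double h = hour + minute / 60.0;
    if (hour > wrapAfter) {
        h -= 24.0;  // e.g. 23:30 becomes -0.5
    }
    return h;
}
```

For instance, plotHour(5, 12) gives 5.2 (5:12am) and plotHour(23, 30) gives -0.5 (half an hour before midnight).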

Observations:
  • All jobs that started before 2am and finished after ~5:12am crashed.
  • Many jobs tried to start around 12:30am, and they all crashed, but took a long time to do so (longer than the typical run time for such non-adc or adc jobs).
  • There are no log files (yet?) for jobs that started between ~12:30am and ~5:12am.
  • Many jobs that started just after ~5:12am crashed, in two classes: either quickly, or (again) after running longer than such jobs typically take.
  • Jobs that started after ~5:18am and have log files (i.e. have finished) finished successfully. So far these are only adc jobs, because they finish more quickly than the non-adc jobs.





_____________________

Code (there are probably better ways to do this, but this is what I was able to do quickly):

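Obtain time stamps (csh script):
# Log files last modified on Mar 21; colrm strips the leading ls -l columns so that only the path remains
set ffs = `/bin/ls -l /star/rcf/prodlog/dev/log/daq/st_ph*.log.gz | grep "Mar 21" | colrm 1 50`

# res and files are appended to below, so remove stale copies before re-running
touch res
touch files
foreach ff ($ffs)
  # number of lines mentioning "break" (case-insensitive); nonzero is treated as a crash
  set brks = `zgrep -c -i break $ff `
  echo $ff >> files
  # start time: 8th field of the first log line containing "Mar 2"
  set btime = `zgrep "Mar 2" $ff | head -1 | awk '{print $8}'`
  # end time: modification time of the log file
  set etime = `/bin/ls -l $ff | awk '{print $8}'`
  # 1 for st_physics_adc files, 0 otherwise
  set adc = `echo $ff | grep -c adc`
  echo $brks $adc $btime $etime >> res
end
# turn hh:mm:ss and hh:mm into space-separated columns
sed -i 's/\:/ /g' res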
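
Each line of res therefore holds seven whitespace-separated integers, in the order of the ntuple columns used for plotting (break:adc:sh:sm:ss:eh:em). As a rough sketch of that layout, assuming every line is complete, a line could be parsed in plain C++ as follows. The JobRecord struct and parseResLine function are hypothetical names for illustration, not part of the actual workflow.

```cpp
#include <sstream>
#include <string>

// One line of the res file: break count, adc flag, start h/m/s, end h/m.
// JobRecord is a hypothetical struct used only for this illustration.
struct JobRecord {
    int breaks;      // number of "break" matches in the log (>0 is treated as a crash)
    bool adc;        // true for st_physics_adc files
    int sh, sm, ss;  // start time: hour, minute, second
    int eh, em;      // end time: hour, minute
};

// Parse one whitespace-separated res line; returns false if any field is missing.
bool parseResLine(const std::string& line, JobRecord& rec) {
    std::istringstream in(line);
    int adcFlag = 0;
    if (!(in >> rec.breaks >> adcFlag >> rec.sh >> rec.sm >> rec.ss >> rec.eh >> rec.em)) {
        return false;
    }
    rec.adc = adcFlag > 0;
    return true;
}
```

With a made-up line such as "3 1 00 31 07 05 14", this yields a crashed adc job that started at 00:31:07 and ended at 05:14.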

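Make plots (ROOT):
// Columns: break count, adc flag, start hour/min/sec, end hour/min
TNtuple tt("tt","tt","break:adc:sh:sm:ss:eh:em");
tt.ReadFile("res");
tt.SetMarkerStyle(8);
gStyle->SetGridColor(kGray);
// Jobs that started within an hour of 5am and ended before 6am
TCut late = "abs(sh+(sm/60.)-24*(sh>13)-5)<1&&eh+(em/60.)<6";
// "break" is a C++ keyword, so the cut variable is named crashed; the ntuple column is still "break"
TCut crashed = "break>0";
TCut adc = "adc>0";

// First plot: end time vs. start time for all jobs
tt.SetMarkerColor(4); // blue: all jobs (non-adc, no crash after overlays)
tt.Draw("eh+(em/60.):sh+(sm/60.)-24*(sh>13)");
tt.SetMarkerColor(7); // cyan: adc
tt.Draw("eh+(em/60.):sh+(sm/60.)-24*(sh>13)",adc,"same");
tt.SetMarkerColor(2); // red: non-adc, crashed
tt.Draw("eh+(em/60.):sh+(sm/60.)-24*(sh>13)",crashed&&!adc,"same");
tt.SetMarkerColor(6); // magenta: adc, crashed
tt.Draw("eh+(em/60.):sh+(sm/60.)-24*(sh>13)",crashed&&adc,"same");

// Second plot: zoom on jobs selected by the late cut (drawn on the current canvas, replacing the first plot)
// The "same" overlays are clipped to the frame set by the first Draw
tt.SetMarkerColor(4);
tt.Draw("eh+(em/60.):sh+(sm/60.)-24*(sh>13)",late);
tt.SetMarkerColor(7);
tt.Draw("eh+(em/60.):sh+(sm/60.)-24*(sh>13)",adc,"same");
tt.SetMarkerColor(2);
tt.Draw("eh+(em/60.):sh+(sm/60.)-24*(sh>13)",crashed&&!adc,"same");
tt.SetMarkerColor(6); // magenta, as in the first plot
tt.Draw("eh+(em/60.):sh+(sm/60.)-24*(sh>13)",crashed&&adc,"same");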

-Gene