Production update

Speaker: Lidia Didenko (BNL)


Talk time: 12:35, Duration: 00:10

         AuAu 200GeV Run 2011 data production status

  The total number of events with the TPC detector to be produced is 1.1B.
  915M events have been produced, so less than 20% remain to be produced.
  Production is expected to finish in ~2 weeks.
  Production includes the following data streams:
  st_physics, st_gamma, st_ht, st_hlt, st_mtd, st_ftp, st_upc, st_zerobias.
  Statistics produced for each stream are shown on the page.
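  The progress numbers above can be checked with a little arithmetic. The
  sketch below uses only the quoted counts (1.1B total, 915M produced); the
  14-day horizon is an assumption standing in for "~2 weeks", and the helper
  names are hypothetical.

```python
# Progress arithmetic for the AuAu 200GeV Run 2011 production,
# using the event counts quoted in the status report.
TOTAL_EVENTS = 1.1e9      # events with TPC to be produced
PRODUCED_EVENTS = 915e6   # events produced so far


def remaining_fraction(total: float, produced: float) -> float:
    """Fraction of the total sample still to be produced."""
    return (total - produced) / total


def required_daily_rate(total: float, produced: float, days: float) -> float:
    """Events per day needed to finish the remainder within `days` days."""
    return (total - produced) / days


if __name__ == "__main__":
    frac = remaining_fraction(TOTAL_EVENTS, PRODUCED_EVENTS)
    # Assumption: "~2 weeks" is taken as 14 days.
    rate = required_daily_rate(TOTAL_EVENTS, PRODUCED_EVENTS, days=14)
    print(f"remaining: {frac:.1%}")  # remaining: 16.8%
    print(f"needed rate: {rate / 1e6:.1f}M events/day")
```

  About 185M events (~16.8%) remain, consistent with the "<20%" quoted above.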

 Production is running fairly stably.

- The status of HPSS file transfers over the last month, as a percentage
  of the number of jobs finished per hour, is shown on the plot:
HPSS status
There was one peak of failures on Jan 24, when the HPSS management network
  failed and connectivity to the tape library was lost.
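  One way such an hourly transfer-success percentage could be computed is
  sketched below. The record layout (one (hour, transferred_ok) pair per
  finished job) and the 90% threshold used to flag failure peaks are
  assumptions for illustration, not the actual monitoring code.

```python
from collections import defaultdict


def hourly_transfer_percentage(records):
    """records: iterable of (hour, transferred_ok) pairs, one per finished job
    (hypothetical layout). Returns {hour: % of finished jobs whose output
    was successfully transferred to HPSS}."""
    finished = defaultdict(int)
    ok = defaultdict(int)
    for hour, transferred_ok in records:
        finished[hour] += 1
        if transferred_ok:
            ok[hour] += 1
    return {hour: 100.0 * ok[hour] / finished[hour] for hour in finished}


def failure_peaks(percentages, threshold=90.0):
    """Hours whose transfer success percentage drops below `threshold`
    (hypothetical cut), sorted."""
    return sorted(h for h, p in percentages.items() if p < threshold)


if __name__ == "__main__":
    toy = [("Jan23 10h", True), ("Jan23 10h", True),
           ("Jan24 10h", False), ("Jan24 10h", True)]
    pct = hourly_transfer_percentage(toy)
    print(pct)
    print(failure_peaks(pct))
```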

- RealTime/CPU usage is optimized; see the distribution by day for the
  last month on this plot.
  There was a minor increase on Jan 24, related to the network problem.
  The plot shows which nodes contributed to this bump.
  It helps to identify nodes that have network or AFS slowness problems.
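  A minimal sketch of how nodes with network or AFS slowness might be
  picked out: a CPU-bound job has RealTime/CPU close to 1, while time spent
  waiting on I/O raises the ratio. The per-job tuple layout, the node names
  and the 1.2 cut below are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median


def slow_nodes(jobs, threshold=1.2):
    """jobs: iterable of (node, real_time_s, cpu_time_s) (hypothetical layout).
    Returns nodes whose median RealTime/CPU ratio exceeds `threshold`."""
    ratios = defaultdict(list)
    for node, real_s, cpu_s in jobs:
        if cpu_s > 0:  # skip jobs with no recorded CPU time
            ratios[node].append(real_s / cpu_s)
    return sorted(node for node, r in ratios.items() if median(r) > threshold)


if __name__ == "__main__":
    toy = [("node01", 1000, 990), ("node01", 1010, 1000),
           ("node02", 3000, 1000), ("node02", 2500, 1000)]
    print(slow_nodes(toy))
```

  Using the median per node keeps a single slow job from flagging an
  otherwise healthy node.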

  Submission is done by run number and includes all stream jobs
  for the submitted run numbers.
  The distribution of stream jobs finished per hour over the last month
  was fairly stable, as shown on the plot.
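  Submission by run number, with every stream job of a run submitted
  together, can be sketched as below. The stream names come from the list
  above; the run numbers, file names and the (run, stream) -> files mapping
  are hypothetical.

```python
STREAMS = ["st_physics", "st_gamma", "st_ht", "st_hlt",
           "st_mtd", "st_ftp", "st_upc", "st_zerobias"]


def build_submission(run_numbers, files_by_run_stream):
    """Return (run, stream, file) jobs, grouped by run in submission order,
    covering every stream for each submitted run.
    files_by_run_stream: dict mapping (run, stream) -> list of files
    (hypothetical layout)."""
    jobs = []
    for run in run_numbers:
        for stream in STREAMS:
            for f in files_by_run_stream.get((run, stream), []):
                jobs.append((run, stream, f))
    return jobs


if __name__ == "__main__":
    files = {(1001, "st_physics"): ["a.daq", "b.daq"],
             (1001, "st_upc"): ["c.daq"],
             (1002, "st_ht"): ["d.daq"]}
    for job in build_submission([1001, 1002], files):
        print(job)
```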

- The number of jobs crashed due to code problems is negligible (< 0.1%).
  Most failed jobs (failures due to AFS or network problems) will be rerun at the end.
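  The bookkeeping behind these numbers might look like the sketch below:
  compute the fraction of code crashes and collect jobs that failed for
  infrastructure reasons for a later rerun. The status labels are
  hypothetical, not the actual job database values.

```python
from collections import Counter

# Hypothetical status labels for finished jobs.
RERUN_STATUSES = {"fail_afs", "fail_network"}


def summarize(jobs):
    """jobs: iterable of (job_id, status).
    Returns (fraction of jobs crashed due to code, list of job ids to rerun)."""
    jobs = list(jobs)  # iterate twice below
    counts = Counter(status for _, status in jobs)
    total = len(jobs)
    code_crash_frac = counts["crash_code"] / total if total else 0.0
    rerun = [job_id for job_id, status in jobs if status in RERUN_STATUSES]
    return code_crash_frac, rerun


if __name__ == "__main__":
    toy = [(i, "done") for i in range(2000)]
    toy += [(2000, "crash_code"), (2001, "fail_afs"), (2002, "fail_network")]
    frac, rerun = summarize(toy)
    print(f"code crashes: {frac:.3%}, to rerun: {rerun}")
```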

  A JobStatus table is now available on the web interface
  for monitoring job status.

  Disk space is an issue. The current AuAu 200GeV production occupies 430TB
  and needs ~80TB more to complete. With the current NFS and DD space
  we can last until the weekend, so we have to start removing produced
  data from NFS to make room to write and catalog the remaining data.
  Data will be restored to DD when more space is available.
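  The disk budget reduces to simple arithmetic: 430TB used plus ~80TB still
  needed gives a final footprint of about 510TB. The free-space figure is
  not given in the report, so it appears below only as a hypothetical input.

```python
USED_TB = 430.0       # current AuAu 200GeV production footprint
REMAINING_TB = 80.0   # approximate space still needed to complete


def total_footprint_tb(used=USED_TB, remaining=REMAINING_TB):
    """Expected size of the full production on disk."""
    return used + remaining


def shortfall_tb(free_tb, remaining=REMAINING_TB):
    """TB that must be freed (e.g. by removing produced files from NFS)
    before the rest can be written; 0 if free space already suffices.
    free_tb is a hypothetical input, not a number from the report."""
    return max(0.0, remaining - free_tb)


if __name__ == "__main__":
    print(f"final footprint: {total_footprint_tb():.0f}TB")  # final footprint: 510TB
    print(f"to free with 30TB available: {shortfall_tb(30.0):.0f}TB")
```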