Missing Statistics in 2009 200GeV Jet Trees

This page details my investigation into why, for a majority of runs, the number of events for a given trigger found in the 2009 200GeV jet trees falls well short of the number we would expect from the run log.

 

As part of my QA of the 2009 200GeV jet trees, I created plots showing, on a run-by-run basis, the ratio of the number of specific triggers counted in the jet trees to the number of those triggers recorded in the cdev database (and displayed on the run log pages). I saw that many runs were missing a significant number of events.
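The per-run comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the QA code used to make Figure 1: it assumes the trigger counts from the jet trees and from cdev have already been collected into dictionaries keyed by run number, and the function names and the 0.95 flagging threshold are hypothetical.

```python
from typing import Dict, List


def trigger_ratios(tree_counts: Dict[int, int], cdev_counts: Dict[int, int]) -> Dict[int, float]:
    """Per-run ratio of triggers counted in the jet trees to triggers recorded in cdev."""
    ratios: Dict[int, float] = {}
    for run, expected in cdev_counts.items():
        if expected <= 0:
            continue  # nothing recorded in cdev, so no meaningful ratio
        # A run absent from the trees counts as zero triggers seen.
        ratios[run] = tree_counts.get(run, 0) / expected
    return ratios


def flag_short_runs(ratios: Dict[int, float], threshold: float = 0.95) -> List[int]:
    """Run numbers whose ratio is below a (hypothetical) acceptance threshold, sorted."""
    return sorted(run for run, r in ratios.items() if r < threshold)
```

Runs returned by `flag_short_runs` are the ones whose job logs would be worth inspecting.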

 

Figure 1: The bottom panel of this plot shows the ratio of the number of JP1 triggers found in the jet trees to the number of JP1 triggers expected from cdev for ~110 first-priority production2009_200Gev_Single runs.

 

A file which gives the mapping between Run Index and run number can be found here.

 

A quick word on how the jet trees are produced. Each run has a number of MuDst files associated with it; the count varies from ~20 to ~50 depending on how many events are in the run. The submission script that sends the tree-making jobs to the farm groups these MuDst files into a series of jobs, usually containing 5 or 6 MuDst files each. These jobs are then submitted to the medium queue, which has a time limit of 5 hours.
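As a rough illustration of the grouping step, the sketch below splits a run's list of MuDst files into jobs of a fixed size and totals the events in each job. It is a hypothetical stand-in, not the actual submission script: the real scheduler need not group files sequentially, and the function names and the (filename, event count) input format are assumptions made for the example.

```python
from typing import List, Tuple

MuDstFile = Tuple[str, int]  # hypothetical (filename, number of events) pair


def group_into_jobs(files: List[MuDstFile], files_per_job: int = 5) -> List[List[MuDstFile]]:
    """Split files into consecutive jobs of at most files_per_job files each."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be positive")
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]


def job_event_totals(jobs: List[List[MuDstFile]]) -> List[int]:
    """Total number of events each job has to process."""
    return [sum(n_events for _, n_events in job) for job in jobs]
```

The per-job event totals are what matter for the time limit, since a job's runtime grows with the number of events it has to process.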

When I investigated the log files from the runs with missing statistics, I found that in each run one or more jobs had failed. Looking closer at the failed jobs, I found that they often accounted for a large fraction of the total number of events in the run. For example, the files for run 10125068 were split into 6 jobs: the one job that failed contained 123565 events, while the 5 jobs that completed contained only 80981 events between them. It seems that the scheduler likes to group files with large event counts into the same jobs, and it is those jobs that fail.
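The numbers for run 10125068 make the size of the effect concrete. Assuming the failed job's events are simply absent from the jet trees, the fraction of the run's events that survive is 80981 / (80981 + 123565). The helper below (a hypothetical name) does this arithmetic. Note that it counts all events in the MuDst files, not JP1 triggers specifically, so the JP1 ratio for this run need not come out at exactly this value.

```python
def surviving_fraction(completed_events: int, failed_events: int) -> float:
    """Fraction of a run's events that reach the trees if failed jobs contribute nothing."""
    total = completed_events + failed_events
    if total == 0:
        return 0.0
    return completed_events / total


# Run 10125068: 5 completed jobs with 80981 events in total, 1 failed job with 123565 events.
frac = surviving_fraction(80981, 123565)
print(f"{frac:.3f}")  # prints 0.396
```

In other words, a single failed job costs this run roughly 60% of its events.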

The fact that the failing jobs were those with large numbers of events led me to believe that they were running into the 5-hour time limit on the medium queue. To test this, I ran a small jet tree production and forced all jobs into the long queue.
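One way to check the time-limit hypothesis before resubmitting is to estimate each job's wall-clock time from its event count. The sketch below assumes runtime scales linearly with the number of events at a fixed average cost per event. `seconds_per_event` is a hypothetical input that would have to be measured (for example from the logs of jobs that completed), and the function name is illustrative; only the 5-hour medium-queue limit comes from this page.

```python
from typing import List, Sequence

MEDIUM_QUEUE_LIMIT_HOURS = 5.0  # medium queue time limit quoted above


def jobs_over_limit(
    job_events: Sequence[int],
    seconds_per_event: float,
    limit_hours: float = MEDIUM_QUEUE_LIMIT_HOURS,
) -> List[int]:
    """Indices of jobs whose estimated runtime exceeds the queue time limit.

    Uses a hypothetical linear model: runtime = n_events * seconds_per_event.
    """
    limit_seconds = limit_hours * 3600.0
    return [i for i, n in enumerate(job_events) if n * seconds_per_event > limit_seconds]
```

For scale: a 5-hour limit is 18000 s, so a job of 123565 events, like the failed job in run 10125068, would exceed it if processing averaged more than about 0.146 s per event.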

 

Figure 2: The bottom panel of this plot shows the ratio of the number of JP1 triggers seen in the jet trees to the number of JP1 triggers expected from cdev for the first 20 runs shown in Figure 1. This plot was created using jet trees produced in the long queue.


 

As you can see, producing the trees in the long queue eliminates the large discrepancies between seen and expected triggers that were present in the trees produced in the medium queue.