On stability of my SMD clustering

Valid questions were raised about how well the output of my SMD clustering method agrees between MC and data.  Jan suggested an alternative to my approach that could be implemented quickly, and the attached PDF shows the output of both for a number of SMD variables.

Old algorithm:  A range of SMD strips is chosen by finding the center of the tower cluster above them and counting a large number of strips (40) in each direction.  In this range, the most energetic strip is found and seeds the SMD cluster.  Strips on each side are added one at a time until the next strip is either below the minimum threshold (0.0002 GeV = 0.2 MeV) or more than 20% larger than the previous strip.  The sum of all these energies is the cluster energy in that plane.  The high-side energy is the larger of the two sums of unclustered energy, one on each side of the cluster.
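The old algorithm can be sketched in a few lines of Python.  This is only an illustration of the steps as described above, not the actual analysis code: the function name, the list-of-energies input, and the handling of the detector edges are assumptions, and it presumes the tower-cluster center is a valid strip index.

```python
MIN_STRIP_E = 0.0002  # GeV (0.2 MeV), minimum strip energy quoted in the text
MAX_RISE = 1.20       # stop if the next strip is >20% larger than the previous
HALF_RANGE = 40       # strips searched on each side of the tower-cluster center


def old_cluster(energies, center):
    """Return (first, last, cluster_energy, high_side_energy) for one plane.

    `energies` is a list of strip energies in GeV and `center` is the strip
    index under the tower cluster (both hypothetical inputs for this sketch).
    """
    lo = max(0, center - HALF_RANGE)
    hi = min(len(energies), center + HALF_RANGE + 1)  # exclusive upper bound

    # Seed: the most energetic strip in the search range.
    seed = max(range(lo, hi), key=lambda i: energies[i])

    def keep(candidate, previous):
        # Accept a strip unless it is below threshold or rises by more than 20%.
        return MIN_STRIP_E <= candidate <= MAX_RISE * previous

    # Grow outward from the seed. The two sides are independent, so growing
    # one side completely before the other gives the same cluster.
    first = last = seed
    while first - 1 >= lo and keep(energies[first - 1], energies[first]):
        first -= 1
    while last + 1 < hi and keep(energies[last + 1], energies[last]):
        last += 1

    cluster_e = sum(energies[first:last + 1])
    below = sum(energies[lo:first])      # unclustered energy on the low side
    above = sum(energies[last + 1:hi])   # unclustered energy on the high side
    return first, last, cluster_e, max(below, above)
```

Note that a strip which rises by more than 20% ends the cluster, so a nearby second bump is left out and counted in the high-side energy instead.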

New algorithm:  A range of SMD strips is found exactly as before.  In this range, a window ten strips wide is tried on every set of ten adjacent strips.  The window with the highest energy is the cluster in that plane.  The cluster energy and high-side energy are defined as in the old algorithm.
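A minimal sketch of the sliding-window version, under the same caveats: the names and input format are assumptions, ties between equally energetic windows are broken by taking the lowest strip index (the text does not say how ties are handled), and a search range narrower than the window is simply rejected.

```python
HALF_RANGE = 40    # strips searched on each side of the tower-cluster center
WINDOW_WIDTH = 10  # window width in strips, as quoted in the text


def window_cluster(energies, center, width=WINDOW_WIDTH):
    """Return (first, last, cluster_energy, high_side_energy) for one plane.

    `energies` is a list of strip energies in GeV and `center` is the strip
    index under the tower cluster (both hypothetical inputs for this sketch).
    """
    lo = max(0, center - HALF_RANGE)
    hi = min(len(energies), center + HALF_RANGE + 1)  # exclusive upper bound
    if hi - lo < width:
        # Assumption: the real code must decide what to do near the edges.
        raise ValueError("search range is narrower than the window")

    # Try the window at every possible starting strip and keep the one with
    # the largest summed energy (first one wins on ties).
    starts = range(lo, hi - width + 1)
    first = max(starts, key=lambda s: sum(energies[s:s + width]))
    last = first + width - 1

    cluster_e = sum(energies[first:last + 1])
    below = sum(energies[lo:first])      # unclustered energy on the low side
    above = sum(energies[last + 1:hi])   # unclustered energy on the high side
    return first, last, cluster_e, max(below, above)
```

Because the window has a fixed width, it takes in neighboring strips regardless of their shape, so clustered energy tends to go up and high-side energy tends to go down relative to the seed-and-grow approach.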

 

In the attached PDF, the plots come in pairs, with the old algorithm's plot always preceding the new one.  The plots are made after the fiducial (and tower energy) cuts, so we are only seeing events in the outer ~5 eta rings, and only in subsectors B, C, and D.  The black and red lines are single-photon and jet pythia MC, respectively.  The green and blue lines are real data with the l2gamma and l2jet triggers, respectively.

 

Pages 1&2,3&4:  These plots show the total energy in the smd range (not just in the clustered SMD strips).  There are slight differences, which seem to come from a small number of additional events that are present in the new algorithm.  I believe these new events are ones that may have crashed the analysis just before completion in the old algorithm, or else represent some non-fatal bug that I inadvertently fixed between versions.  In any case, the overall shapes are the same.

Pages 5&6,7&8:  These plots show the high-side energy after the clustering process.  If the new clustering algorithm is better, we would expect to see a tighter range for the photon pythia, since prompt photons should leave only one cluster in the SMD.  This seems to be the case.  Note that the jet pythia and the two real datasets all gain an inflection point in the new algorithm (in the semilog plot, at least).  The new algorithm does seem to make the peaks of the various series line up better than the old one (they all move toward lower high-side energy), but the general shapes beyond the peak were already very similar.

Pages 9&10,11&12:  These plots show the energy in the clustered SMD strips.  We expect this to be complementary to the previous plot, and indeed the new algorithm has more clustered energy on average.  The photon peak moves only slightly, but the others all shift significantly toward higher energy.  This is a mixed blessing, since any advantage it has for prompt photons is clearly occurring only in the data and not in the pythia photon sample.  There is still a significant disagreement between the locations of the various peaks.

Pages 13&14,15&16:  These plots show the high-side energy as a fraction of the clustered energy.  Since we would like our cluster to be the most energetic set of strips in the region, we would like this fraction to always be less than 1.0.  The old algorithm clearly doesn't behave like this, with a tail that reaches beyond 7.0.  In the new algorithm we see the photon peak tighten considerably (which comes from the decrease in high-side energy mentioned above), and all the series now have distinct shapes between 0.0 and 1.0, with tails dying off around 4.0.  The sharp cutoff at 1.0 helps to verify that in the data (as well as the jet MC) we are correctly identifying the most energetic cluster, and also that the unrelated energy in any given region of the SMD is fairly low on average (this corresponds to events in the tail above 1.0, where we might be seeing a second SMD cluster from an unrelated particle).
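For reference, the plotted quantity is just the ratio of the two numbers each clustering pass produces.  A small sketch follows; the function name and the choice to return None when nothing was clustered are assumptions, not necessarily what the analysis code does.

```python
def high_side_fraction(high_side_e, cluster_e):
    """High-side energy divided by clustered energy, both in GeV.

    Returns None when there is no clustered energy, since the ratio is
    undefined there (an assumption made for this sketch).
    """
    if cluster_e <= 0.0:
        return None
    return high_side_e / cluster_e
```

A value below 1.0 means the chosen cluster holds more energy than either unclustered side.  A value above 1.0 means one side of the search range holds more energy than the cluster itself, which is possible under either algorithm because each side can span many more strips than the cluster.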

 

There are two pressing questions here:

Are the similarities between pythia and real data here 'close enough'?

What is the right width for the sliding window?  (And how does one decide that?)