Spill-over Events

It doesn't look like I can remove the spillover events from the training samples without drastically changing their content.  Now I just have to figure out what to do with them to keep the distribution sensible.

I'd been hoping, in my previous post, that the ugliness in the KDE'd distributions could be attributed to small numbers of spillover from other partonic ptbins (and hence other weights), but a quick glance makes that seem not to be the case.  In fact, there is significant spillover between partonic and photon ptbins.  As an example, see the photon pt corresponding to partonic bin 11-15 (or more specifically, partonic pt bin weight 4.926E-10):

Here, the spread in the Gaussian is much wider than the pt bin itself (11-15), meaning the spillover is not just a few events but a large number of them, all of which will have much larger weights than the events that came from the higher partonic pt bins.

From here, the next step is to look into whether these spillover events come from events where the partonic pt is very close to the next pt bin anyway, in which case maybe changing the weighting in the training sample to match the rest of the elements of the bin would not be so big a jump.  I'll have to write some new code to get at this information - the current ntuples don't have any partonic information.
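Once the partonic information is available, the check could look something like the sketch below. It is only a sketch: the arrays `parton_pt` and `photon_pt`, the function name, and the `tolerance` value are all hypothetical placeholders, since the current ntuples don't carry the partonic pt at all. It only looks at upward spillover (photon pt above the top of the partonic bin).

```python
import numpy as np

def near_edge_fraction(parton_pt, photon_pt, bin_low, bin_high, tolerance=0.5):
    """Fraction of upward-spillover events whose partonic pt lies within
    `tolerance` of the upper edge of the partonic bin [bin_low, bin_high).

    All names and the default tolerance are illustrative assumptions.
    """
    parton_pt = np.asarray(parton_pt, dtype=float)
    photon_pt = np.asarray(photon_pt, dtype=float)

    # Events generated in this partonic pt bin
    in_parton_bin = (parton_pt >= bin_low) & (parton_pt < bin_high)
    # ... whose photon pt ended up above the top of the bin
    spilled_up = in_parton_bin & (photon_pt >= bin_high)

    n_spilled = int(spilled_up.sum())
    if n_spilled == 0:
        return 0.0

    # Of those, how many started close to the upper bin edge?
    near_edge = spilled_up & (parton_pt >= bin_high - tolerance)
    return float(near_edge.sum()) / n_spilled
```

A fraction close to 1 would suggest that reweighting the spillover to match the neighbouring bin is not much of a jump; a small fraction would suggest the opposite.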

Edit:
Of course, even if the spillover from one bin into the next is too large to be ignored, I imagine I can safely count out the spillover from one bin into the bin two away (though it looks like the effect is more pronounced at lower pt anyway).

The following image shows the weighted pt distribution in the 5<pt<7 bin, with the contribution of the 5-7 partonic bin highlighted in red.

The effect in the above histogram can be seen more generally in a 2D plot of photon pt bin vs. weight, since the weight is determined by the partonic pt bin. The plot below is for photon events only, since background events have different weights:
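For reference, the counts behind a plot like this can be tabulated with a few lines of NumPy. This is a minimal sketch, not the code used for the plot: `photon_pt`, `weight`, and `pt_edges` are assumed per-event arrays and bin edges, and the function name is made up for illustration.

```python
import numpy as np

def count_by_ptbin_and_weight(photon_pt, weight, pt_edges):
    """Count events per (photon pt bin, distinct weight) cell.

    Returns the sorted distinct weights and a 2D array of counts with
    shape (number of pt bins, number of distinct weights).
    Input names are illustrative assumptions.
    """
    photon_pt = np.asarray(photon_pt, dtype=float)
    weight = np.asarray(weight, dtype=float)
    pt_edges = np.asarray(pt_edges, dtype=float)
    n_bins = len(pt_edges) - 1

    # Photon pt bin index per event; values outside the edges are dropped
    bin_idx = np.digitize(photon_pt, pt_edges) - 1
    in_range = (bin_idx >= 0) & (bin_idx < n_bins)

    # Each distinct weight corresponds to one partonic pt bin
    unique_weights, weight_idx = np.unique(weight, return_inverse=True)

    counts = np.zeros((n_bins, len(unique_weights)), dtype=int)
    np.add.at(counts, (bin_idx[in_range], weight_idx[in_range]), 1)
    return unique_weights, counts
```

Cells with small counts but large weights, relative to the rest of their row, are the ones to look out for.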

The bins that cause problems in the pt distribution are those in the upper right: small numbers of events that are much more highly weighted than the rest of the photon pt bin.  I'm proposing culling those events from the sample distributions so that smoothing will work better.
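One possible way to do the culling is sketched below. The criterion is an assumption on my part and not something settled: in each photon pt bin, take the weight shared by the most events as the "dominant" weight and drop any event whose weight exceeds it by more than a factor `max_ratio`. The function name, the input arrays, and the default ratio are all hypothetical.

```python
import numpy as np

def cull_high_weight_spillover(photon_pt, weight, pt_edges, max_ratio=10.0):
    """Return a boolean mask of events to keep.

    An event is culled if its weight is more than `max_ratio` times the
    most common weight in its photon pt bin. Criterion and default value
    are illustrative assumptions.
    """
    photon_pt = np.asarray(photon_pt, dtype=float)
    weight = np.asarray(weight, dtype=float)
    pt_edges = np.asarray(pt_edges, dtype=float)

    bin_idx = np.digitize(photon_pt, pt_edges) - 1
    keep = np.ones(len(photon_pt), dtype=bool)

    for b in range(len(pt_edges) - 1):
        in_bin = bin_idx == b
        if not in_bin.any():
            continue
        # Most common weight in this photon pt bin
        # (ties resolve to the smallest weight, since np.unique sorts)
        values, counts = np.unique(weight[in_bin], return_counts=True)
        dominant = values[np.argmax(counts)]
        # Drop the rare, much more heavily weighted events
        keep &= ~(in_bin & (weight > max_ratio * dominant))

    return keep
```

The mask can then be applied to every per-event array, e.g. `photon_pt[keep]` and `weight[keep]`, before smoothing. Events outside the range of `pt_edges` are left untouched.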