Looking at the Gaussian Widths:

We now have the fitting algorithms working for both the LDA and the 2D approach.  The results are plotted below, with the actual values in black, the 2D values in red, and the LDA values in blue.  The two missing red points correspond to zeros, which cannot be shown on the log plot.

As mentioned before, the method divides the data into two portions: one serves as the known sample and the other as the unknown sample.  The known sample is used to generate the expected distributions, and the unknown sample is then fitted to those distributions.  Because of the finite statistics, we smear the known distributions with a Gaussian KDE (more on this later).
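A minimal sketch of the known/unknown split and the fit follows. It assumes a single 1-D discriminating variable, per-event weights, and integer class labels (0 = background, 1 = photon). The function names, the 50/50 random split, and the use of `scipy.optimize.nnls` as the fitter are illustrative assumptions, not necessarily what the analysis code does. The KDE smearing is also left out here.

```python
import numpy as np
from scipy.optimize import nnls


def split_known_unknown(values, weights, labels, rng, known_fraction=0.5):
    """Randomly assign each event to the 'known' or the 'unknown' sample.

    Hypothetical helper: the real split may be done differently.
    """
    is_known = rng.random(len(values)) < known_fraction
    known = (values[is_known], weights[is_known], labels[is_known])
    unknown = (values[~is_known], weights[~is_known])
    return known, unknown


def template_fit(known, unknown, bins):
    """Fit the unknown histogram as a non-negative sum of known-sample shapes.

    Returns [background yield, photon yield] in units of summed weight.
    Assumes both classes are present in the known sample.
    """
    values, weights, labels = known
    templates = []
    for cls in (0, 1):  # assumption: 0 = background, 1 = photon
        sel = labels == cls
        h, _ = np.histogram(values[sel], bins=bins, weights=weights[sel])
        templates.append(h / h.sum())  # unit-normalised expected distribution
    design = np.column_stack(templates)
    target, _ = np.histogram(unknown[0], bins=bins, weights=unknown[1])
    yields, _residual_norm = nnls(design, target)
    return yields
```

Non-negative least squares is used only because it keeps the sketch short and forbids negative yields. A binned likelihood fit would be the more usual choice for low-count bins.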

Note also that, although the plot above shows the LDA values with roughly the right pt dependence (albeit off by several orders of magnitude), these values are very close to the total number of events (signal and background) in each pt bin.

The error bars on the LDA data points dwarf the values themselves, which can be attributed to the smearing.  As the fit below shows, the green curve has some structure near the peak, but it is dominated by a wide Gaussian that extends much farther than the other distributions.


(black is the unknown distribution, blue is the background fit, green is the photon fit, and red is the summed fit)

We use a KDE to smooth out the finite statistics in the sample distributions.  To better contain the spread of the Gaussians, the widths are chosen to be smaller for more significant peaks and wider for bins in the tails of the distribution.  We base each width on the number of entries in the bin, not on the total weight in that bin, since entries can have vastly different weights.  As a consequence, a bin holding a single, highly weighted event receives the widest Gaussian while still carrying that event's full weight.  Such bins therefore spread a large weight across much of the range and dominate the smoothed histogram.
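The width rule can be sketched as follows. This version assumes the width scales as `base_width / sqrt(n_entries)`. The post does not state the exact scaling, so treat that formula and the names `smooth_histogram` and `base_width` as hypothetical. Each bin's summed weight is spread over a Gaussian normalised across the bins, so the total weight is conserved.

```python
import numpy as np


def smooth_histogram(centers, sum_w, n_entries, base_width):
    """Spread each bin's summed weight over a Gaussian centred on that bin.

    Assumed width rule: sigma = base_width / sqrt(n_entries). The width uses
    the entry count, not the summed weight, so a sparse tail bin gets a wide
    kernel even if its single entry is very heavy.
    """
    smoothed = np.zeros_like(centers, dtype=float)
    for c, w, n in zip(centers, sum_w, n_entries):
        if n == 0:
            continue  # empty bins contribute nothing
        sigma = base_width / np.sqrt(n)
        kernel = np.exp(-0.5 * ((centers - c) / sigma) ** 2)
        kernel /= kernel.sum()  # keep this bin's total weight unchanged
        smoothed += w * kernel
    return smoothed
```

As an illustration, take 20 bins on [0, 1] with `base_width=0.2`. Put 1000 light entries (total weight 1) in a central bin and a single entry of weight 5 in a tail bin. The peak stays confined to its own bin, while the tail entry's weight spreads over most of the range, which reproduces the domination effect described above.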

I'm looking at several solutions:
- The number of bins in the histograms can be reduced.  This will wash out some of the finer features in the peak regions, but will keep the Gaussians better controlled by making it less likely that a bin contains only a single event.
- I can make the KDE asymmetric, like the actual distribution.
- I can restrict the fit to the region up to ~0.85, where the actual distribution cuts off.
- I can tweak how the Gaussian widths are generated, so that each width depends on the fraction of events falling in a bin instead of the absolute number.
- I can generate the errors after the fact by writing a function that calculates chi-squared.
- I can write a fitting function that takes into account the errors in the basis functions.  This should work, but involves some difficult coding, and I'd rather avoid that if I can.
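For the chi-squared and basis-function-error options, a rough sketch is below. It folds the template (basis-function) variances into the chi-squared denominator and takes the errors from the curvature at the minimum. It is only a sketch. The names, the optional `fit_max` cut (which mirrors restricting the fit to ~0.85), and the Gaussian approximation `cov = 2 * H^-1` are assumptions. With yield-dependent variances, this chi-squared is also only an approximation to a proper likelihood.

```python
import numpy as np
from scipy.optimize import minimize


def chi2(yields, templates, template_var, data, data_var, mask):
    """Chi-squared of data against a yield-weighted sum of templates.

    templates, template_var: (n_bins, n_components) arrays. The template
    variances are scaled by yields**2 and added to the data variance, so bins
    where the basis functions are poorly known carry less weight.
    """
    model = templates @ yields
    model_var = template_var @ (yields ** 2)
    resid = (data - model)[mask]
    var = (data_var + model_var)[mask]
    return float(np.sum(resid ** 2 / var))


def numerical_hessian(f, x, step):
    """Central-difference Hessian of the scalar function f at x."""
    n = len(x)
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = step[i]
            ej[j] = step[j]
            hess[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                          - f(x - ei + ej) + f(x - ei - ej)) / (4 * step[i] * step[j])
    return hess


def fit_yields(templates, template_var, data, data_var, centers, fit_max=None):
    """Minimise chi2 over non-negative yields; errors from the chi2 curvature."""
    # Optional fit range: only bins with centre below fit_max (hypothetical cut).
    mask = np.ones(len(data), dtype=bool) if fit_max is None else centers < fit_max
    args = (templates, template_var, data, data_var, mask)
    n_comp = templates.shape[1]
    x0 = np.full(n_comp, data[mask].sum() / n_comp)
    res = minimize(chi2, x0, args=args, method="L-BFGS-B",
                   bounds=[(0.0, None)] * n_comp)
    # Treating chi2 as -2 ln L (up to a constant) gives cov = 2 * inverse Hessian.
    step = np.maximum(1e-3 * np.abs(res.x), 1e-3)
    hess = numerical_hessian(lambda y: chi2(y, *args), res.x, step)
    cov = 2.0 * np.linalg.inv(hess)
    return res.x, np.sqrt(np.diag(cov))
```

Propagating the template variances this way avoids a full fit with floating template bins, at the cost of treating those variances as uncorrelated Gaussian terms.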

Edit:
A few more options:
- The width and peak position of the dominating Gaussian strongly suggest that it comes from a spill-over event, which could be controlled by removing the offending points.
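If the spill-over explanation holds, one simple way to remove the offending points is to drop any event whose weight alone exceeds some fraction of the sample's total weight. The function name and the `max_fraction` threshold are hypothetical. The right cut would need to be checked against the actual weight distribution.

```python
import numpy as np


def drop_dominant_events(values, weights, max_fraction=0.05):
    """Remove events whose individual weight exceeds max_fraction of the total.

    Hypothetical outlier cut; returns the kept values and weights plus the
    indices of the dropped events so they can be inspected.
    """
    keep = weights <= max_fraction * weights.sum()
    return values[keep], weights[keep], np.flatnonzero(~keep)
```

Dropping events changes the sample normalisation, so any distributions built from the cleaned sample should be renormalised afterwards.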