On the pages listed below I'm going to try to keep track of the work towards measuring the cross section for inclusive pion production in pp collisions from run 6. I'll try to keep the posts in chronological order, so posts near the bottom are more recent than posts near the top.
The plot below shows the total errors, statistical and systematic, for the cross section measurement. The inner tick marks on the error bars represent the statistical errors. The outer ticks represent the statistical and point-to-point systematic errors added in quadrature. The grey band shows the systematic error from the BEMC gain calibration uncertainty, which we have agreed to separate from the others (as it cannot be meaningfully added in quadrature to the other uncertainties).
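Concretely, the quadrature combination used for the outer ticks can be sketched as follows (a minimal illustration; the function and variable names are mine, not from the analysis code):

```python
import math

def combined_error(stat, sys_point_to_point):
    """Outer-tick error: statistical and point-to-point systematic
    errors added in quadrature. The BEMC gain-calibration systematic
    is deliberately left out and drawn as a separate band, since a
    maximum-extent error cannot be meaningfully added in quadrature
    with gaussian one-sigma errors."""
    return math.sqrt(stat**2 + sys_point_to_point**2)

# Illustrative numbers only: a 10% statistical error combined with a
# 12% point-to-point systematic gives roughly a 15.6% outer-tick error.
total = combined_error(0.10, 0.12)
```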
Goal:
To calculate the systematic uncertainty on the cross section due to the BEMC calibration uncertainty.
Justification:
Clearly, the BEMC energy scale is of vital importance to this measurement. The energies measured by the BEMC are used to calculate many of the physics-level quantities (photon energy, pion pt, pion mass, Zgg) that directly enter into the determination of the cross section. Since the calibration of the BEMC is not (and indeed could not be) perfect, some level of uncertainty will be propagated to the cross section. This page will try to explain how this uncertainty is calculated and the results of those calculations.
Method:
Recently, the MIT grad students (led by Matt Walker) calculated the gain calibration uncertainty for the BEMC towers. The results of this study can be found in this paper. For run 6, it was determined that the BEMC gain calibration uncertainty was 1.6% (in the interest of a conservative preliminary estimate I have gone with 2%). In the analysis chain, the BEMC gains are applied early, in the stage where pion candidate trees are built from the MuDSTs, so if we are to measure the effect of a gain shift, we must do so at this level. To calculate the systematic uncertainty, I recalculated the cross section from the MuDSTs with the BEMC gain tables shifted by plus or minus 2%. I recreated every step of the analysis, from creating pion trees to recreating MC samples, calculating raw yields and correction factors, all with shifted gain tables. Then I took the ratio of the new cross section to the old cross section for both shifts. I fit these ratios excluding the first two bins, which are not indicative of the overall uncertainty, and used these fits to estimate the systematic for all bins. The final results of these calculations can be found below.
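The last step (ratio and flat fit) can be sketched as follows, assuming the flat fit is a simple constant ("pol0") to equally weighted points, which reduces to a mean. The numbers below are made up for illustration, not real cross sections:

```python
import statistics

def gain_shift_systematic(nominal, shifted, skip=2):
    """Ratio of gain-shifted to nominal cross section per pt bin, with a
    constant fit to the ratios excluding the first `skip` bins (those
    bins are distorted by trigger-threshold effects, as discussed in
    the text). Returns the ratios and the relative systematic."""
    ratios = [s / n for n, s in zip(nominal, shifted)]
    flat = statistics.fmean(ratios[skip:])  # flat-line ("pol0") fit
    return ratios, abs(flat - 1.0)

# Hypothetical per-bin cross sections (arbitrary units)
nominal = [100.0, 60.0, 30.0, 14.0, 6.0]
shifted = [112.0, 66.0, 33.6, 15.8, 6.9]   # e.g. from a +2% gain table
ratios, rel_sys = gain_shift_systematic(nominal, shifted)
```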
Plots:
1) All three cross section plots on the same graph (black = nominal, red = -2%, blue = +2%)
2) The relative error for the plus and minus 2% scenarios.
Discussion:
This method of estimating the systematic has its drawbacks, chief among which is its maximum-extent nature. The error calculated by taking the "worst case scenario," which we are doing here, yields a worst-case-scenario systematic. This is in contrast to other systematics (see here) which are true one-sigma errors of a gaussian error distribution. Gaussian errors can be added in quadrature to give a combined error (in theory, the statistical error and any gaussian systematics can be combined in this manner as well). Maximum-extent errors cannot be combined in quadrature with gaussian one-sigma errors (despite the tendency of previous analyzers to do exactly this). Thus we are left with separate sources of systematics, as shown below. Furthermore, this method clearly does not accurately estimate the uncertainty for the low pt points. Consider, for example, the +2% gain shift. This shift increases the reconstructed energies of the towers and ought to increase the number of pion candidates in the lower bins. However, trigger thresholds are based on the nominal energy values, not the increased energy values. Since we cannot 'go back in time' and include events that would have fired the trigger had the gains been shifted 2% high, the data will be missing some fraction of events that it should include. To put it another way: we can correct for things like acceptance and trigger efficiency because we know (from simulation) which events and pions we are missing. However, we cannot correct for events and candidates that we don't know we're missing. Events that would have fired a gain-shifted trigger but didn't fire the nominal trigger are of the second type. We only have one data set; we can't correct for the number of events we didn't know we missed.
All this being said, this is the best way we have currently for estimating the BEMC energy scale uncertainty using the available computing resources, and it is the method I will use for this result.
Objective:
Show that, for an important set of kinematic variable distributions, the full QCD 2->2 Pythia MC sample matches the data. This justifies using this MC sample to calculate 'true' distributions and hence correction factors. All plots below contain information from all candidates, that is, all diphoton pairs that pass the cuts below.
Details:
Data (always black points) Vs. T2_platinum MC (always red) (see here)
Cuts:
Plots:
1a)
The above plot shows the Zgg distribution ((E1 - E2)/(E1 + E2)) for data (black) and MC (red). The MC is normalized to the data (total integral). Despite what the title says, this plot is not vs. pt; it is integrated over all values of pt.
1b)
The above left shows Data/MC for Zgg between 0 and 1. The results have been fit to a flat line, and the results of that fit are shown in the box above. The above right shows the histogram projection of the Data/MC, which has been fit to a gaussian; the results of that fit are shown.
2a)
The above plot shows the particle eta for data pions (black) and MC pions (red). The MC is normalized to the data (total integral.) As you can see, there is a small discrepancy between the two histograms at negative values of particle eta. This could be a symptom of only using one status table for the MC while the data uses a number of status tables in towers and SMD.
2b)
The above left plot shows Data/MC for particle eta, which is fit to a flat line. The Y axis on this plot has been truncated at 2.5 to show the relevant region (-1,1); the outer limits of this plot have points with values greater than 4. The above right shows a profile histogram of the left plot fit to a gaussian. Note again that some entries are lost to the far right on this plot.
3a)
The above plot shows detector eta for data (black) and MC (red). Again we see a slight discrepancy at negative values of eta.
3b)
The above left shows Data/MC for detector eta, and this has been fit to a flat line. Again note that the Y axis has been truncated to show detail in the relevant range. The above right shows a profile histogram of the left plot and has been fit to a gaussian.
4)
The above plot shows the raw yields (not background subtracted) for data (black) and MC (red) where raw yield is the number of inclusive counts passing all cuts with .08 < inv mass < .25 and Zgg < 0.7. There is a clear 'turn on' curve near the trigger threshold, and the MC follows the data very nicely. For more information about the individual mass peaks see here.
Conclusions:
The Monte Carlo sample clearly recreates the data in the above distributions. There are slight discrepancies in the eta distributions, but they shouldn't preclude using this sample to calculate correction factors for the cross section measurement.
Attached is the presentation I gave at the collaboration meeting in late March. It was decided that before releasing a preliminary result, we would have to calculate the BEMC gain calibration uncertainty for 2006. A group of us at MIT created a plan for estimating this uncertainty (see here) and are currently working on implementing this plan as can be seen in Adam's blog post here:
http://drupal.star.bnl.gov/STAR/blog-entry/kocolosk/2009/apr/06/calib-uncertainty-update.
We hope to have a final estimate of this uncertainty by early next week. In that case, I can hopefully calculate the uncertainty in the cross section by the spin pwg meeting on 4/23. If this can be done, I will present the results then and ask to release the preliminary result for DIS 2009. If we cannot calculate a gain calibration systematic by 4/23, obviously we will not be asking to release the preliminary result.
Below you will find comparisons of the Run 6 cross section with recent Phenix results and the STAR Run 5 final result. The lower panel is the statistical error on the Run 6 points. Uncertainties for the other plots are not included. Plot 1 shows a comparison between run 6 and the Phenix result as published in PRD. Plot 2 shows a comparison between run 6 and the Run 5 final result from STAR, and plot 3 shows all three results together. As requested, I included the errors for both run 5 and 6 results. The error bars on the cross section points are a quadrature sum of all errors (stat and sys) for the results.
Objective:
Explore the pt-dependent mean mass position in data and MC and perhaps draw some conclusions about the quality of our simulations.
Details:
Data Vs T2 platinum MC (see here for explanations)
Bins in Pt {5.5, 6., 6.75, 7.5, 8.25, 9.25, 11., 13., 16., 21.}
Cuts:
Plots:
1)
The above plot shows data (black) Vs. MC (red) for Zgg for my 9 pt bins. The MC plots are normalized to the data so that the total number of entries is equal in both distributions.
2)
Apologies for the poor labeling. The above left plot shows the mean mass per pt bin for data (black) and MC (red). These means are calculated by fitting the mass spectra to a gaussian between .1 and .2 GeV/c^2 (see here for more). In addition to the cuts listed at the top of the page, I make a Zgg < .7 cut and an | particle eta | < .7 cut on all pion candidates. The PDG mass of the pi0 is shown in blue. The above right plot shows the ratio of Data/MC for the mean masses, again as a function of pt. This plot is fit to a flat line and the fit parameters are shown.
Conclusions:
The most basic conclusion we can draw is that the simulation is recreating the floating mass peak quite well. The data is never more than 3% higher or lower than the simulation, and a flat-line fit is really darn close to one. Couple this with the Zgg comparison and I think we can say that we are simulating the EMC (and SMD) response correctly, at least as it concerns reconstructing neutral pions between 5 - 20 GeV/c. Of particular interest are the Zgg comparisons at relatively higher pt, as there the distance between the two reconstructed photons is smaller than the size of one tower, and we rely on the SMD to correctly identify the daughter photons.
At the PWG meeting a few weeks ago, people expressed concerns about the high-mass tail of the pion peak that, for higher pt bins, is underestimated by the pion peak simulation. This is shown (in the most egregious case) below (see especially the left-hand plot between .2 and .5):
It is clear, at least at high pt, that the aggregate MC model does not fully reproduce this 'bleeding edge'. Naturally the question arises of how sensitive the cross section measurement is to this tail. One way to probe the sensitivity is to vary the invariant mass cut around the peak position. For the preliminary result, I used a 3 sigma cut calculated by fitting the mass peak to a gaussian. I can also calculate the cross section using a 2 sigma and 4 sigma cut. These cuts are shown below...
This plot shows a closeup of the pion peak for each of the 9 bins, with vertical lines showing the three different mass windows. For most bins, the 2 sigma cut completely excludes the tails and the 4 sigma cut includes a large portion of the tails.
I can then compare the raw yields extracted with the smaller and larger windows to the yield from the nominal window.
Bin | % diff 4 sig | % diff 2 sig |
1 | 2.4 | 1.4 |
2 | 1.6 | 0.5 |
3 | 3.3 | 1.7 |
4 | 6.2 | 3.0 |
5 | 5.6 | 4.5 |
6 | 10.6 | 4.8 |
7 | 10.3 | 5.3 |
8 | 13.5 | 2.3 |
9 | 10.1 | 0.62 |
The largest % difference is 13.5%, in the 8th bin. For the most part the higher pt bins can be off by ~10% for a large mass window.
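The comparison in the table above amounts to a percent difference of the varied-window yield relative to the nominal 3-sigma yield. A minimal sketch (the yields and names here are hypothetical, not the table's values):

```python
def window_percent_diff(yield_nominal, yield_varied):
    """Percent difference of a varied mass-window raw yield from the
    nominal 3-sigma window yield."""
    return 100.0 * abs(yield_varied - yield_nominal) / yield_nominal

# Hypothetical raw yields for one pt bin
y3sig, y4sig, y2sig = 1000.0, 1106.0, 953.0
diff4 = window_percent_diff(y3sig, y4sig)  # wider window picks up tail counts
diff2 = window_percent_diff(y3sig, y2sig)  # narrower window excludes them
```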
Goal:
To test the stability of the cross section measurement to changes in the analysis cuts and, if necessary, assign a systematic uncertainty for cut variations.
Justification:
At many points in the measurement I make cuts on the data. For example, I place a maximum Zgg cut of 0.7 on all pion candidates. These cuts are motivated by our understanding of the detector and underlying physics, but the specific location of each cut is somewhat arbitrary. The cuts could move by some small amount and still be well-motivated. The measurement should be relatively stable with respect to changing analysis cuts. To test this, I take the three most important cuts, Zgg, inv. mass window, and z vertex, and vary them by some small amount in either direction. I then repeat the analysis from start to finish to discern the effect (if any) these cut changes have on the final cross section. The procedure is similar to that used to test the BEMC energy scale systematic (see here).
Plots:
Instead of showing the cross section measurement for each individual cut change, I will plot the results in what I hope is a more illuminating way. Below are nine plots, each one representing a pt bin in my analysis. The pt range in GeV/c is shown in the upper right hand corner. Plotted on each canvas are the (Delta Sigma)/(Sigma) points for each of the cut variations (see key below). Also note the solid lines indicating the statistical error of the nominal cross section measurement in that bin.
KEY:
point x position | Cut Change
2 | Invariant mass window of plus/minus 4 sigma (nominal is 3 sigma)
3 | Zgg cut + 10%
4 | Z vertex cut + 10%
7 | Invariant mass window of plus/minus 2 sigma (nominal is 3 sigma)
8 | Zgg cut - 10%
9 | Z vertex cut - 10%
Broadly speaking, the points on the left side of the dashed-dotted line at 5.5 are cut changes that ought to increase the raw yield, and cuts on the right side ought to decrease the raw yield. Of course an increase (decrease) in raw yield does not always translate into an increase (decrease) in cross section because the raw yields are corrected. Note that for most bins the effect of all cut changes is small (on the same order as the statistical uncertainty). Other systematics (BEMC energy scale and yield extraction) dominate the uncertainty.
Data and MC comparison for pion mass peak position.
The left side shows the individual mass peak positions for Data (black) and MC (red). The MC points trend a touch higher than the data points. On the right is (Data-MC)/MC, fit to a pol0; the value of the fit is shown to be -1.01%.
For the cross section analysis I am using a number of Monte Carlo samples along with one data set. The details of each of these sets can be found below:
Pion enhanced, QCD 2->2 sample (full event): aka "T2 Platinum":
This MC sample was produced at Tier 2 by Mike Betancourt for me. It consists of ~200,000 events in each of the following partonic pt bins: {5-7, 7-9, 9-11, 11-15, 15-25, 25-35} GeV. The events were pre-filtered at the pythia level, so every event contains a pythia-level pion satisfying the following kinematic cuts: Pt > 4 GeV, -1.2 < particle eta < 1.2. The code to generate the trees lives in /star/u/ahoffman/T2_maker and is compiled in SL08c. The MuDsts and .geant files are archived in HPSS. The trees themselves (600 files) live in /star/institutions/mit/ahoffman/Pi0Analysis/T2_platinum/Trees/. The following parameters were set in the analysis macro:
emcSim->setCalibSpread(kBarrelEmcTowerId,0.15);
emcSim->setCalibOffset(kBarrelEmcTowerId,0.);
emcSim->setCalibSpread(kBarrelSmdEtaStripId,0.25);
emcSim->setCalibOffset(kBarrelSmdEtaStripId,0.0);
emcSim->setMaximumAdc(kBarrelSmdEtaStripId,700);
emcSim->setMaximumAdcSpread(kBarrelSmdEtaStripId,70);
emcSim->setCalibSpread(kBarrelSmdPhiStripId,0.25);
emcSim->setCalibOffset(kBarrelSmdPhiStripId,0.0);
emcSim->setMaximumAdc(kBarrelSmdPhiStripId,700);
emcSim->setMaximumAdcSpread(kBarrelSmdPhiStripId,70);
As of now, only events which pass the software trigger conditions for trigger 137611 (HTTP L2gamma) or 117001 (mb) are saved. These events are weighted properly using Mike B's custom weight calculator for his filtered events. That code can be found at /star/u/ahoffman/BetanWeightCalc/
Single Particle Monte Carlo Sets
I have three separate single particle MC samples: single pion, single eta, and single gamma. These were produced using the code located at /star/u/ahoffman/SingleParticle_platinum/. The starsim, bfc, and treemaking code is all there. The MuDsts and .geant files that result from the bfc jobs are run through a treemaker similar to that for the full pythia monte carlo. These samples are used to estimate the background shapes in the diphoton invariant mass spectrum. The single gamma sample, for example, is used to model the 'low mass' background, as pion candidates are found from split clusters. The following cuts were set in the macro:
emcSim->setCalibSpread(kBarrelEmcTowerId,0.15);
emcSim->setCalibOffset(kBarrelEmcTowerId,0.);
emcSim->setCalibSpread(kBarrelSmdEtaStripId,0.25);
emcSim->setCalibOffset(kBarrelSmdEtaStripId,0.0);
emcSim->setMaximumAdc(kBarrelSmdEtaStripId,700);
emcSim->setMaximumAdcSpread(kBarrelSmdEtaStripId,70);
emcSim->setCalibSpread(kBarrelSmdPhiStripId,0.25);
emcSim->setCalibOffset(kBarrelSmdPhiStripId,0.0);
emcSim->setMaximumAdc(kBarrelSmdPhiStripId,700);
emcSim->setMaximumAdcSpread(kBarrelSmdPhiStripId,70);
One important difference to note is that these events are not held to the simulated trigger standard. Instead, I only choose events (offline) that have pion candidates with Pt above the trigger threshold. Essentially this assumes that the trigger efficiency is perfect for such events. Obviously this is not true, but in my analysis these samples are used primarily to estimate the background line shapes. The single pion events are not weighted. The other two samples are weighted according to the functional form given by the PHENIX cross section measurements.
Data
This analysis is only concerned with run 6 data, and only data that satisfies the HTTP L2gamma trigger (online and software). I restrict myself to good runs between 7139017 and 7143025, as it is the longest period of run 6 with relatively stable tower and SMD status tables. Parts of the barrel are missing in this run range, and this will be accounted for in a geometric acceptance correction, but I believe that the stability of the status tables is more important than having a 100% live detector. Using a stable run range will cut down on the systematic error of the measurement which, given previous measurements, will be larger than the statistical error. The data was produced by Murad using my StSkimPionMaker (which can be found in StSpinPool) as part of the SpinAnalysisChain. The output trees are located at /star/institutions/mit/ahoffman/Pi0Analysis/Murads_Production_2_08/. The following parameters were set in the macro:
As of now, only events which pass either HTTP L2gamma (online and software) or MB triggers are saved. Only the L2g triggered events are used in the analysis. All other cuts are made offline and will be listed in individual analysis sections to follow.
Mixed Event Background:
I should note that I also make use of a 'mixed event' sample that is made by taking photons from different (real data) events and mixing them to form pion candidates. This sample is used to model the combinatoric background as described here.
Details
Data Vs. T2 Platinum (see here)
Cuts:
Bins:
9 pt bins, with boundaries {5.5, 6., 6.75, 7.5, 8.25, 9.25, 11., 13., 16., 21.}
Plots:
All MC plots are normalized. They are scaled to match the number of counts in the data between M = .08 and M = .25 GeV/c^2 (the nominal signal region).
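The scaling described above can be sketched as a ratio of counts in the signal window (a toy illustration; the histogram names and numbers are mine, not from the analysis code):

```python
def signal_region_scale(data_hist, mc_hist, edges, lo=0.08, hi=0.25):
    """Scale factor that matches MC to data counts in the signal window.

    `data_hist`/`mc_hist` are bin contents and `edges` the bin
    boundaries of the invariant-mass histograms. Bins whose centers
    fall in [lo, hi] GeV/c^2 define the signal region."""
    centers = [0.5 * (a + b) for a, b in zip(edges[:-1], edges[1:])]
    n_data = sum(d for c, d in zip(centers, data_hist) if lo <= c <= hi)
    n_mc = sum(m for c, m in zip(centers, mc_hist) if lo <= c <= hi)
    return n_data / n_mc

# Toy 4-bin mass histograms
edges = [0.0, 0.08, 0.16, 0.24, 0.40]
data = [50.0, 400.0, 100.0, 30.0]
mc   = [10.0, 200.0,  50.0,  5.0]
scale = signal_region_scale(data, mc, edges)
scaled_mc = [scale * m for m in mc]
```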
The above plots show the data/MC comparison for the first four bins (the bin number is in the upper right corner). The MC peak position and width track the data relatively well. In the data, there is an excess of 'low mass' background events. This is expected, as the MC is filtered for high pt pion events and thus the background will be artificially suppressed compared to the signal.
The above plot shows the last five bins along with the total comparison for all bins (lower right).
The Data (black) and MC (red) are fit to separate gaussians between .1 and .2 GeV/c^2. The mean masses for each are plotted below. Apologies for the lack of axis labels: the x-axis is pion pt and the y-axis is the mean mass of that pt bin. The errors on the points are the errors on the fits. I would take the last bin with a grain of salt considering the stats.
The blue line is the pdg mean mass of the pion. As you can see, the MC tracks the data very well, recreating the rising mass peak.
Ok, now we're cooking. Most of the ingredients are in place. We have our background subtracted raw yields. We have our generalized correction factor to account for inefficiencies in trigger, reconstruction, etc. Now, it's time to take a first look at a cross section. At a high level, we'll be starting with the raw yields, and applying a number of corrections for geometrical acceptance, efficiencies, etc. to recreate the true distribution. The formula for the invariant differential cross section:
Where:
Pt = Average Pt in a bin (as an aside, all points are plotted at this value)
N_{raw} = background subtracted raw yields
delta_pt = bin width in pt
delta_eta = 1.4 (= size of pseudorapidity range -.7 to .7)
C_{trig+reco} = Trigger + Reconstruction Efficiency (Generalized) Correction Factor
Gamma_{gamma gamma}/Gamma = branching fraction for pi^{0} -> gamma gamma (= 98.8%)
L = Luminosity
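Putting the pieces above together, the formula reads as below (my reconstruction from the variable list, since the original equation appeared as an image; this is the standard invariant form and may differ cosmetically from the exact expression used):

```latex
E\,\frac{d^{3}\sigma}{dp^{3}}
  \;=\; \frac{1}{2\pi\,p_{T}}\;
        \frac{N_{\mathrm{raw}}}
             {L\;\Delta p_{T}\;\Delta\eta\;
              C_{\mathrm{trig+reco}}\;
              \mathrm{BR}(\pi^{0}\to\gamma\gamma)}
```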
After applying all of these corrections, we arrive at the cross-section below.
The a) panel shows the invariant cross section along with 2 NLO pQCD predictions (based on DSS and KKP FFs). The b) panel shows the relative statistical errors on the cross section. Panel c) shows (data-NLO)/NLO for both pQCD predictions as well as for predictions from DSS at two different factorization scales. The points are placed at the average Pt for a bin. As you can see in panel c), the measured cross section agrees well with theory for both DSS and KKP FFs.
Goal:
Properly model the signal and background to the invariant mass plots using four single particle MC sets normalized to fit the data. Further, subtract out background contributions to arrive at a raw yield for each pt bin.
Details:
Data Vs. Single Particle MC (see here)
Cuts:
Bins:
9 pt bins, with boundaries {5.5, 6., 6.5, 7., 7.75, 9., 11.5, 13.5, 16., 21.}
Plots:
1)
Above is a plot of the invariant mass distributions for the 9 pt bins used in this analysis. The black crosses represent the data (with errors). The four colored histograms show the invariant mass distributions of pion candidates found in single pion MC (purple), single photon MC (red), single eta MC (blue), and mixed-event MC (green). The four distributions are simultaneously fit to the data.
2)
The above plot shows a data/MC comparison, where the red MC curve is the sum of the four single particle curves shown in plot 1. As you can see (especially in bins 3-7) the single particle MC seems to be underestimating the width of the pion peak, especially on the high mass side. The MC peak is too narrow. I think this can be explained by two effects. First, I am overestimating the energy precision of the towers and SMDs. The width of the mass peak is directly related to the energy resolution of the towers and SMD. I think this is telling us that the simulated resolution is too precise. Also, there is the issue of jet background, which is not present in these simulations and would tend to add small amounts of energy to each photon (thus increasing the mass and the pt of the pion candidate.)
Conclusions:
Obviously this MC parameterization is not quite good enough to compare to the data. I want to go back and remake the MC distributions with more smearing in the energy resolution, and perhaps with a small pt-dependent term to simulate the jet background.
Following up from this post concerning the modeling of the invariant mass distribution using different monte carlo samples for the signal and three sources of background, I respun through the modeling algorithm with an added smearing to the masses. The procedure is outlined below.
Goal:
Properly model the signal and background to the invariant mass plots using four single particle MC sets normalized to fit the data. Further, subtract out background contributions to arrive at a raw yield for each pt bin.
Details:
Data Vs. Single Particle MC (see here)
Cuts:
Bins:
9 pt bins, with boundaries {5.5, 6., 6.5, 7., 7.75, 9., 11.5, 13.5, 16., 21.}
Smear:
The masses of the single pions are smeared by sampling from a pt-dependent normal distribution of the form N(1.005*Pt, .04); Mass = smear*smear*old_mass. This is supposed to mimic not only the detector resolution, but also the artificial increase in photon energy resulting from excess jet background at higher Pt.
Obviously this is not the correct way to do this sort of smearing; it should be done at the BEMC level in the simulation, but this is a rapid way to test our assumption that smearing will better recreate the mass peak.
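The smearing step above can be sketched as follows. The mean 1.005 and width .04 are taken from the quoted form, but the exact pt dependence of the mean is left as a hook (an assumption on my part, since the N(1.005*Pt, .04) notation reads ambiguously if taken literally); the function names are mine:

```python
import random

def smeared_mass(old_mass, pt, mean=lambda pt: 1.005, sigma=0.04, rng=random):
    """Multiplicative mass smear following the prescription above:
    draw a smear factor s from a normal distribution and return
    s * s * old_mass (the factor enters twice, once per daughter
    photon energy). The pt argument feeds the mean() hook, which
    here defaults to a constant 1.005 (an assumption)."""
    s = rng.gauss(mean(pt), sigma)
    return s * s * old_mass
```

Since E[s^2] = mu^2 + sigma^2, this smear also shifts the mean mass slightly upward, which is the desired mimicry of extra energy from jet background.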
Plots:
1)
Above is a plot of the invariant mass distributions for the 9 pt bins used in this analysis. The black crosses represent the data (with errors). The four colored histograms show the invariant mass distributions of pion candidates found in single pion MC (purple), single photon MC (red), single eta MC (blue), and mixed-event MC (green). The four distributions are simultaneously fit to the data.
2)
The above plot shows a data/MC comparison, where the red MC curve is the sum of the four single particle curves shown in plot 1. As you can see (especially in bins 3-7), with the smearing the single particle MC recreates the mass peak much better. It is not perfect, but compared to the original test, at least by eye, it looks better.
Conclusions:
I think this test supports the conclusion that the BEMC response for the single pion simulation was originally too precise. Extra pt-dependent smearing should be added into the BEMC tower response.
As noted here and here, the pion peak is difficult to model using single-particle MC: in single particle studies, the pion inv mass peak is reconstructed too narrow. Instead of trying to manipulate the single particle MC to conform to the data (using smearing techniques), I decided instead to model the pion peak using identified pions from the filtered pythia MC sample, where 'identified' in this context means the reconstructed pion is less than .01 (in eta-phi space) away from a thrown pythia pion. As seen here, the means and widths of the peaks from the filtered pythia set match the data extremely well.
Goal:
Properly model the signal and background to the invariant mass plots using single particle MC background shapes and identified pion peak shapes from filtered pythia MC normalized to fit the data. Further, to subtract out the background on a bin by bin basis to arrive at a raw yield for each pt bin.
Details:
Data Vs. Single Particle MC (see here) and Identified Pions from filtered pythia MC (see here)
Cuts:
Bins:
9 pt bins, with boundaries {5.5, 6., 6.5, 7., 7.75, 9., 11.5, 13.5, 16., 21.}
Plots:
1)
Above is a plot of the invariant mass distributions for the 9 pt bins used in this analysis. The black crosses represent the data (with errors). The four colored histograms show the invariant mass distributions of pion candidates found in identified pions from filtered pythia MC (purple), single photon MC (red), single eta MC (blue), and mixed-event MC (green). The four distributions are simultaneously fit to the data.
2)
The above plot shows a data/MC comparison, where the red MC curve is the sum of the four curves shown in plot 1. As you can see (especially in bins 3-7), the identified pion spectrum from filtered pythia MC recreates the mass peak much better than the single particle MC did (compare here).
3)
The above plot shows the diphoton invariant mass spectrum for each of the 9 pt bins after the background shapes have been subtracted off. To calculate the background subtracted raw yields, these peaks are integrated from mean - 3sigma to mean + 3sigma, where the mean and sigma come from a gaussian fit to the peak.
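The integration step can be sketched as a sum over bins inside the fitted window (a toy illustration; the histogram contents below are invented, and in practice the mean and sigma come from the gaussian fit):

```python
def raw_yield_3sigma(contents, edges, mean, sigma, nsig=3.0):
    """Integrate a background-subtracted peak over mean +/- nsig*sigma.

    Bins whose centers fall inside the window are summed; `mean` and
    `sigma` would come from a gaussian fit to the subtracted peak."""
    lo, hi = mean - nsig * sigma, mean + nsig * sigma
    total = 0.0
    for a, b, c in zip(edges[:-1], edges[1:], contents):
        center = 0.5 * (a + b)
        if lo <= center <= hi:
            total += c
    return total

# Toy subtracted spectrum: a peak near the pi0 mass
edges = [0.05 + 0.02 * i for i in range(11)]          # 0.05 .. 0.25
contents = [0, 2, 5, 40, 90, 60, 10, 3, 1, 0]
y = raw_yield_3sigma(contents, edges, mean=0.14, sigma=0.015)
```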
The proposed final version of the cross section plot would look like this.
The link below has the presentation for preliminary cross section result.
Now that I have my raw pion spectrum (see here) I need to proceed in transforming those raw counts into a cross section measurement. The first step in this process is calculating a correction factor that accounts for inefficiencies in the trigger and reconstruction algorithm. I will calculate this correction factor using the filtered, full-pythia MC sample. To first order, the correction factor is C = N_{reco}(Pt)/N_{true}(Pt), where N_{reco} is the number of pions found with our reconstruction algorithm and trigger, binned in Pt, after all cuts have been applied, and N_{true} is the true number of pions in the pythia record within the usable detector volume that should have fired the trigger. Note that C is not strictly a reconstruction efficiency. It represents a convolution of efficiencies in the reconstruction algorithm, trigger, acceptance, finite detector resolution, bin migration, and merging and splitting effects.
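The first-order definition above can be sketched as a per-bin ratio (a minimal illustration with hypothetical weighted counts; in the real analysis both histograms carry the pythia event weights):

```python
def correction_factor(reco_counts, true_counts):
    """Per-bin generalized correction factor C = N_reco / N_true.

    Both inputs are weighted pion counts binned in pt; C folds together
    trigger, reconstruction, acceptance, and bin-migration effects."""
    return [r / t if t > 0 else 0.0 for r, t in zip(reco_counts, true_counts)]

def corrected_yield(raw, C):
    """Apply C to a raw yield spectrum to estimate the true spectrum."""
    return [y / c if c > 0 else 0.0 for y, c in zip(raw, C)]

# Hypothetical weighted counts for three pt bins
C = correction_factor([30.0, 24.0, 10.0], [100.0, 80.0, 50.0])
true_estimate = corrected_yield([15.0, 12.0, 4.0], C)
```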
Goal:
Calculate generalized correction factor.
Data Sets Used:
T2 Platinum MC (see here.) Previous studies (here and here) show that this MC sample very reasonably mimics the real data, especially within the pion mass peak.
Cuts:
Bins:
9 pt bins, with boundaries {5.5, 6., 6.5, 7., 7.75, 9., 11.5, 13.5, 16., 21.}
Plots:
1)
The above plot shows the generalized correction factor N_{reco}(Pt)/N_{true}(Pt). N_{reco} is the number of reconstructed pions, where pions are found and reconstructed using the exact same procedure as is done for real data. The events in the MC sample are properly weighted.
We would like to check the applicability of a correction factor calculated similarly to the one above. To do this I split the filtered MC sample into two separate samples. One of these (MCd) I treat as data and the other (MCt) I treat as MC. I run the MCd sample through normal pion reconstruction and extract a raw yield spectrum. Then from the MCt sample I calculate a correction factor similar to the one above. I then apply the correction factor to the MCd raw yield distribution. The results are below.
The black squares are the raw yields of the 'data' set as a function of Pt. The red squares are the true pt distribution of pythia pions. The blue squares are the fully corrected yields obtained by applying the correction factor to the black points. As you can see, after applying the correction factor to the reconstructed 'data' the true distribution is obtained.
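The split-sample check can be sketched with a toy model (everything here is hypothetical: a flat true pt spectrum, a made-up pt-dependent efficiency, and an even/odd split standing in for the two independent halves):

```python
import random

def closure_test(n_events, edges, eff, rng):
    """Toy closure test of the correction-factor method. Each 'pion'
    gets a true pt and is reconstructed with probability eff(pt).
    The MCt half supplies C = N_reco/N_true per bin; C is then applied
    to the raw (reconstructed) spectrum of the MCd half. If the method
    closes, the corrected MCd spectrum matches MCd's true spectrum."""
    nb = len(edges) - 1

    def bin_of(pt):
        for i in range(nb):
            if edges[i] <= pt < edges[i + 1]:
                return i
        return nb - 1  # clamp the (measure-zero) upper edge

    true_t, reco_t = [0] * nb, [0] * nb
    true_d, reco_d = [0] * nb, [0] * nb
    for k in range(n_events):
        pt = rng.uniform(edges[0], edges[-1])
        i = bin_of(pt)
        seen = rng.random() < eff(pt)
        if k % 2 == 0:                      # 'MCt' half: derives C
            true_t[i] += 1
            reco_t[i] += seen
        else:                               # 'MCd' half: treated as data
            true_d[i] += 1
            reco_d[i] += seen
    C = [r / t for r, t in zip(reco_t, true_t)]
    corrected = [r / c for r, c in zip(reco_d, C)]
    return corrected, true_d

rng = random.Random(1)
corrected, truth = closure_test(300000, [5.5, 8.0, 12.0, 21.0],
                                lambda pt: 0.2 + 0.02 * pt, rng)
# corrected and truth agree bin by bin within statistics
```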
The collaboration has concerns about the SMD, and not without reason. They, the SMDs, have been notoriously hard to understand and model, and the SMD is central to my analysis, so I need to be able to trust that I understand it. To this end, I am using a comparison of Zgg in data and MC to claim that I understand the SMDs well enough. The SMDs are mainly used in my analysis to split the energy of tower clusters into single photons; Zgg is a measurement of this splitting. This role becomes more important at higher values of Pt, where the two photons from a single pion are most likely to fall in a single tower.
Below are nine plots of Data (black) Vs MC (red) for the 9 pt bins in my cross section analysis. The MC is normalized (over all pt bins) to the integral of the data.
As you can see, the data and MC line up very well within statistics. The last bin should be taken with multiple grains of salt considering the number of entries in the histograms. This is the justification I offer for using the simulated SMD response in calculating correction factors and background shapes to compare to the data.
Goal:
To calculate a combined systematic error for the yield extraction (i.e. background subtraction) and the correction factor, and to do so in a way that allows me to properly combine systematic and statistical uncertainties in a meaningful way.
Method:
As shown here, my background-subtracted raw yields are calculated using what I call the jigsaw method. I model the background shapes using simulated data (both single particle and full pythia). These shapes are simultaneously fit to the data and then subtracted from the data, leaving a pure pion peak. This peak is integrated over to find the raw yield in any particular bin. Obviously this method is susceptible to uncertainty, especially in the normalization of the background shapes to the data. This normalization determines how many counts are subtracted from the data and has a direct influence on the final counts.
Previous analyses have taken a maximum-extent approach to this problem. The raw yields are calculated using some extreme scenario, such as assuming no background at all or fitting the background to a polynomial. A major problem with this method is that these extreme scenarios are improbable: they sample only the largest outliers of whatever the underlying background distribution actually is. Further, these systematic uncertainties are then quoted as being on equal footing with statistical uncertainties arising from gaussian processes and constituting 1 sigma errors. Thus the systematics are vastly overestimated, which leads to a large overall uncertainty and a weaker measurement. This problem is compounded when separate maximum-extent errors are calculated for related variables (such as yield extraction and correction factor) and then added together in quadrature. We ought to be able to do better.
As stated above, the end result of the jigsaw method is a set of scaling parameters for each of the background shapes. The shapes are scaled by these parameters and then subtracted away. If the scaling parameters are wrong, the final yield will be wrong. Fortunately, the fitting procedure provides not only a scale for each shape but an uncertainty on that scale, so we know how likely the scale is to be wrong. Instead of picking an outlying scenario (e.g. all scaling factors = 0), we can calculate the yields with a range of scaling factors sampled from an underlying gaussian probability distribution with a mean of the nominal scaling value and a width of the error on that scaling factor. By sampling enough points, we can build up a probability distribution for the measured point (which should also be gaussian) and take a 1 sigma error on that distribution. This error will not only be more accurate but will be on equal footing with the statistical error of the measured point.
Of course, the final cross section value is a convolution of the background-subtracted raw yields and the generalized correction factor. We need to vary the fitting parameters for both at the same time to obtain an accurate estimation of the error on the final value. When we do this we get distributions for the final cross sections on a bin by bin basis. See below.
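The sampling procedure can be sketched in reduced form as follows. Here each background shape is represented just by its count in the signal window; the scales, errors, and counts are invented numbers, and in the real analysis the correction-factor fit parameters are varied in the same loop:

```python
import random
import statistics

def sampled_yield_error(scales, errors, window_counts, data_total,
                        n_trials, rng):
    """Monte Carlo propagation of the background-scale uncertainties.
    Each trial draws every scaling factor from a gaussian centered on
    its fitted value with the fit error as width, recomputes the
    background-subtracted yield, and the spread of the resulting yield
    distribution is taken as the 1-sigma systematic."""
    yields = []
    for _ in range(n_trials):
        bkg = sum(rng.gauss(mu, err) * n
                  for mu, err, n in zip(scales, errors, window_counts))
        yields.append(data_total - bkg)
    return statistics.fmean(yields), statistics.stdev(yields)

rng = random.Random(7)
mean_y, sigma_y = sampled_yield_error(
    scales=[1.0, 0.8],             # fitted shape normalizations
    errors=[0.05, 0.10],           # fit errors on those normalizations
    window_counts=[200.0, 100.0],  # shape counts in the signal window
    data_total=1000.0, n_trials=20000, rng=rng)
```

Because each trial yield is linear in independent gaussian draws, sigma_y should approach sqrt((0.05*200)^2 + (0.10*100)^2) ~ 14, and the yield distribution itself is gaussian, so it can legitimately be combined in quadrature with the statistical error.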
Bins:
9 pt bins, with boundaries {5.5, 6., 6.5, 7., 7.75, 9., 11.5, 13.5, 16., 21.}
Plots:
1)
The above shows the bin by bin cross section values after 10,000 iterations of the sampling procedure described above. The systematic error for yield extraction + correction factor can be taken to be the width/mean of the above gaussian fits.
The bin-by-bin relative systematic errors are as follows:
Bin | Rel. Sys. |
1 | 14.7% |
2 | 8.2% |
3 | 10.2% |
4 | 9.6% |
5 | 12.3% |
6 | 11.6% |
7 | 12.0% |
8 | 13.2% |
9 | 25.0% |
Previous measurements of the systematic uncertainties for these two contributions (when added in quadrature) yield an average systematic of ~20%. As you can see, this method produces substantially reduced uncertainties in most bins, in a fashion that is (as I argue above) more robust than previous methods.