2008.09.30 Sided residual: purity, efficiency, and background rejection
Ilya Selyuzhenkov September 30, 2008
Data sets:
- pp2006: STAR 2006 pp longitudinal data (~3.164 pb^-1) after applying the gamma-jet isolation cuts (note: R_cluster > 0.9 is used below).
- gamma-jet: data-driven Pythia gamma-jet sample (~170K events); partonic pT range 5-35 GeV.
- QCD jets: data-driven Pythia QCD jets sample (~4M events); partonic pT range 3-65 GeV.
Notations used in the plots:
- Fit peak energy, F_peak: integral of the fit function within ±2 strips of the maximum strip.
  The maximum strip is determined by the fitting procedure; its floating-point position is truncated ("cut") to an integer strip Id.
- Data peak energy, D_peak: energy sum within ±2 strips of the maximum strip (the same strip Id as for F_peak).
- Data tails, D_tail^left and D_tail^right: energy sum from the 3rd up to the 30th strip on the left and right sides of the maximum strip (this excludes the strips that contribute to D_peak).
- Fit tails, F_tail^left and F_tail^right: same definition as for D_tail, but the integrals are calculated from the fit function.
- Maximum sided residual, max(D_tail - F_tail): the larger of the data-minus-fit tail energies on the left and right sides of the peak,
  max(D_tail - F_tail) = max(D_tail^left - F_tail^left, D_tail^right - F_tail^right).
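The definitions above can be sketched in Python as follows. This is a minimal illustration, not the analysis code: it assumes the measured strip energies come as a plain list indexed by strip Id, that the fit is available as a hypothetical callable `fit(i)` returning the fit-function energy integrated over strip i, and that strips falling outside the detector are simply skipped.

```python
from typing import Callable, Iterable, Sequence, Tuple

PEAK_HALF_WIDTH = 2  # peak window: +/-2 strips around the maximum strip
TAIL_FIRST = 3       # tails start at the 3rd strip from the maximum strip...
TAIL_LAST = 30       # ...and extend up to the 30th strip


def _sum_over(energy: Callable[[int], float], strip_ids: Iterable[int], n_strips: int) -> float:
    """Sum energy(i) over the strip ids that exist in the detector (0 <= i < n_strips)."""
    return sum(energy(i) for i in strip_ids if 0 <= i < n_strips)


def sided_residual(data: Sequence[float], fit: Callable[[int], float],
                   peak_pos: float) -> Tuple[float, float, float]:
    """Return (D_peak, F_peak, max(D_tail - F_tail)) for one cluster.

    data     -- measured energy per strip (assumed input layout)
    fit      -- hypothetical callable: fit-function energy integrated over strip i
    peak_pos -- fitted peak position; truncated to an integer strip Id
    """
    n = len(data)
    m = int(peak_pos)  # float -> integer conversion ("cut"), as in the text
    peak = range(m - PEAK_HALF_WIDTH, m + PEAK_HALF_WIDTH + 1)  # strips m-2 .. m+2
    left = range(m - TAIL_LAST, m - TAIL_FIRST + 1)             # strips m-30 .. m-3
    right = range(m + TAIL_FIRST, m + TAIL_LAST + 1)            # strips m+3 .. m+30

    measured = data.__getitem__
    d_peak = _sum_over(measured, peak, n)
    f_peak = _sum_over(fit, peak, n)
    res_left = _sum_over(measured, left, n) - _sum_over(fit, left, n)
    res_right = _sum_over(measured, right, n) - _sum_over(fit, right, n)
    return d_peak, f_peak, max(res_left, res_right)
```

For example, a five-strip peak that the fit describes perfectly, plus 0.5 of unexplained energy at strip 60, gives a maximum sided residual of 0.5 coming from the right tail.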
Determining the cut line from the sided residual plot
Figure 1: Sided residual plot: D_peak vs. max(D_tail - F_tail).
Red lines show 4th-order polynomial functions, a*x^4,
which leave 80% of the MC gamma-jet counts on their left side.
These lines are obtained independently for each preshower condition,
using the fit procedure shown in Fig. 3 below.
Figure 2: max(D_tail - F_tail) distribution
(projection onto the horizontal axis of the sided residual plot, see Fig. 1 above)
Figure 3: max(D_tail - F_tail) [at 80%] vs. D_peak.
For each slice (bin) in the D_peak variable, the max(D_tail - F_tail) value
that has 80% of the gamma-jet candidates on its left side is plotted.
Lines represent fits to the MC gamma-jet points (shown in red) using different fit functions
(linear, 2nd-, and 4th-order polynomials; see legend for color coding).
Note that in this plot D_peak is shown on the horizontal axis.
Consequently, to get a 2nd-order polynomial cut line on the sided residual plot (Fig. 1),
one needs to fit a sqrt(D_peak) function here.
The same applies to the 4th-order polynomial, which corresponds to a D_peak^(1/4) function in this plot.
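The Fig. 3 procedure can be sketched as follows, under stated assumptions: the per-slice 80% point is taken as a nearest-rank quantile, the D_peak slices are given by hypothetical bin edges, and the "4th-order" curve is fitted in Fig. 3 coordinates as r80 = c * D_peak^(1/4), a one-parameter least-squares fit with the closed-form solution c = sum(r*u)/sum(u^2), where u = D_peak^(1/4). Inverting it gives the Fig. 1 line D_peak = a*x^4 with a = c^-4. The actual analysis may use a different quantile definition or fitting package.

```python
import math
from typing import List, Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Nearest-rank quantile: smallest value with at least a fraction q of entries <= it."""
    s = sorted(values)
    k = max(0, math.ceil(q * len(s)) - 1)
    return s[k]


def slice_quantiles(d_peak: Sequence[float], residual: Sequence[float],
                    edges: Sequence[float], q: float = 0.8) -> List[Tuple[float, float]]:
    """For each D_peak slice [edges[j], edges[j+1]), return (slice centre, q-quantile of residual)."""
    points = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = [r for d, r in zip(d_peak, residual) if lo <= d < hi]
        if sel:  # empty slices are skipped
            points.append((0.5 * (lo + hi), quantile(sel, q)))
    return points


def fit_quarter_power(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares fit of r80 = c * D_peak**0.25 through (D_peak, r80) points; returns c.

    With u = D_peak**0.25, minimising sum (r - c*u)**2 gives c = sum(r*u) / sum(u*u).
    """
    num = sum(r * d ** 0.25 for d, r in points)
    den = sum(d ** 0.5 for d, _ in points)  # u*u == D_peak**0.5
    return num / den


def cut_line_coefficient(c: float) -> float:
    """Coefficient a of the Fig. 1 cut line D_peak = a * x**4, from x = c * D_peak**0.25."""
    return c ** -4
```

Since the text states that the cut lines are obtained independently for each preshower condition, this sketch would be run once per condition on the corresponding MC gamma-jet subsample.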
Figure 4: D_peak vs. horizontal distance from the 4th-order polynomial function to the max(D_tail - F_tail) values.
(Compare with Fig. 1: now 80% of the MC gamma-jet counts are on the left side of the vertical axis.)
Figure 5: Horizontal distance from the 4th-order polynomial function to max(D_tail - F_tail)
[projection onto the horizontal axis of Fig. 4].
Based on this plot, one can obtain the purity, efficiency, and rejection plots (see Fig. 6 below).
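Given the coefficient a of a Fig. 1 cut line D_peak = a*x^4, the horizontal distance used in Figs. 4 and 5 can be computed as below. This is a minimal sketch: the sign convention (negative means left of the line, i.e. on the gamma-jet-like side) and the helper names are assumptions for illustration.

```python
from typing import Sequence


def horizontal_distance(d_peak: float, residual: float, a: float) -> float:
    """Signed horizontal distance from the cut line D_peak = a * x**4 to the point (residual, d_peak).

    At this D_peak the line sits at x_line = (D_peak / a) ** 0.25 (assumes D_peak >= 0, a > 0);
    a negative result means the point lies to the left of the line.
    """
    x_line = (d_peak / a) ** 0.25
    return residual - x_line


def fraction_left(distances: Sequence[float]) -> float:
    """Fraction of entries left of the line (distance < 0).

    For the MC gamma-jet sample this should be close to 0.8 if the line was tuned as in Fig. 3.
    """
    return sum(1 for x in distances if x < 0) / len(distances)
```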
Gamma-jet purity, efficiency, and QCD background rejection
The horizontal distance plotted in Fig. 5 can be used as a cut
separating the gamma-jet signal from the QCD-jets background,
and for each value of this distance one can define the
gamma-jet purity, efficiency, and QCD background rejection:

Gamma-jet purity is defined as the ratio of
the integral on the left side for the MC gamma-jet sample, N[gjet]_left,
to the sum of the left-side integrals for the MC gamma-jet and QCD jets samples, N[gjet]_left + N[QCD]_left:
Purity[gamma-jet] = N[gjet]_left/(N[gjet]_left+N[QCD]_left)
Gamma-jet efficiency is defined as the ratio of
the integral on the left side for the MC gamma-jet sample, N[gjet]_left,
to the total integral for the MC gamma-jet sample, N[gjet]:
Efficiency[gamma-jet] = N[gjet]_left/N[gjet]
QCD background rejection is defined as the ratio of
the integral on the right side for the MC QCD jets sample, N[QCD]_right,
to the total integral for the MC QCD jets sample, N[QCD]:
Rejection[QCD] = N[QCD]_right/N[QCD]
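The three definitions above can be sketched as a function of the cut value, together with a scan over cut values that would yield curves like those in Fig. 6. This is an illustration under assumptions: each MC entry is counted with unit weight (in practice the gamma-jet and QCD samples would first need a common normalization, which is not described here), "left" means distance < cut, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def figures_of_merit(gjet_dist: Sequence[float], qcd_dist: Sequence[float],
                     cut: float) -> Tuple[float, float, float]:
    """Return (purity, efficiency, rejection) when 'distance < cut' selects gamma-jet candidates."""
    n_gjet_left = sum(1 for x in gjet_dist if x < cut)
    n_qcd_left = sum(1 for x in qcd_dist if x < cut)
    n_qcd_right = len(qcd_dist) - n_qcd_left

    n_left = n_gjet_left + n_qcd_left
    purity = n_gjet_left / n_left if n_left else math.nan  # undefined if nothing passes
    efficiency = n_gjet_left / len(gjet_dist)
    rejection = n_qcd_right / len(qcd_dist)
    return purity, efficiency, rejection


def scan_cuts(gjet_dist: Sequence[float], qcd_dist: Sequence[float],
              cuts: Sequence[float]) -> List[Tuple[float, float, float, float]]:
    """Evaluate (cut, purity, efficiency, rejection) for each cut value."""
    return [(c, *figures_of_merit(gjet_dist, qcd_dist, c)) for c in cuts]
```

Moving the cut to the right raises the efficiency and lowers the rejection, which is the trade-off shown by the rejection vs. efficiency panel of Fig. 6.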
Figure 6 panels:
purity[gjet] vs. efficiency[gjet] (upper left);
rejection[QCD] vs. efficiency[gjet] (upper right);
purity[gjet] vs. rejection[QCD] (lower left);
pp2006 to MC ratio, N[pp2006]/(N[gjet]+N[QCD]), vs. horizontal distance from Fig. 5 (lower right)
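The lower-right panel ratio can be sketched per horizontal-distance bin as below. The bin edges and the per-sample weights `gjet_weight` and `qcd_weight` are hypothetical parameters: the note does not state how the MC samples are normalized relative to the pp2006 data, so unit weights are used by default.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Count entries per bin [edges[j], edges[j+1]); values outside the range are dropped."""
    counts = [0.0] * (len(edges) - 1)
    for v in values:
        j = bisect_right(edges, v) - 1
        if 0 <= j < len(counts):
            counts[j] += 1
    return counts


def data_to_mc_ratio(pp_dist: Sequence[float], gjet_dist: Sequence[float],
                     qcd_dist: Sequence[float], edges: Sequence[float],
                     gjet_weight: float = 1.0, qcd_weight: float = 1.0) -> List[float]:
    """N[pp2006] / (N[gjet] + N[QCD]) per distance bin; NaN where the MC bin is empty."""
    pp = histogram(pp_dist, edges)
    gjet = histogram(gjet_dist, edges)
    qcd = histogram(qcd_dist, edges)
    return [p / (gjet_weight * g + qcd_weight * q) if g + q > 0 else math.nan
            for p, g, q in zip(pp, gjet, qcd)]
```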