2008.09.30 Sided residual: purity, efficiency, and background rejection

Ilya Selyuzhenkov September 30, 2008

Data sets:

  • pp2006 - STAR 2006 pp longitudinal data (~ 3.164 pb^-1)
    after applying gamma-jet isolation cuts (note: R_cluster > 0.9 is used below).
  • gamma-jet - data-driven Pythia gamma-jet sample (~170K events). Partonic pt range 5-35 GeV.
  • QCD jets - data-driven Pythia QCD jets sample (~4M events). Partonic pt range 3-65 GeV.

Notations used in the plots:

  • Fit peak energy:
    F_peak - integral of the fit function within +-2 strips of the maximum strip.
    The maximum strip is determined by the fitting procedure;
    its floating-point position is truncated to an integer strip Id.
  • Data peak energy:
    D_peak - energy sum within +-2 strips from maximum strip (the same strip Id as for F_peak).
  • Data tails:
    D_tail^left and D_tail^right.
    Energy sum from the 3rd strip up to the 30th strip on the
    left and right sides of the maximum strip (excludes the strips which contribute to D_peak).
  • Fit tails:
    F_tail^left and F_tail^right.
    Same definition as for D_tail, but integrals are calculated from a fit function.
  • Maximum sided residual:
    max(D_tail-F_tail)
    Maximum of the data minus fit energy on the left and right sides from the peak.
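
The definitions above can be sketched in a few lines of Python. This is only an illustration, not the analysis code: the function name sided_residual, the array inputs, and the approximation of the fit integrals by sums of the fit function sampled once per strip are all assumptions made for this sketch.

```python
import numpy as np

def sided_residual(strips, fit, max_strip, peak_half_width=2, tail_reach=30):
    """Return (D_peak, F_peak, max(D_tail - F_tail)) for one cluster.

    strips    : measured energy per strip (hypothetical input layout)
    fit       : fit function sampled once per strip (assumed stand-in for the integral)
    max_strip : integer Id of the maximum strip from the fit
    """
    strips = np.asarray(strips, dtype=float)
    fit = np.asarray(fit, dtype=float)
    n = len(strips)

    # Peak: +-2 strips around the maximum strip, clipped to the plane edges.
    p_lo = max(max_strip - peak_half_width, 0)
    p_hi = min(max_strip + peak_half_width + 1, n)
    d_peak = strips[p_lo:p_hi].sum()
    f_peak = fit[p_lo:p_hi].sum()

    # Tails: 3rd to 30th strip away from the maximum, on each side.
    l_lo = max(max_strip - tail_reach, 0)
    l_hi = max(max_strip - peak_half_width, 0)
    r_lo = min(max_strip + peak_half_width + 1, n)
    r_hi = min(max_strip + tail_reach + 1, n)

    left = strips[l_lo:l_hi].sum() - fit[l_lo:l_hi].sum()
    right = strips[r_lo:r_hi].sum() - fit[r_lo:r_hi].sum()
    return d_peak, f_peak, max(left, right)
```

Clipping the ranges at the ends of the array is a choice made for the sketch; how clusters near the detector edge are handled in the real analysis is not stated here.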

Determining cut line based on sided residual plot

Figure 1: Sided residual plot: D_peak vs. max(D_tail-F_tail)
Red lines show 4th order polynomial functions, a*x^4,
which have 80% of MC gamma-jet counts on the left side.
These lines are obtained independently for each pre-shower condition,
based on the fit procedure shown in Fig. 3 below.

Figure 2: max(D_tail-F_tail) distribution
(projection on horizontal axis from sided residual plot, see Fig. 1 above)

Figure 3: max(D_tail-F_tail) [at 80%] vs. D_peak.
For each slice (bin) in the D_peak variable, the max(D_tail-F_tail) value
which has 80% of gamma-jet candidates on its left side is plotted.

Lines represent fits to MC gamma-jet points (shown in red) using different fit functions
(linear, 2nd, 4th order polynomials: see legend for color coding).
Note that in this plot the D_peak values are shown on the horizontal axis.
Consequently, to get a 2nd order polynomial on the sided residual plot (Fig. 1),
one needs to fit these points with a sqrt(D_peak) function.
The same applies to the 4th order polynomial, which requires a D_peak^(1/4) function.
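
As an illustration of the slice-and-fit procedure, the sketch below finds the 80% point of max(D_tail-F_tail) in each D_peak slice and then extracts the coefficient a of D_peak = a*x^4 by fitting x = c*D_peak^(1/4), with a = c^(-4). The function names, the bin edges passed by the caller, and the use of an unweighted least-squares fit are assumptions of this sketch and are not taken from the analysis.

```python
import numpy as np

def quantile_per_slice(d_peak, residual, edges, q=0.8):
    """For each D_peak slice, return the slice center and the residual
    value that has a fraction q of the entries on its left side."""
    d_peak = np.asarray(d_peak, dtype=float)
    residual = np.asarray(residual, dtype=float)
    centers, values = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (d_peak >= lo) & (d_peak < hi)
        if not sel.any():
            continue  # skip empty slices
        centers.append(0.5 * (lo + hi))
        values.append(np.quantile(residual[sel], q))
    return np.array(centers), np.array(values)

def fit_quartic_coefficient(centers, values):
    """Fit values = c * centers**(1/4) by unweighted least squares
    and return a = c**(-4), so that D_peak = a * x**4."""
    u = np.asarray(centers, dtype=float) ** 0.25
    v = np.asarray(values, dtype=float)
    c = np.dot(u, v) / np.dot(u, u)
    return c ** -4
```

A one-parameter fit through the origin is used because the cut line is written as a*x^4 with no lower-order terms; a real fit would normally also weight each point by its statistical uncertainty.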

Figure 4: D_peak vs. horizontal distance from the 4th order polynomial function to the max(D_tail-F_tail) values.
(compare with Fig. 1: Now 80% of MC gamma-jet counts are on the left side from vertical axis)
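
A possible way to compute this horizontal distance is sketched below. It assumes the cut line has the form D_peak = a*x^4, so that at a given D_peak the line sits at x_cut = (D_peak/a)^(1/4), and it takes the distance as max(D_tail-F_tail) minus x_cut, so that events to the left of the line get negative values. The sign convention and the function name are assumptions of this sketch.

```python
import numpy as np

def horizontal_distance(d_peak, residual, a):
    """Signed horizontal distance from the line D_peak = a * x**4
    to the point (residual, D_peak); negative means left of the line."""
    d_peak = np.asarray(d_peak, dtype=float)
    residual = np.asarray(residual, dtype=float)
    x_cut = (d_peak / a) ** 0.25  # position of the cut line at this D_peak
    return residual - x_cut
```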

Figure 5: Horizontal distance from 4th order polynomial function to max(D_tail-F_tail)
[Projection on horizontal axis from Fig. 4]
Based on this plot one can obtain purity, efficiency, and rejection plots (see Fig. 6 below)

Gamma-jet purity, efficiency, and QCD background rejection

The horizontal distance plotted in Fig. 5 can be used as a cut
separating gamma-jet signal and QCD-jets background,
and for each value of this distance one can define
gamma-jet purity, efficiency, and QCD-background rejection:

  • gamma-jet purity is defined as the ratio of
    the integral on the left for the MC gamma-jet sample, N[g-jet]_left,
    to the sum of the integrals on the left for the MC gamma-jet and MC QCD jets samples,
    N[g-jet]_left and N[QCD]_left:
    Purity[gamma-jet] = N[g-jet]_left/(N[g-jet]_left+N[QCD]_left)

  • gamma-jet efficiency is defined as the ratio of
    the integral on the left side for MC gamma-jet data sample, N[g-jet]_left,
    to the total integral for MC gamma-jet data sample, N[g-jet]:
    Efficiency[gamma-jet] = N[g-jet]_left/N[g-jet]

  • QCD background rejection is defined as the ratio of
    the integral on the right side for MC QCD jets data sample, N[QCD]_right,
    to the total integral for MC QCD jets data sample, N[QCD]:
    Rejection[QCD] = N[QCD]_right/N[QCD]
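
The three definitions above translate directly into code. The sketch below evaluates them for a single cut value on the horizontal distance; the function name, the optional per-event weights (for example, to normalize the two MC samples to a common luminosity), and the choice that "left" means strictly below the cut are assumptions of this sketch.

```python
import numpy as np

def cut_metrics(dist_gjet, dist_qcd, cut, w_gjet=None, w_qcd=None):
    """Return (purity, efficiency, rejection) for events with distance < cut."""
    dist_gjet = np.asarray(dist_gjet, dtype=float)
    dist_qcd = np.asarray(dist_qcd, dtype=float)
    # Unit weights unless the caller supplies sample normalization weights.
    w_gjet = np.ones_like(dist_gjet) if w_gjet is None else np.asarray(w_gjet, dtype=float)
    w_qcd = np.ones_like(dist_qcd) if w_qcd is None else np.asarray(w_qcd, dtype=float)

    n_gjet = w_gjet.sum()
    n_qcd = w_qcd.sum()
    n_gjet_left = w_gjet[dist_gjet < cut].sum()
    n_qcd_left = w_qcd[dist_qcd < cut].sum()
    n_qcd_right = n_qcd - n_qcd_left

    n_left = n_gjet_left + n_qcd_left
    purity = n_gjet_left / n_left if n_left > 0 else float("nan")
    efficiency = n_gjet_left / n_gjet if n_gjet > 0 else float("nan")
    rejection = n_qcd_right / n_qcd if n_qcd > 0 else float("nan")
    return purity, efficiency, rejection
```

Scanning cut over a range of distances and plotting the returned values against each other gives curves of the kind described for Fig. 6.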

Figure 6: Purity, efficiency, and rejection plots:
purity[g-jet] vs. efficiency[g-jet] (upper left);
rejection[QCD] vs. efficiency[g-jet] (upper right);
purity[g-jet] vs. rejection[QCD] (lower left);
pp2006 to MC ratio, N[pp2006]/(N[g-jet]+N[QCD]), vs. horizontal distance from Fig. 5 (lower right)