# Run 9 200GeV Dijet Cross Section Unfolding Test

Here I look at some of the unfolding methods provided in the RooUnfold package ...

Until now, all the cross sections I have shown have used a raw yield corrected by a Bin-by-Bin correction factor, which I calculate using my own code. The RooUnfold package implements several unfolding methods in the ROOT framework, including an iterative Bayesian method and a Singular Value Decomposition (SVD) method, in addition to the Bin-by-Bin method, among others. If I decide to move away from the Bin-by-Bin correction factor, I will use the SVD method, since it correctly treats the errors introduced by finite simulation statistics and the Bayesian implementation does not. I still look at the Bayesian method because it provides an easy test of the convergence of the unfolding.
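For reference, a Bin-by-Bin correction can be sketched as below. My own correction code is not reproduced on this page, so this assumes the usual definition in which each raw bin is multiplied by the ratio of simulated truth-level to reconstructed yields in that bin. The function name and all numbers are hypothetical.

```python
def bin_by_bin_correct(raw_yield, mc_truth, mc_reco):
    """Scale each raw bin by the simulated truth/reco ratio for that bin.

    Assumed definition of the correction factor; no bin-to-bin
    migration is modeled.
    """
    corrected = []
    for raw, truth, reco in zip(raw_yield, mc_truth, mc_reco):
        if reco <= 0:
            raise ValueError("empty reconstructed MC bin; factor undefined")
        corrected.append(raw * truth / reco)
    return corrected


# Hypothetical yields, for illustration only.
raw = [100.0, 50.0, 20.0]
mc_truth = [220.0, 90.0, 30.0]
mc_reco = [200.0, 100.0, 30.0]
corrected = bin_by_bin_correct(raw, mc_truth, mc_reco)
print(corrected)
```

Because each bin is corrected independently, the result for a given bin does not depend on which other bins are included in the spectrum.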

Throughout this page, I use three different dijet mass binning schemes. The first has 12 bins which are defined below. The second configuration removes the 13-16 GeV bin. The third configuration removes the 13-16 and 16-19 GeV bins. The bin edges are:

{ 13, 16, 19, 23, 28, 34, 41, 49, 58, 69, 82, 100, 120 }

The 13-16 and 16-19 GeV mass bins are the two which show large discrepancy with the theory.
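The three binning schemes follow from the edge list by dropping zero, one, or two of the lowest bins. A small sketch (the helper name is hypothetical):

```python
# Dijet mass bin edges in GeV, as listed above.
MASS_EDGES = [13, 16, 19, 23, 28, 34, 41, 49, 58, 69, 82, 100, 120]


def binning(n_dropped_low_bins):
    """Return the edge list with the lowest n bins removed."""
    return MASS_EDGES[n_dropped_low_bins:]


# Key each scheme by its number of bins (edges minus one).
schemes = {len(binning(n)) - 1: binning(n) for n in (0, 1, 2)}
for n_bins, edges in schemes.items():
    print(n_bins, "bins:", edges)
```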

Figure 1: This figure shows the Chi2 / #Bins value after each iteration of the Bayesian unfolding for the three different binnings. I show 20 iterations. The convention is to stop iterating once the Chi2 / #Bins between successive iterations is less than one.

The above plot shows that only the set discarding the first two bins reaches the Chi2 / #Bins < 1 condition. The set which includes all bins does reach a plateau after 9 iterations, but at a value of ~3.5 instead of 1. The set in which only the first bin is removed does not level off even after 20 iterations, and the Chi2 difference between iterations is always greater than 10, which makes me hesitant to use that configuration.
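To make the stopping rule concrete, here is a minimal sketch of an iterative Bayesian (D'Agostini-style) unfolding that records a Chi2 / #Bins between successive iterations. It is not the RooUnfold implementation. The flat starting prior, the use of the new bin content as the variance in the chi-square, and the toy 2x2 response matrix are all assumptions made for illustration.

```python
def bayes_step(response, measured, prior):
    """One iteration. response[j][i] = P(measured bin j | true bin i)."""
    n_meas, n_true = len(measured), len(prior)
    unfolded = [0.0] * n_true
    for j in range(n_meas):
        # Expected content of measured bin j under the current prior.
        norm = sum(response[j][k] * prior[k] for k in range(n_true))
        if norm == 0:
            continue
        for i in range(n_true):
            unfolded[i] += measured[j] * response[j][i] * prior[i] / norm
    for i in range(n_true):
        # Efficiency of true bin i (assumed non-zero here).
        eff = sum(response[j][i] for j in range(n_meas))
        unfolded[i] /= eff
    return unfolded


def chi2_per_bin(new, old):
    """Change between iterations; variance taken as the new content (assumption)."""
    return sum((n - o) ** 2 / n for n, o in zip(new, old) if n > 0) / len(new)


def unfold(response, measured, max_iter=20, tol=1.0):
    n_true = len(response[0])
    current = [sum(measured) / n_true] * n_true  # flat prior (assumption)
    history = []
    for _ in range(max_iter):
        nxt = bayes_step(response, measured, current)
        history.append(chi2_per_bin(nxt, current))
        current = nxt
        if history[-1] < tol:
            break
    return current, history


# Toy example: true spectrum [1000, 500] smeared by a 2x2 response.
response = [[0.8, 0.2], [0.2, 0.8]]
measured = [900.0, 600.0]
unfolded, history = unfold(response, measured)
print(unfolded, history)
```

A configuration whose `history` never drops below the tolerance within the allowed iterations corresponds to the non-converging behavior described above.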

Figure 2: This figure shows the ratio of the yield from the Bayes unfolding for the three different bin sets to the yield from the Bin-by-Bin correction. The Bin-by-Bin correction is insensitive to bin migration effects and so gives the same result for all three bin schemes, making it a good standard to compare to. The actual yields can be seen here.

Figure 3: This is the same as figure 2, but now looking at the yields from the SVD unfolding. Here I use the same cutoff parameter of 6 for each bin set. The actual yield spectrum can be seen here.

It appears that the 10 bin configuration shows the largest disagreement with the Bin-by-Bin result in the lower mass bins. Also, despite the different convergence behaviours, the 12 bin and 11 bin configurations track each other pretty well.

Now I want to take the unfolded yields and compute the cross section and compare to the raw theory (without UEH corrections).

Figure 4: This figure shows the (data-theory)/theory ratio for the cross section computed from the yields which were corrected using the Bin-by-Bin method, the Bayes method with 11 iterations, and the SVD method with a cutoff of 6. This was done using the full 12 bin configuration. The cross section spectrum can be seen here.

Figure 5: Same as figure 4 but now using the 11 bin configuration. The number of iterations for the Bayes method was set to 20 and the cutoff parameter for the SVD method was set at 6. The cross section spectrum can be seen here.

Figure 6: Same as figures 4 and 5 but now using the 10 bin configuration. The number of iterations for the Bayes method was 8 and the cutoff for the SVD method remained at 6. The cross section spectrum can be seen here.

At the jet meeting on 6/11/13, it was suggested that I use the SVD method and try unfolding with the 11 bin configuration. Figure 7 below shows the values of the "d" vector for each singular value. The point where these values become smaller than 1 should indicate where the unfolding is dominated by statistical fluctuations and should be stopped.

Figure 7: This figure shows the values of the "d" vector for each singular value which can be used to determine where the cutoff for the SVD unfolding should be placed. This is for the 11 bin configuration.
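The criterion can be written as a small helper that counts how many leading "d" values stay above 1 in magnitude. This is only a sketch of the rule of thumb described above; the function name is hypothetical and the "d" values are invented, not read from figure 7.

```python
def suggest_svd_cutoff(d_values, threshold=1.0):
    """Count the leading |d_i| that remain above the threshold.

    Counting stops at the first value at or below the threshold, so a
    later upward fluctuation does not raise the suggested cutoff.
    """
    k = 0
    for d in d_values:
        if abs(d) <= threshold:
            break
        k += 1
    return k


# Hypothetical |d_i| values for an 11 bin configuration.
d_example = [250.0, 40.0, 12.0, 5.1, 2.3, 1.4, 0.9, 1.1, 0.6, 0.3, 0.2]
cutoff = suggest_svd_cutoff(d_example)
print(cutoff)
```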

Figure 8: This figure shows the (data-theory)/theory ratio for the raw yield unfolded using the 11 bin configuration and several different values of the SVD cutoff parameter. The theory used for comparison is MRST2004.

Figure 9: This figure is the same as figure 8 except I have removed all SVD curves except for the one with the cutoff value = 9 for easier comparison with the Bin-by-Bin curve.

It appears that there is quite a bit of variation between the SVD curves with different cutoff values. Despite the "d" vector plot shown in figure 7, it appears that a cutoff value of 9 may be too high and may introduce too much statistical variation, especially in the lower mass bins. I want to look at lower cutoff values and try to build a defensible case for choosing a value.

Figure 10: This figure shows the (data-theory)/theory ratio for the raw yield unfolded using the SVD method with a cutoff of 6 and the 11 bin configuration as well as the Bin-by-Bin method. The theory used for comparison is MRST2004.

Figure 11: This figure is the same as figure 10 but now I include curves for the SVD method with cutoffs of 4 and 8 as a way to gauge the systematic uncertainty associated with choosing a cutoff parameter.
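One possible way to turn the spread between cutoffs into a per-bin number is to take the largest shift of any varied curve from the nominal one. This page does not fix a prescription, so the envelope below is an assumption, and the ratio values are hypothetical.

```python
def cutoff_systematic(nominal, variations):
    """Per-bin largest absolute shift of any varied result from the nominal."""
    return [
        max(abs(var[i] - nom) for var in variations)
        for i, nom in enumerate(nominal)
    ]


# Hypothetical (data-theory)/theory values in three mass bins.
ratio_k6 = [0.30, 0.10, 0.05]   # nominal cutoff
ratio_k4 = [0.26, 0.12, 0.05]   # lower cutoff
ratio_k8 = [0.38, 0.09, 0.04]   # higher cutoff
systematic = cutoff_systematic(ratio_k6, [ratio_k4, ratio_k8])
print(systematic)
```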

The cutoff value of 6 is compelling because it tracks pretty closely with the Bin-by-Bin correction. The Bin-by-Bin result should not be affected by the statistical fluctuations that arise when the matrix inversion problem present in the SVD and Bayes methods is improperly regularized. As another check, I look at the Bayes method for a number of different iterations, and the values do not deviate very far from the Bin-by-Bin result.

Figure 12: This figure shows the (data-theory)/theory ratio for the raw yield unfolded using the Bayes method with a number of different iterations and the 11 bin configuration as well as the Bin-by-Bin method.

The plots I have shown above did not consider statistical errors. In previous blog posts, I have propagated the error on the Bin-by-Bin correction factor manually via the standard error propagation methods. One purpose of using the SVD method from RooUnfold was that it handles statistical errors due to the finite size of the Monte Carlo correctly. However, the errors reported from the SVD unfolding seem to be overly large. The RooUnfold package gives three methods to calculate the statistical error: (1) using the diagonal elements of the covariance matrix, (2) using the full covariance matrix, and (3) using the covariance matrix generated from the variation of results in different toy MC tests. I compare methods 1 and 3 below.
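The difference between a propagated covariance matrix and a toy-based one can be illustrated with a deliberately simple linear unfolding (plain matrix inversion of a 2x2 response). This is not how RooUnfold computes its errors internally. The response matrix, the Gaussian approximation to the fluctuations, and all function names are assumptions for illustration.

```python
import math
import random


def unfold_linear(measured, inverse_response):
    """Apply a fixed linear unfolding matrix to the measured spectrum."""
    return [sum(a * m for a, m in zip(row, measured)) for row in inverse_response]


def propagated_covariance(measured, inverse_response):
    """Cov = A diag(measured) A^T, assuming Poisson variance on each measured bin."""
    n = len(inverse_response)
    return [
        [
            sum(inverse_response[i][k] * measured[k] * inverse_response[j][k]
                for k in range(len(measured)))
            for j in range(n)
        ]
        for i in range(n)
    ]


def toy_covariance(measured, inverse_response, n_toys=1000, seed=1):
    """Sample covariance of the unfolded result over fluctuated toy spectra."""
    rng = random.Random(seed)
    n = len(inverse_response)
    results = []
    for _ in range(n_toys):
        # Gaussian approximation to Poisson fluctuations (assumption).
        toy = [rng.gauss(m, math.sqrt(m)) for m in measured]
        results.append(unfold_linear(toy, inverse_response))
    mean = [sum(r[i] for r in results) / n_toys for i in range(n)]
    return [
        [
            sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in results) / (n_toys - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Inverse of the toy response [[0.8, 0.2], [0.2, 0.8]].
inverse_response = [[4.0 / 3.0, -1.0 / 3.0], [-1.0 / 3.0, 4.0 / 3.0]]
measured = [900.0, 600.0]
cov_prop = propagated_covariance(measured, inverse_response)
cov_toy = toy_covariance(measured, inverse_response)
err_prop = [math.sqrt(cov_prop[i][i]) for i in range(2)]
err_toy = [math.sqrt(cov_toy[i][i]) for i in range(2)]
print(err_prop, err_toy)
```

In this toy case the two sets of diagonal errors should agree to within the statistical precision of the toys, so a large disagreement between the two in a real unfolding points to a problem with the inputs rather than with either method.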

Figure 13: This figure shows the (Data-Theory)/Theory curve with the statistical errors obtained using method 1 above. The data and theory mass spectra can be seen here.

Figure 14: This figure shows the (Data-Theory)/Theory curve with the statistical errors obtained using method 3 above. 1000 toy MC runs were generated. The data and theory mass spectra can be seen here.

After some investigation, I think the problem with the large statistical errors seen in figure 13 is due to an interplay between how I normalize my simulation and the low number of events in certain partonic pt bins. The weights needed to normalize the partonic pt bins to one another are fixed from bin to bin, but one can apply an overall scale factor, which should be arbitrary. Two common choices are scaling so that the lowest partonic pt bin has a weight of 1 and all higher pt bins have weights < 1, and the opposite, i.e. scaling so that the highest partonic pt bin has a weight of 1 and all lower pt bins have weights > 1. Figures 13 and 14 above were made using the simulation normalized such that the lowest partonic pt bin had a weight of 1.
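The two conventions differ only by a single overall factor, which the sketch below makes explicit. The relative weights are invented for illustration and are not the weights used in my simulation.

```python
def normalize(weights, reference_index):
    """Rescale all weights so that the reference bin has weight 1."""
    ref = weights[reference_index]
    return [w / ref for w in weights]


# Hypothetical relative weights for partonic pt bins, lowest to highest pt.
relative = [1.0, 0.2, 0.03, 0.004, 0.0005]

norm_low = normalize(relative, 0)     # lowest pt bin = 1, higher bins < 1
norm_high = normalize(relative, -1)   # highest pt bin = 1, lower bins > 1

# The bin-to-bin ratios are the same in both conventions.
overall = [h / l for h, l in zip(norm_high, norm_low)]
print(norm_low, norm_high, overall)
```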

The choice of overall normalization should be arbitrary, but I think the normalization I have been using, with the lowest partonic pt bin having a weight of one, is putting too much emphasis on partonic bins with small numbers of events. To test this, I have rerun the simulation using the normalization which sets the weight of the highest pt bin to 1. The columns labeled 'Norm 1 Err' and 'Norm 2 Err' in the table below show the errors obtained from the SVD method using the simulation weighted such that the lowest pt bin has a weight of 1 and the simulation weighted such that the highest pt bin has a weight of 1, respectively. We can see that the errors in the Norm 2 Err column are roughly two orders of magnitude smaller than those in the Norm 1 Err column. In addition, they are very similar to the errors obtained using error method 3 as seen in figure 14.

I was worried that I could shrink the statistical errors by an arbitrary amount just by increasing the overall normalization of the simulation. To make sure this wasn't the case, I reran the simulation using the normalization in the Norm 2 Err column multiplied by 70 thousand, i.e. the highest partonic pt bin has a weight of 70,000. The errors obtained using this normalization are shown in the column labeled 'Norm 3 Err'. We can see that the errors in this column are not substantially different from the errors in the 'Norm 2 Err' column, showing that a further large increase in the simulation normalization does not result in a large decrease in the statistical error.
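As a cross-check of the expectation that the overall normalization is arbitrary, the relative statistical error of a weighted bin, computed with the standard sum-of-squared-weights formula, does not change under an overall scale factor. The sketch below only illustrates that general property. It makes no claim about how RooUnfold treats the response matrix internally, and the event weights are hypothetical.

```python
import math


def weighted_sum_and_error(weights):
    """Bin content and its error from the sum of squared weights."""
    total = sum(weights)
    error = math.sqrt(sum(w * w for w in weights))
    return total, error


def relative_error(weights, overall_scale=1.0):
    """Relative error after multiplying every weight by one common factor."""
    total, error = weighted_sum_and_error([overall_scale * w for w in weights])
    return error / total


# Hypothetical mass bin: a few high-weight events plus many low-weight ones.
weights = [1.0] * 4 + [0.01] * 900

rel_errors = [relative_error(weights, s) for s in (1.0, 2000.0, 2000.0 * 70000.0)]
print(rel_errors)
```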

Table 1: This table shows the yields and errors for various simulation normalizations. See text above for description.

| Bin | Yield | Norm 1 Err | Norm 2 Err | Norm 3 Err | Method 3 Err |
|---|---|---|---|---|---|
| 1 | 0.3294 | 0.1392 | 0.00109 | 0.00100 | 0.00109 |
| 2 | 0.2051 | 0.0524 | 0.00035 | 0.00032 | 0.00032 |
| 3 | 0.0745 | 0.0161 | 0.00014 | 0.00013 | 0.00012 |
| 4 | 0.0240 | 0.0057 | 4.6431e-5 | 4.312e-5 | 3.94e-5 |
