Embedding requirements calculation

Here I go through a sample calculation, setting out the assumptions used.

Bottom-up calculation

Define desired error
The starting point is to define what statistical error we are aiming for on the correction for a particular pt bin in a spectrum. Obviously there is a contribution from the real data but here we are concerned with the statistical error from the embedding.
Say that we think a 5% error would be nice. That means that 400 reconstructed counts are required if the error scales like √N. Actually it is a bit worse than that, because the numerator is constructed from more than one number with different weights, so more counts are required, ~500.
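As a sketch, the counts needed for a target relative error can be estimated as follows; the 1.25 inflation factor for the weighted numerator is an assumption chosen to reproduce the ~500 quoted above:

```python
import math

def counts_for_error(rel_err, weight_factor=1.25):
    """Reconstructed counts needed for a target relative error,
    assuming the error scales like 1/sqrt(N).  weight_factor (an
    assumed ~1.25 here) inflates the naive estimate because the
    numerator combines several differently weighted numbers."""
    return math.ceil(weight_factor / rel_err ** 2)

print(counts_for_error(0.05))        # 500 reconstructed counts for a 5% error
print(counts_for_error(0.05, 1.0))   # 400 in the naive sqrt(N) case
```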
Fold in likely efficiencies
The number that must be generated in that pt bin to achieve this then depends on the efficiency for that bin. For Au+Au minbias a typical efficiency plot might have efficiencies of approximately 5% for pt < 1 GeV, 10% for 1 < pt < 2 GeV, 15% for 2 < pt < 4 GeV and 40% for pt > 4 GeV. This is for a set of fairly loose cuts close to the default finder ones, loosening at pt=4 because the finder cuts become less stringent at 3.5 GeV.
Therefore we find that the number generated per bin needs to be 10000, 5000, 3000 and 1250 in the pt ranges mentioned. Clearly the low pt part where the efficiency is lowest is driving the calculation.
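Dividing the required reconstructed counts by the efficiency in each range reproduces these numbers (the efficiency values are the approximate ones quoted above; note the 15% entry comes out nearer 3300 than the rounded 3000):

```python
needed = 500   # reconstructed counts per bin for a ~5% error

# Approximate Au+Au minbias efficiencies from the text above
efficiencies = {
    "pt < 1 GeV":     0.05,
    "1 < pt < 2 GeV": 0.10,
    "2 < pt < 4 GeV": 0.15,
    "pt > 4 GeV":     0.40,
}

# Counts that must be generated per bin = needed / efficiency
for pt_range, eff in efficiencies.items():
    print(f"{pt_range}: generate ~{needed / eff:.0f} per bin")
```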
Effect of choice of bin size
For these numbers to be generated per bin we can ask how many particles per GeV we need. At higher pt we tend to use 500 MeV bins, but at lower pt 200 MeV bins are customary; I have even used 100 MeV bins in the d+Au analysis. Choosing 200 MeV bins leads to a requirement for 50000 particles per GeV at low pt, and so on. This is already looking like quite a large number…
Binning with centrality
We embed into minbias events and we'd like to have the correction as a function of the TPC multiplicity (equivalent to centrality). Normally we embed particles at a rate of 5% of the event multiplicity. The 50-60% bin is likely to be the most peripheral that is used in analysis. Here the multiplicity is small enough that we will only be embedding one particle per event, so 50000 particles per GeV requires 50000 events in that particular centrality bin. The 50-60% bin is around one tenth of the total events, and I don't think we have a mechanism for choosing which events to embed into depending on their centrality, so 500k minbias events per GeV are required.
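The chain from generated counts per bin to minbias events per GeV, under the assumptions above (200 MeV bins, one embedded particle per peripheral event, the 50-60% bin being about one tenth of minbias), works out as:

```python
gen_per_bin = 10_000        # generated counts per 200 MeV bin at low pt
bins_per_gev = 5            # 1 GeV / 200 MeV

particles_per_gev = gen_per_bin * bins_per_gev          # 50,000

particles_per_event = 1     # peripheral events take only one embedded particle
events_in_bin = particles_per_gev // particles_per_event  # 50,000 events in the 50-60% bin

minbias_per_bin_event = 10  # the 50-60% bin is ~1/10 of all minbias events
minbias_events_per_gev = events_in_bin * minbias_per_bin_event

print(f"{minbias_events_per_gev:,} minbias events per GeV")   # 500,000
```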
Coverage in pt
We expect our spectra to go to at least 7 GeV, so it seems prudent to embed out to 10 GeV. For Λ these data might also be used for proton feeddown analysis. This means that 5 million events are required!
Coverage in rapidity
Unfortunately we are not finished yet. We have previously used |y|<1.2 as our rapidity cut when generating, even when using |y|<0.5 in the analysis, so a further factor of 12/5 is required, giving a total of 12 million events per species!
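Putting the pieces together, the full bottom-up total can be checked in a few lines:

```python
events_per_gev = 500_000      # from the centrality argument above
pt_max = 10                   # embed out to 10 GeV
rapidity_generated = 1.2      # |y| < 1.2 used when generating
rapidity_analysed = 0.5       # |y| < 0.5 used in the analysis

# 500k/GeV x 10 GeV x (12/5) rapidity factor
total = events_per_gev * pt_max * (rapidity_generated / rapidity_analysed)
print(f"{total:,.0f} events per species")   # 12,000,000
```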

Comments

This number of events is clearly unacceptably large. Matt has mentioned to me that we can do about 150 events per hour, so this represents about 80k CPU hours as well as an enormous data volume. Clearly we have to find ways to cut back the request somewhat. Some compromises are listed below.
  • Cut down rapidity range from |y|<1.2 to |y|<0.7 gaining factor 12/7 ≈ 1.7 → 7 million events
  • Settle for a 10% error in a bin rather than 5% gaining factor of 4 → 1.75 million events
  • Hope that efficiency is not a strong function of multiplicity, allowing us to combine mult. bins. Gain of factor 2? → 875k events
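The compromise factors above chain together as:

```python
total = 12_000_000.0        # the full bottom-up request

total /= 1.2 / 0.7          # |y|<0.7 instead of |y|<1.2: factor 12/7 ~ 1.7
print(f"{total:,.0f}")      # ~7,000,000

total /= (0.10 / 0.05) ** 2 # 10% error instead of 5%: counts scale as 1/err^2, factor 4
print(f"{total:,.0f}")      # ~1,750,000

total /= 2                  # combine multiplicity bins (hoped-for factor 2)
print(f"{total:,.0f}")      # ~875,000
```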