Analysis Summary
Introduction
A future goal of the STAR collaboration is the measurment of flavor separated polarized anti-quark distribution functions. To make these measurments we will be looking at charged leptons - electrons and positrons - arising from the decay of W bosons created in quark anti-quark collisions. One difficulty in making this measurment is the large flux of background hadronic particles, giving a signal to background on the order of 1/1000. The following details my efforts at developing an algorithm which can reject the hadronic background while retaining the signal leptons to achieve a S/B of greater than one-to-one over a significant portion of the observed lepton transverse energy spectrum which runs from roughly 20-50GeV.
Basic Philosophy
Discrimination between leptons and hadrons is posible because of the different processes giving rise to signal and background events as well as different showering properties of leptons and hadrons in the EEMC. Hadronic events tend to be dijets, meaning these events will deposit two blobs of energy located 180 degrees away in azimuth from eachother. Leptonic events, on the other hand, arise from the decay of W bosons which produce a charged lepton and a neutrino. The neutrino is not detected so only one blob of energy is deposited in the detector. This difference allows for the application of an away-side cut which will veto events having significant energy 180 degrees away from the candidate lepton. Hadrons and leptons also behave differently inside of the EEMC with hadrons tending to produce larger and wider showers as compared to leptons due to collisions with nuclei. This difference in shower behavior means isolation cuts on the energy around a candidate lepton can be effective. Finally, the EEMC is roughly 21 electron radiation lengths deep and only one hadron radiation length deep meaning that much more hadronic energy will escape the back of the detector, allowing for cuts based on the amount of energy leaving the detector. Below are pictures from starsim showing the evolution of hadronic and leptonic showers in the EEMC:
Fig 1: Picture of a 30GeV charged pion going into the EEMC generated using starsim.
Fig 2: Picture of a 30GeV electron going into the EEMC generated using starsim.
As described above, the algorithm discriminates between leptons and hadrons in three basic ways: looking at the number of tracks and energy around candidate leptons, vetoing events with too many tracks or too much energy 180 degrees away in azimuth from the candidate lepton, and by comparing shower evolution in the EEMC. Another important aspect of the discrimination algorithm is the trigger patch. The trigger patch is always taken to consist of the tower with the highest energy - which is also taken as the position of the candidate electron - and all adjacent towers, usually yielding a 3x3 patch. This trigger patch is the area of the EEMC where shower evolution is investigated. The investigation of shower evolution is possible in the EEMC due to the five separate readout layers: the two pre-shower layers, the two shower maximum detectors (SMDs) and the post-shower layer. The pre-shower layers are located at the front of the detector where few leptons will have started to shower, the SMDs are located five out of 24 layers deep and are positioned at the depth where there is maximum shower energy deposition, and the post-shower layer is at the back of the detector where most lepton showers will have died out. By looking at the energy deposited in the separate layers, one can get an idea of the longitudinal shower development inside the EEMC. One can also see the transverse shower development by looking at the number of hit strips in the SMD layers. Many combinations of EEMC quantites were tried over the course of developing the discrimination algorithm, but the best discrimination by far was achieved by looking at the ratio of the energies in the seven highest adjacent SMD strips under the trigger patch to the energy of all SMD strips under the trigger patch. Less effective but still usefull EEMC cuts include the ratio of the energy in the two trigger patch towers with the highest energy to the energy in the full trigger patch and the ratio of the energy in the post-shower layer to the energy in the full trigger patch. The away-side and isolation cuts provide powerful discrimination as well and are largly independent of the EEMC cuts listed above.
Code Versions
The discrimination code has gone through many versions as new cuts were tested, tracking methods were improved, and other improvements were made. Below is a brief discription of each version as well as the source code for future reference.
V1.0:
Version 1.0 is the first version of the discrimination code designed to work with the Pythia simulations run at MIT. This version borrowed heavily from earlier code designed to work on single thrown lepton and hadron events. The code used to access the EEMC information was taken directly from this earlier work. In addition to the EEMC anlysis this version includes code to access tracking information from the MuDst files as well as a function to determine if a track passed close to the tower with highest energy. Also included was a primative function which looked at the awayside energy only in the endcap. This code looked at approximately eleven different cuts and displayed what the trigger patch transverse energy spectrum would look like after each one was applied as well as what the spectrum would look like when several different combinations of cuts were applied.
V1.1:
Version 1.1 follows the same basic pattern as version 1.0 detailed above. The major difference is in the trigger criteria used to determine wether an event should be analysed. In version 1.0, all events needed to have a trigger patch ET greater than 15GeV in order to be processed further. In addition to the 15GeV condition, version 1.1 requires that all events have a found vertex and that that vertex be found in within ten centimeters of z=-60. (The simulations were generated with the interaction point at -60 cm so that tracks with large etas would pass through more TPC volume). The next significant change was the inclusion of a function which calculated the transverse energy deposited in a tower taking into accout the z position of the vertex as ET value for a particle originating at z=-60 can be significantly different from the ET value gotten assuming the particle originated at z=0. Minor changes include the addition of a function which allows the setting of the energy threshold needed to consider a tower hit, a function which gives the ET weighted position of the trigger patch, and the investigation of several new cut quantities.
V2.0:
Version 2.0 differs from the 1.x versions in that it implements functions allowing isolation cuts, both same side and away side, utilizing tower energies, track momenta, and barrel information. Simulations done by Les Bland showed that these kinds of cuts had the potential to provide nearly two orders of magnitude in background reduction. These functions add up the transverse energy/momentum of all towers/tracks which fall within a user set region. For the same side isolation cut, this region is a circle with user set radius centered on the high tower. For the away side isolation cut, the region is a slice in phi with a user set width around the line which is 180 degrees away in azimuth from the high tower. In addition to the isolation cut functions, this version addopts the new cuts from version 1.1 and adds histograms exploring various radii for the isolation cuts. The code to calculate the transverse energy was also changed again, this time to set the 'center' plane of the tower to the position of the SMD plane as opposed to the actual physical center because we expect the most shower activity at the SMD depth.
V2.1:
Version 2.1 is very similar to version 2.0 and contains only minor changes. One concern with previous versions was that events occuring at high eta did not have reconstructed tracks because of poor TPC tracking in this region and thus it was hard to get a good idea of how well cuts using tracking were doing. To investigate this problem, several isolation cuts and histograms were made with the condition that the hightower not be located in etabins 1-5 - which excluded high eta events - or etabins 11&12(because of edge effects around the transition from endcap to barrel). In addition to this bin restriction investigation, new histograms were made to study the effects of moving the trigger patch ET threshold from 15 to 20GeV on the cut quantites from version 2.0. Other small changes included the addition of a crude function to determine approximately how many electrons/positrons were going into the endcap and a modification to the away side isolation function to read out the track, barrel, and endcap energies separately.
V2.2:
Version 2.2 adds two functions which provide a different way to carry out the same side isolation cuts. The first function counts the number of tracks above a certain threshold which cross the endcap within some radius of the high tower and also calculates the transverse displacement between the track and the high tower if there is one and only one track in the radius. This one and only one track scheme is flawed and is modified in later versions. The second function calculates the transverse energy deposited in the barrel and endcap towers within a certain radius. The idea behind the new isolation cuts was that QCD events should have many tracks around the high tower while W decay events should have few, maybe only one. It was also thought that events with one and only one track in the isolation radius may only be displaced some slight amount from the high tower and that this could give good lepton hadron discrimination. There are many new histograms exploring these ideas.
V2.3:
Version 2.3 makes many changes to how the cuts are organized. The trigger patch ET spectrum is shown after each individual cut as well as after each cut in succession. The cuts in this version were chosen from the cuts in the previous version which showed the best discrimination power, all other cuts were removed. In addition to re-organizing the cuts, new trigger conditions were included. Along with the trigger conditions from previous versions, events now must have a trigger patch ET greater than 20GeV, have the high tower be in etabins 6-11, and have the ET weighted trigger patch eta position be greater than 1.7 to be processed further.
V2.4:
Version 2.4 is very similar to version 2.3 in terms of cuts. Cuts on the away side energy and the displacement between tracks and the high tower have been added. This version of the code has also been cleaned up significantly, with many extraneous histograms and function calls being removed. This version also allowed for overall weighting of the event samples being processed. This allows for the scaling of all pythia samples to the same integrated luminosity. This method of weighting the samples was cumbersome because the code must be recompiled for each seperate simulation batch and if a mistake were made, the whole pythia sample would need to be rerun. In the future, this method is dropped in favor of a script which weights the batches while merging the seperate event files.
V2.5:
Version 2.5 is the last version of the code to use cuts based on reconstructed tracks from the TPC. It is also the version used to produce plots for several presentations and proposals. As such, it has many histograms which were specifically requested for those presentations, chief among them histograms showing the effects of cuts on the trigger patch ET spectra for electrons and positrons separately. In anticipation of using GEANT tracking in future versions a function was created which would count tracks going into a isolation region using the GEANT record. This function looped over all vertices within three centimeters of the primary vertex and counted all the tracks above a certain threshold which made it into the isolation region. This method often double-counts tracks and is modified in later versions. In addition to the new functions and histograms, two glitches in the code have been fixed in this version. The first is the addition of an ET scaling factor which single particle simulations had shown were needed to make the thrown electron pt and detector response agree, the value used is 1.23. The second correction fixed a problem in calculating the phi displacement between two objects in the endcap. Because the endcap coordinate system splits the detector into two halves and maps one 0 -> 180 and the other 0 -> -180, just taking the difference between phi coordinates will occasionaly lead to the wrong actual displacement. To rectify this, the difference in phi is now calculated as acos(cos(phi1 - phi2)) instead of just (phi1 - phi2). In this version the cuts remain the same as in version 2.4.
V3.0:
Version 3.0 uses GEANT for tracking and vertex finding information. Looking at the results from previous code, it was thought that tracking efficency and vertex finding were still too poor. It was decided to use GEANT tracking in anticipation of the improved tracking which would be available with the FGT upgrade. With the perfect tracking from GEANT, it was possible to explore the upper limit of how much discrimination track cuts could provide. Because of the perfect tracking, the volume of the detector looked at would not need to be restricted as much. Thus, the trigger conditions were changed from version 2.5 to eliminate the restriction that the ET weighted eta position of the trigger patch must be less than 1.7. The restriction on the etabin of the high tower was also relaxed so that it could be located in bins 2-11 (bins 1 and 12 were still excluded to reduce edge effects). The isolation cuts are now handled by four functions, two calculate the energy deposited in the calorimeters for the same side and away side cuts, and two count the number of tracks above a certain pt threshold going into the same side and away side regions. The track counting functions still have the double counting problem in this version. The last major change to this code is the way in which the cuts are organized and displayed. In this version the cuts are divided into three groups, the initial trigger cuts that all events must pass, the isolation cuts and the EEMC cuts. Each cut quantity is displayed after the trigger cuts as usual, but now each cut is displayed after the trigger cut and the isolation cuts have been applied and each cut is displayed again after all other cuts have been applied so that in the end, each cut is displayed three separate times. This was done to see which cuts where independent of each other.
V3.1:
Version 3.1 of the code solves the double counting problem that the previous track counting functions had. In this version the track counting is done recursively. The GEANT track and vertex infromation is set up as a network of connected nodes. Each node is a vertex and represents a particle decay and the lines coming out of each node represent the resultant particles of this decay. These lines will then connect to other nodes if that particle decays. The track counting function loops through all the particle lines coming from the primary vertex and asks where the next vertex connected to that line is, if there is no other vertex or the vertex is located greater than three centimeters away from the primary vertex, the particle is considered stable and is counted (if it is heading into the proper region). If the next vertex is less than three centimeters from the primary vertex however, the function is called again and loops over the tracks coming from this secondary vertex and the process is repeated until all tracks have been looped over. In this way, the code avoids counting the track of a particle which decays along with the tracks of its daughter particles. In addition to the double counting problem the values of the cuts were changed in this version. Version 3.0 threw away too many signal events so version 3.1 used the same set of cuts but just loosened the values where the cuts were made so they did not throw away as many events.
V3.2:
Version 3.2 is very similar to version 3.1, the major differences are the values used for the cuts. The quantities cut on are the same as in version 3.0 and 3.1, but now the values have been tightened slightly compared with version 3.1, so now each cut throws away around 2 or 3% of the signal. In addition to this tightening, the away side energy and track quantities were plotted against the trigger patch ET in a 2-D plot. This allowed for cuts which rejected much more background at low pt than was possible before with the same 1-D cuts.
V3.21:
Version 3.21 is almost exactly the same as version 3.2, the major difference is that this version introduces an ineffiecency into two of the track cuts. The cuts on the number of charged tracks above .5 and 5GeV going into the same side isolation circle both have large numbers of events with zero tracks in the QCD background sample and very few events with zero tracks in the signal sample. It is likely that many of these neutral tracks are Pi0's. With GEANT tracking, these Pi0's can be identified with 100% efficiency, but for real data, material in front of the detector will cause many of the Pi0's to convert giving rise to charged tracks. This conversion process will make the zero charged track cuts less efficient. To explore the effects of these conversions, a random number generator was used to pass 30% of events with zero charged tracks in both cuts. The only other change from version 3.2 was to move the 2-D histograms showing the trigger patch ET vs the cut quantites after all other cuts but themselves were applied so that they would be incremented properly.
Results
Detailed results from the latest version of the discrimination code can be found on the pages linked to in sections 3.2 and 3.21 but the major points as well as some summary plots will be shown here.
Fig 3: This plot shows the final trigger patch ET spectra after all cuts have been applied. The QCD background is in black and the W signal is in red. The plot on the left is from v3.2 and the plot on the right is from v3.21.
Fig 4: This plot shows the signal-to-background ratio for versions 3.2 and 3.21 after all cuts have been applied.
Fig 5: This table shows the effectivness of each cut for version 3.2.
Cut | QCD Events Cut | Signal Events Cut |
Iso Cut | 235 of 754: 31% | 18 of 1151: 2% |
Small Iso Track | 82 of 536: 15% | 11 of 1131: 1% |
Away ET Cut | 1300 of 1754: 74% | 9 of 1135: 1% |
Away Track Cut | 924 of 1378: 67% | 45 of 1165: 4% |
Big Iso Track | 3542 of 3997: 89% | 4 of 1124: <1% |
2 Highest Towers | 63 of 521: 12% | 21 of 1141: 2% |
7 U Strips | 527 of 1195: 44% | 31 of 1175: 3% |
7 V Strips | 579 of 1195: 48% | 33 of 1175: 3% |
Hit U Strips | 7 of 468: 1% | 10 of 1138: <1% |
Hit V Strips | 8 of 468: 2% | 10 of 1138: <1% |
PS patch/Full patch | 151 of 606: 25% | 10 of 1130: <1% |
Fig 6: This table shows the effectivness of each cut for version 3.21.
Cut | QCD Events Cut | Signal Events Cut |
Iso Cut | 739 of 2852: 26% | 17 of 1107: 2% |
Small Iso Track | 1042 of 2931: 36% | 11 of 1089: 1% |
Away ET Cut | 3983 of 5897: 68% | 15 of 1093: 1% |
Away Track Cut | 3461 of 4897: 71% | 63 fo 1121: 6% |
Big Iso Track | 3310 of 5171: 64% | 4 of 1080: <1% |
2 Highest Towers | 110 of 2024: 5% | 19 of 1097: 2% |
7 U Strips | 1194 of 3620: 33% | 30 of 1131: 3% |
7 V Strips | 1323 of 3620: 37% | 32 of 1131: 3% |
Hit U Strips | 27 of 1966: 1% | 10 of 1096: 1% |
Hit V Strips | 44 of 1966: 2% | 10 of 1096: 1% |
PS patch/Full patch | 189 of 2095: 9% | 10 of 1088: 1% |
Figures 5 and 6 show the effect of each cut after all other cuts have been applied. (The exception is for cuts on U and V plane quantites, in which case all other cuts but the similar cut on the other plane are applied). For each cut, the number of events which survived the other cuts and the number of those events which the cut removes are shown. So for example, looking at the iso cut in figure three, 754 background events survive the thirteen other cuts and the iso cut removes 235 of these surviving events. This can be taken as a way to measure how independent a cut is from all the others. The tables show that with perfect tracking, the cut on the number of charged tracks above 5GeV is by far the best cut, but with the 30% ineffieciency added in the effectiveness of this cut is reduced so that it is around parity with the away side cuts.
Conclusions
As the figures above show, the discrimination code achieves a signal-to-background ratio of greater than one-to-one over a significant portion of the lepton transverse energy range, indicating that background should not be an insurmountable barrier to preforming the W analysis at STAR. These results were obtained using GEANT and any analysis of real data will need to monitor vertex finding and tracking efficency to ensure they do not degrade the effectiveness of the cuts too much. The addition of the FGT should help with these issues and it is likely that the code will need to be optimized to take advantage of the improved tracking of the FGT. Finally, this analysis is based on a powerful discrimination code which can be refined and modified to suit future analyses.