TMVA : test
First test using TMVA.
- from the README
    How to compile the code:
    ------------------------
    /home> cd TMVA
    /home/TMVA> source setup.[c]sh   // includes TMVA/lib in your lib path (I used the .sh script on my Mac)
    /home/TMVA> cd src
    /home/TMVA/src> make             // compile and build the library ../libTMVA.1.so
- to run :
    How to run the code as ROOT macro:            // training/testing of an academic example
    ----------------------------------
    /home/TMVA> cd macros
    --- For classification:
    /home/TMVA/macros> root -l TMVAClassification.C                        // run all standard classifiers
    /home/TMVA/macros> root -l TMVAClassification.C\(\"LD,Likelihood\"\)   // run LD and Likelihood classifiers
Here I have tried the first macro : it took ~10-15 min to run because it applies ALL the methods to the simple example.
- preparation of files
- signal : 50k single-D0 events run through the BFC chain and analyzed with the reconstruction code.
At the end, I have a Tree with D0 candidates reconstructed in 1.80 < M < 1.92 GeV/c², and all the variables I would like to cut on.
- background : for now, 500k HIJING events, reconstructed and analyzed with the same codes/cuts.
At the end, I have a Tree with like-sign (Kpi) pairs in 1.80 < M < 1.92 GeV/c² (same range).
Below are the invariant mass distributions of the selected candidates :
Fig 1. : signal D0
Fig 2. : background
- Test : I tried with the following variables as input :
- signed decay length
- error on signed decay length
- Pt of D0
- CosPointing
- cosine(theta*)
- 3D DCA between the two tracks at the secondary vertex (from helix swimming)
- 3D DCA of the daughters to the primary vertex
The classification methods tested are listed below (a minimal training sketch follows the list) :
- neural network (MLP)
- Fisher discriminant
- Likelihood estimation
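For reference, here is a minimal sketch of what this training setup looks like with the TMVA v4 Factory API, modeled on the TMVAClassification.C example macro ; the file, tree and variable names are placeholders (not the actual branch names of my D0Tree), and the option strings are assumptions close to the macro defaults :

    // trainD0.C -- minimal TMVA training sketch (ROOT macro, TMVA v4 API)
    // File, tree and variable names below are placeholders.
    #include "TFile.h"
    #include "TTree.h"
    #include "TMVA/Factory.h"
    #include "TMVA/Types.h"

    void trainD0()
    {
       TFile* outFile = TFile::Open("TMVA_D0.root", "RECREATE");
       TMVA::Factory factory("TMVAClassification", outFile, "!V:!Silent");

       // input variables (one AddVariable call per quantity listed above)
       factory.AddVariable("slength",      'F');  // signed decay length
       factory.AddVariable("dslength",     'F');  // error on signed decay length
       factory.AddVariable("ptD0",         'F');  // pT of the D0 candidate
       factory.AddVariable("cosPointing",  'F');  // cosine of the pointing angle
       factory.AddVariable("cosThetaStar", 'F');  // cos(theta*)
       factory.AddVariable("dcaDaughters", 'F');  // 3D DCA between the two tracks at the secondary vertex
       factory.AddVariable("dcaD0ToPv",    'F');  // 3D DCA of the daughters to the primary vertex

       // signal and background trees (placeholder file/tree names)
       TFile* fSig = TFile::Open("signal_D0.root");
       TFile* fBkg = TFile::Open("background_hijing.root");
       factory.AddSignalTree    ((TTree*)fSig->Get("D0Tree"), 1.0);
       factory.AddBackgroundTree((TTree*)fBkg->Get("D0Tree"), 1.0);

       factory.PrepareTrainingAndTestTree("",
          "nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:!V");

       // the three methods tested here
       factory.BookMethod(TMVA::Types::kLikelihood, "Likelihood", "!H:!V");
       factory.BookMethod(TMVA::Types::kFisher,     "Fisher",     "!H:!V");
       factory.BookMethod(TMVA::Types::kMLP,        "MLP",        "!H:!V:NCycles=200:HiddenLayers=N+1");

       factory.TrainAllMethods();
       factory.TestAllMethods();
       factory.EvaluateAllMethods();   // prints the evaluation summary at the end

       outFile->Close();
    }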
Some plots are here
The program finishes with this :
    --- Found directory: Method_Likelihood
    --- Classifier: Likelihood
    --- Found directory: Method_Fisher
    --- Classifier: Fisher
    --- Found directory: Method_MLP
    --- Classifier: MLP
    --- ==================================================================================================
    --- Classifier   ( #signal, #backgr.)  Optimal-cut  S/sqrt(S+B)      NSig       NBkg   EffSig   EffBkg
    --- --------------------------------------------------------------------------------------------------
    --- Likelihood:  (    1000,     1000)       0.0514      22.3924   997.549   987.0203   0.9975    0.987
    --- Fisher:      (    1000,     1000)      -0.3526      22.3708      1000   998.1925        1   0.9982
    --- MLP:         (    1000,     1000)      -1.1194       22.362      1000     999.77        1   0.9998
    --- --------------------------------------------------------------------------------------------------
- UPDATE : 5/8
- background and signal samples now have higher statistics : see plots here
- to recall : the signal consists of (Kpi) pairs taken from single D0 events, reconstructed through the BFC chain and analyzed with MuKpi (unlike-sign daughters) ; the background consists of pairs from central HIJING Au+Au @ 200 GeV events, reconstructed through the BFC chain and analyzed with MuKpi (same-sign daughters)
- TMVA now takes almost all the variables stored in the D0Tree : I had to remove a few because it cannot compute with 35+ variables
- The Fisher and BDT classifiers have been used
I have removed :
- the sign of the daughters (the assumption is sign(kaon) < 0 and sign(pion) > 0)
- the signed decay lengths and errors of daughters from the secondary vertex
The correlation matrices are in these PDFs : signal , background
The results of the classification of the variables for both methods are below :
signal/background and significance for Fisher :
signal/background and significance for BDT :
The weights relating the initial variables to the classifier output are written to a file, ~/TMVA/macros/TMVAClassification_BDT.weights.xml.
I then use a macro (TMVAClassificationApplication.C) that reads a data file (embedding, real data) together with these weights and fills another tree that also contains the classifier variable ; the idea is to vary the cut on the classifier value (from the previous plots, we can see that -0.3 < mvaBDT < 0.1 and -0.4 < mvaFisher < 0.6).
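A minimal sketch of this application step with the TMVA::Reader API, modeled on TMVAClassificationApplication.C ; file, tree and variable names are again placeholders, and in a real application every variable used in the training must be declared here with the same name and order :

    // applyD0.C -- minimal TMVA application sketch (ROOT macro, TMVA v4 API)
    // Placeholder names; all training variables must be re-declared identically.
    #include "TFile.h"
    #include "TTree.h"
    #include "TMVA/Reader.h"

    void applyD0()
    {
       TMVA::Reader reader("!Color:!Silent");

       // same variables (same names, same order) as used for the training
       Float_t slength, dslength, cosThetaStar, cosPointing;
       reader.AddVariable("slength",      &slength);
       reader.AddVariable("dslength",     &dslength);
       reader.AddVariable("cosThetaStar", &cosThetaStar);
       reader.AddVariable("cosPointing",  &cosPointing);

       // book the trained classifier from its weights file
       reader.BookMVA("BDT", "TMVAClassification_BDT.weights.xml");

       // input tree to classify (embedding or real data)
       TFile* fin  = TFile::Open("data.root");
       TTree* tree = (TTree*)fin->Get("D0Tree");
       tree->SetBranchAddress("slength",      &slength);
       tree->SetBranchAddress("dslength",     &dslength);
       tree->SetBranchAddress("cosThetaStar", &cosThetaStar);
       tree->SetBranchAddress("cosPointing",  &cosPointing);

       for (Long64_t i = 0; i < tree->GetEntries(); ++i) {
          tree->GetEntry(i);
          Double_t mvaBDT = reader.EvaluateMVA("BDT");
          // fill an output tree/histogram here and scan the cut on mvaBDT
          // (e.g. within the observed range -0.3 < mvaBDT < 0.1)
       }
       fin->Close();
    }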
- results using embedding, simulation and real data : html and pdf
- correlation plots using embedding, simulation and real data : html and pdf
Study : # of variables, # of classifiers to use (5/16)
The plots above were made with signal and background samples containing 30 variables. This test looks at the efficiency as a function of the number of variables used to train the classifiers :
- # of variables = 4 : slength, dslength, cos(theta*), cos(Pointing)
- # of classifiers : BDT, Likelihood, Fisher, CutsPCA, MLP, RuleFit (include plots here)
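For completeness, a sketch of how these six classifiers would be booked on the Factory from the training sketch above ; the option strings are trimmed versions of the TMVAClassification.C defaults and are assumptions, not the exact settings used for these plots :

    // bookMethods.C -- booking of the six classifiers compared in this study
    #include "TMVA/Factory.h"
    #include "TMVA/Types.h"

    void bookMethods(TMVA::Factory& factory)
    {
       factory.BookMethod(TMVA::Types::kBDT,        "BDT",        "!H:!V:NTrees=400:MaxDepth=3");
       factory.BookMethod(TMVA::Types::kLikelihood, "Likelihood", "!H:!V");
       factory.BookMethod(TMVA::Types::kFisher,     "Fisher",     "!H:!V");
       factory.BookMethod(TMVA::Types::kCuts,       "CutsPCA",    "!H:!V:FitMethod=MC:EffSel:VarTransform=PCA");
       factory.BookMethod(TMVA::Types::kMLP,        "MLP",        "!H:!V:NCycles=200:HiddenLayers=N+1");
       factory.BookMethod(TMVA::Types::kRuleFit,    "RuleFit",    "!H:!V");
    }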
Some observations :
- Fisher classifier : optimal for Gaussian-distributed variables with linear correlations, so correlated variables should not be a problem
- better results (better ROC curves) are obtained for both the Fisher and BDT classifiers when including all the variables from the D0Tree
- Try a different background sample for training (from embedding) :
- because the distribution of silicon hits differs between simulation and embedding, TMVA considers it a good discriminating variable (the ROC is better)
- but the resulting invariant-mass results are not better
TMVA version to use:
with root_5_20 (the current version on my laptop), use TMVA-v4.0.3
SourceForge page for TMVA versions : http://sourceforge.net/projects/tmva/files/TMVA/
with root_5_28, it is better to use the latest version : TMVA-v4.1.0
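A quick way to check which ROOT and TMVA versions a session actually picks up (a sketch ; TMVA_RELEASE comes from the TMVA/Version.h header) :

    // checkVersions.C -- print the ROOT and TMVA versions seen by the session
    #include <iostream>
    #include "TROOT.h"
    #include "TMVA/Version.h"

    void checkVersions()
    {
       std::cout << "ROOT version : " << gROOT->GetVersion() << std::endl;
       std::cout << "TMVA release : " << TMVA_RELEASE        << std::endl;
    }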