TMVA : test
First test using TMVA.
- from the README
    How to compile the code:
    ------------------------
    /home> cd TMVA
    /home/TMVA> source setup.[c]sh   // includes TMVA/lib in your lib path (I used the .sh script on my Mac)
    /home/TMVA> cd src
    /home/TMVA/src> make             // compile and build the library ../libTMVA.1.so
- to run :
    How to run the code as ROOT macro:            // training/testing of an academic example
    ----------------------------------
    /home/TMVA> cd macros
    --- For classification:
    /home/TMVA/macros> root -l TMVAClassification.C                        // run all standard classifiers
    /home/TMVA/macros> root -l TMVAClassification.C\(\"LD,Likelihood\"\)   // run LD and Likelihood classifiers
Here I have tried the first macro : it took ~10-15 min to run because it applies ALL the methods to the simple example.
- preparation of files
- signal : 50k single-D0 events run through the BFC chain and analyzed with the reconstruction code.
At the end, I have a Tree with D0 candidates reconstructed in 1.80 < M < 1.92 GeV/c², and all the variables I would like to cut on.
- background : for now, 500k HIJING events, reconstructed and analyzed with the same codes/cuts.
At the end, I have a Tree with like-sign (Kpi) pairs in 1.80 < M < 1.92 GeV/c² (same range).
Below are the invariant mass distributions of the selected candidates :
Fig 1. : signal D0
Fig 2. : background
- Test : I tried with the following variables as input :
- signed decay length
- error on signed decay length
- Pt of D0
- CosPointing
- cosine(theta*)
- 3D DCA between the two tracks at the secondary vertex (from helix swimming)
- 3D DCA of the daughters to the primary vertex
The classification methods tested are listed below (a minimal training sketch follows the list) :
- neural network (MLP)
- Fisher discriminant
- Likelihood estimation
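For reference, here is a minimal sketch of what this training setup looks like with the TMVA v4 Factory API, modeled on the TMVAClassification.C example macro ; the file, tree and variable names are placeholders (not the actual branch names of my D0Tree), and the option strings are assumptions close to the macro defaults :

    // trainD0.C -- minimal TMVA training sketch (ROOT macro, TMVA v4 API)
    // File, tree and variable names below are placeholders.
    #include "TFile.h"
    #include "TTree.h"
    #include "TMVA/Factory.h"
    #include "TMVA/Types.h"

    void trainD0()
    {
       TFile* outFile = TFile::Open("TMVA_D0.root", "RECREATE");
       TMVA::Factory factory("TMVAClassification", outFile, "!V:!Silent");

       // input variables (one AddVariable call per quantity listed above)
       factory.AddVariable("slength",      'F');  // signed decay length
       factory.AddVariable("dslength",     'F');  // error on signed decay length
       factory.AddVariable("ptD0",         'F');  // pT of the D0 candidate
       factory.AddVariable("cosPointing",  'F');  // cosine of the pointing angle
       factory.AddVariable("cosThetaStar", 'F');  // cos(theta*)
       factory.AddVariable("dcaDaughters", 'F');  // 3D DCA between the two tracks at the secondary vertex
       factory.AddVariable("dcaD0ToPv",    'F');  // 3D DCA of the daughters to the primary vertex

       // signal and background trees (placeholder file/tree names)
       TFile* fSig = TFile::Open("signal_D0.root");
       TFile* fBkg = TFile::Open("background_hijing.root");
       factory.AddSignalTree    ((TTree*)fSig->Get("D0Tree"), 1.0);
       factory.AddBackgroundTree((TTree*)fBkg->Get("D0Tree"), 1.0);

       factory.PrepareTrainingAndTestTree("",
          "nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:!V");

       // the three methods tested here
       factory.BookMethod(TMVA::Types::kLikelihood, "Likelihood", "!H:!V");
       factory.BookMethod(TMVA::Types::kFisher,     "Fisher",     "!H:!V");
       factory.BookMethod(TMVA::Types::kMLP,        "MLP",        "!H:!V:NCycles=200:HiddenLayers=N+1");

       factory.TrainAllMethods();
       factory.TestAllMethods();
       factory.EvaluateAllMethods();   // prints the evaluation summary at the end

       outFile->Close();
    }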
Some plots are here
The program finishes with this :
    --- Found directory: Method_Likelihood
    --- Classifier: Likelihood
    --- Found directory: Method_Fisher
    --- Classifier: Fisher
    --- Found directory: Method_MLP
    --- Classifier: MLP
    --- ==================================================================================================
    --- Classifier   ( #signal, #backgr.)  Optimal-cut  S/sqrt(S+B)      NSig       NBkg   EffSig   EffBkg
    --- --------------------------------------------------------------------------------------------------
    --- Likelihood:  (    1000,     1000)       0.0514      22.3924   997.549   987.0203   0.9975    0.987
    --- Fisher:      (    1000,     1000)      -0.3526      22.3708      1000   998.1925        1   0.9982
    --- MLP:         (    1000,     1000)      -1.1194       22.362      1000     999.77        1   0.9998
    --- --------------------------------------------------------------------------------------------------
- UPDATE : 5/8
- background and signal samples now have higher statistics : see plots here
- to recall : the signal consists of (Kpi) pairs taken from single D0 events, reconstructed through the BFC chain and analyzed with MuKpi (unlike-sign daughters) ; the background consists of pairs from central HIJING Au+Au @ 200 GeV events, reconstructed through the BFC chain and analyzed with MuKpi (same-sign daughters)
- TMVA now takes almost all the variables stored in the D0Tree : I had to remove a few because it cannot compute with 35+ variables
- The Fisher and BDT classifiers have been used
I have removed :
- the sign of the daughters (the assumption is sign(kaon) < 0 and sign(pion) > 0)
- the signed decay lengths and errors of daughters from the secondary vertex
The correlation matrices are in these PDFs : signal , background
The results of the classification of the variables for both methods are below :
signal/background and significance for Fisher :
signal/background and significance for BDT :
The weights relating the initial variables to the classifier output are written to a file, ~/TMVA/macros/TMVAClassification_BDT.weights.xml.
I then use a macro (TMVAClassificationApplication.C) that reads a data file (embedding, real data) together with these weights and fills another tree that also contains the classifier variable ; the idea is to vary the cut on the classifier value (from the previous plots, we can see that -0.3 < mvaBDT < 0.1 and -0.4 < mvaFisher < 0.6).
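A minimal sketch of this application step with the TMVA::Reader API, modeled on TMVAClassificationApplication.C ; file, tree and variable names are again placeholders, and in a real application every variable used in the training must be declared here with the same name and order :

    // applyD0.C -- minimal TMVA application sketch (ROOT macro, TMVA v4 API)
    // Placeholder names; all training variables must be re-declared identically.
    #include "TFile.h"
    #include "TTree.h"
    #include "TMVA/Reader.h"

    void applyD0()
    {
       TMVA::Reader reader("!Color:!Silent");

       // same variables (same names, same order) as used for the training
       Float_t slength, dslength, cosThetaStar, cosPointing;
       reader.AddVariable("slength",      &slength);
       reader.AddVariable("dslength",     &dslength);
       reader.AddVariable("cosThetaStar", &cosThetaStar);
       reader.AddVariable("cosPointing",  &cosPointing);

       // book the trained classifier from its weights file
       reader.BookMVA("BDT", "TMVAClassification_BDT.weights.xml");

       // input tree to classify (embedding or real data)
       TFile* fin  = TFile::Open("data.root");
       TTree* tree = (TTree*)fin->Get("D0Tree");
       tree->SetBranchAddress("slength",      &slength);
       tree->SetBranchAddress("dslength",     &dslength);
       tree->SetBranchAddress("cosThetaStar", &cosThetaStar);
       tree->SetBranchAddress("cosPointing",  &cosPointing);

       for (Long64_t i = 0; i < tree->GetEntries(); ++i) {
          tree->GetEntry(i);
          Double_t mvaBDT = reader.EvaluateMVA("BDT");
          // fill an output tree/histogram here and scan the cut on mvaBDT
          // (e.g. within the observed range -0.3 < mvaBDT < 0.1)
       }
       fin->Close();
    }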
- results using embedding, simulation and real data : html and pdf
- correlation plots using embedding, simulation and real data : html and pdf
Study : # of variables, # of classifiers to use (5/16)
The plots above were made with signal and background samples containing 30 variables. This test looks at the efficiency as a function of the number of variables used to train the classifiers :
- # of variables = 4 : slength, dslength, cos(theta*), cos(Pointing)
- # of classifiers : BDT, Likelihood, Fisher, CutsPCA, MLP, RuleFit (include plots here)
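For completeness, a sketch of how these six classifiers would be booked on the Factory from the training sketch above ; the option strings are trimmed versions of the TMVAClassification.C defaults and are assumptions, not the exact settings used for these plots :

    // bookMethods.C -- booking of the six classifiers compared in this study
    #include "TMVA/Factory.h"
    #include "TMVA/Types.h"

    void bookMethods(TMVA::Factory& factory)
    {
       factory.BookMethod(TMVA::Types::kBDT,        "BDT",        "!H:!V:NTrees=400:MaxDepth=3");
       factory.BookMethod(TMVA::Types::kLikelihood, "Likelihood", "!H:!V");
       factory.BookMethod(TMVA::Types::kFisher,     "Fisher",     "!H:!V");
       factory.BookMethod(TMVA::Types::kCuts,       "CutsPCA",    "!H:!V:FitMethod=MC:EffSel:VarTransform=PCA");
       factory.BookMethod(TMVA::Types::kMLP,        "MLP",        "!H:!V:NCycles=200:HiddenLayers=N+1");
       factory.BookMethod(TMVA::Types::kRuleFit,    "RuleFit",    "!H:!V");
    }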
Some observations :
- Fisher classifier : optimal for Gaussian-distributed variables with linear correlations, so correlated variables should not be a problem
- better results (better ROC curves) are obtained for both the Fisher and BDT classifiers when including all the variables from the D0Tree
- Try a different background sample for training (from embedding) :
- because the distribution of silicon hits differs between simulation and embedding, TMVA considers it a good discriminating variable (the ROC is better)
- but the resulting invariant-mass results are not better
TMVA version to use:
with root_5_20 (the current version on my laptop), use TMVA-v4.0.3
SourceForge page for TMVA versions : http://sourceforge.net/projects/tmva/files/TMVA/
with root_5_28, it is better to use the latest version : TMVA-v4.1.0
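A quick way to check which ROOT and TMVA versions a session actually picks up (a sketch ; TMVA_RELEASE comes from the TMVA/Version.h header) :

    // checkVersions.C -- print the ROOT and TMVA versions seen by the session
    #include <iostream>
    #include "TROOT.h"
    #include "TMVA/Version.h"

    void checkVersions()
    {
       std::cout << "ROOT version : " << gROOT->GetVersion() << std::endl;
       std::cout << "TMVA release : " << TMVA_RELEASE        << std::endl;
    }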