I checked that the input variables are not affected by the initial polarisation. I also checked the correlations, and they remain the same. So one can use the samples with different polarisations with the set of variables I am using for training without much to worry about.
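For reference, a minimal sketch of how such a check can be done in ROOT; the file and histogram names are hypothetical, and this assumes each polarisation sample has already been filled into a histogram of the variable in question:

#include "TFile.h"
#include "TH1F.h"
#include <iostream>

int main() {
  TFile fL("sample_eL.root");  // left-handed polarisation sample (hypothetical name)
  TFile fR("sample_eR.root");  // right-handed polarisation sample (hypothetical name)
  TH1F* hL = (TH1F*)fL.Get("h_mrecoil");
  TH1F* hR = (TH1F*)fR.Get("h_mrecoil");

  // Kolmogorov test: probabilities close to 1 mean the two distributions
  // are statistically compatible, i.e. the variable is insensitive to
  // the initial polarisation.
  std::cout << "KS probability: " << hL->KolmogorovTest(hR) << std::endl;
  return 0;
}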
I played a bit more with the parameters in TMVA, and I was finally able to use the recoil mass as a discriminating variable, replacing the energy of the dilepton (Z). I also added the remaining samples that I did not use last time for the training (now there are twice as many).
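For context, a minimal sketch of the corresponding TMVA booking (ROOT 6 style, with a DataLoader); the file, tree, and variable names (e.g. mrecoil) are placeholders, not the ones from the actual analysis:

#include "TFile.h"
#include "TTree.h"
#include "TMVA/Factory.h"
#include "TMVA/DataLoader.h"
#include "TMVA/Types.h"

void train() {
  TFile* sigFile = TFile::Open("signal.root");      // hypothetical file names
  TFile* bkgFile = TFile::Open("background.root");
  TTree* sigTree = (TTree*)sigFile->Get("events");
  TTree* bkgTree = (TTree*)bkgFile->Get("events");

  TFile* out = TFile::Open("tmva_out.root", "RECREATE");
  TMVA::Factory factory("ZHRecoil", out, "!V:!Silent:AnalysisType=Classification");
  TMVA::DataLoader loader("dataset");

  // The recoil mass replaces the dilepton (Z) energy as a discriminating variable.
  loader.AddVariable("mrecoil", 'F');
  // ... the other input variables go here ...

  loader.AddSignalTree(sigTree, 1.0);
  loader.AddBackgroundTree(bkgTree, 1.0);
  loader.PrepareTrainingAndTestTree("", "SplitMode=Random:NormMode=NumEvents:!V");

  factory.BookMethod(&loader, TMVA::Types::kLikelihood, "Likelihood", "!V");
  factory.TrainAllMethods();
  factory.TestAllMethods();
  factory.EvaluateAllMethods();
  out->Close();
}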
I got slightly better results, mostly from optimising the parameters and using the recoil mass rather than from using more samples in the training. Again, the likelihood gives the best result, compared with boosted decision trees and neural networks. The neural network was also better than before, simply from adjusting some parameters, namely the number of nodes in the hidden layers. The boosted decision trees method is still mysterious to me; I don't understand it well, but it seems that one needs lots of events and lots of input variables as well. At least now, after adjusting the pruning of the trees, its outputs on the training and test samples are not as discrepant as before.
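Continuing the sketch above, the knobs in question live in the TMVA option strings; the specific values here are illustrative, not the ones actually used:

// MLP: the hidden-layer size is the main knob mentioned above.
// "N+5" means one hidden layer with (number of input variables + 5) nodes.
factory.BookMethod(&loader, TMVA::Types::kMLP, "MLP",
                   "NCycles=500:HiddenLayers=N+5:NeuronType=tanh");

// BDT: pruning removes statistically insignificant branches, which is
// what reduces the training/test discrepancy (overtraining) noted above.
factory.BookMethod(&loader, TMVA::Types::kBDT, "BDT",
                   "NTrees=400:BoostType=AdaBoost:"
                   "PruneMethod=CostComplexity:PruneStrength=50");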
Previously, with the old likelihood (in numbers of events):
- signal : 3327 -> 2681 (efficiency 80.6%)
- background: 31132 -> 2722 (efficiency 8.7%)
Now, with the new likelihood plus tuning (in numbers of events):
- signal : 3327 -> 2725 (efficiency 81.9%)
- background: 31132 -> 2386 (efficiency 7.7%)
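The efficiencies above are simply the fraction of events surviving the cut on the classifier output:

#include <cstdio>

int main() {
  // Efficiency = events passing the cut / events before the cut.
  printf("signal eff:     %.1f%%\n", 100.0 * 2725 / 3327);   // 81.9%
  printf("background eff: %.1f%%\n", 100.0 * 2386 / 31132);  // 7.7%
  return 0;
}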
I will soon try Mark's variables to see if that improves things.