Friday, March 13, 2009

Electron sample with 250fb-1

Fit results for 10x10 bins:
r_bb=0.95 +/- 0.056 (5.93%)
r_cc=1.3 +/- 0.55 (41.82%)
r_gg=1.2 +/- 0.52 (44.51%)

Errors from the toy study using the chi2 fit, ignoring bins with fewer than 7 entries:
r_bb 0.057 +/- 1.7*10^-3
r_cc 0.49 +/- 6.4*10^-2
r_gg 0.52 +/- 4.9*10^-2
The error itself is taken from the mean of a Gaussian fitted to a plot of the errors, and the error on the error from the sigma of that Gaussian (if you see what I mean - no pun intended).
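As a rough sketch of that procedure (with made-up numbers standing in for the 5000 toy-fit errors, and the sample mean/sigma standing in for an actual Gaussian histogram fit):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the 5000 toy-study fit errors on r_bb (hypothetical
# numbers; the real ones come from refitting each pseudo-experiment).
toy_errors = rng.normal(loc=0.057, scale=0.0017, size=5000)

# For a Gaussian sample the maximum-likelihood "fit" is just the sample
# mean and sigma; fitting a Gaussian to the histogram gives the same
# thing up to binning effects.
error_value = toy_errors.mean()          # quoted error on r_bb
error_on_error = toy_errors.std(ddof=1)  # quoted error on that error

print(f"r_bb error = {error_value:.4f} +/- {error_on_error:.5f}")
```

With a real toy study the toy_errors array would come from refitting each pseudo-experiment rather than being drawn directly.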

Errors from the toy study fitting a toy sample to its source using the likelihood function:
r_bb 0.038 +/- 5.8*10^-4
r_cc 0.36 +/- 1.4*10^-2
r_gg 0.38 +/- 6.9*10^-3


The numbers match Roberval's well for the most part, although the likelihood r_cc and r_gg are a little lower here.

Plots, same as yesterday:
r_bb
r_cc
r_gg

Pulls:
chi2
likelihood
The chi2 is off slightly. I wasn't going to look into it too much unless anyone thinks it's a cause for concern?

Also, the fit results:
chi2

Thursday, March 12, 2009

Electron sample with 500fb-1

I'm running the 250fb-1 numbers at the moment, here are the results for 500fb-1.

The "Toy study using limited templates" is the chi2 fit ignoring less than 7 entries using 5000 tests. I plotted the error, fitted a gaussian to it and that gives the value and the error bars.
The "Toy study error using infinite statistic templates" is the likelihood fit of a toy sample to its source. Again 5000 tests, with a gaussian fitted to the errors.
The "Error from fully simulated data" is the error from the chi2 fit ignoring less than 7 entries.
r_bb
r_cc
r_gg
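A minimal sketch of this kind of toy study, using a hypothetical 1D template in place of the 10x10 histograms and a one-parameter closed-form chi2 fit (the real fit floats r_bb, r_cc and r_gg simultaneously):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 1D stand-in for the 10x10 flavour-tag templates.
template = np.array([50., 120., 300., 120., 50.])

def fit_r(data, mc):
    """Closed-form one-parameter chi2 fit of data = r * mc,
    keeping only bins with at least 7 data entries as in the text."""
    keep = data >= 7
    d, m = data[keep], mc[keep]
    w = 1.0 / d                      # Gaussian approximation: var = d_i
    r = np.sum(m * d * w) / np.sum(m * m * w)
    err = 1.0 / np.sqrt(np.sum(m * m * w))
    return r, err

results = np.array([fit_r(rng.poisson(template).astype(float), template)
                    for _ in range(2000)])
rs, errs = results[:, 0], results[:, 1]
print(f"mean r = {rs.mean():.3f}, mean toy error = {errs.mean():.4f}")
```

The distribution of errs is the thing the Gaussian is fitted to in the numbers quoted above.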

The fully simulated data matches the toy study for limited statistics and the values for 10x10 bins match Roberval's 500fb-1 numbers quite well. When I post the results for 250fb-1 I'll add the numbers at 10x10 bins too.

The pulls for both of the toy studies are below. A slight low bias for the chi2 but otherwise good.

Likelihood
Chi2

Results scaled to 250 fb^-1 - muon channel

By simply scaling the data histograms, the central values did not change and the errors are larger, as expected.

Likelihood fit
  • r_bb = 1.013 ± 0.038
  • r_cc = 0.81 ± 0.46
  • r_gg = 0.98 ± 0.45
Chi2 fit
  • r_bb = 1.012 ± 0.044
  • r_cc = 0.87 ± 0.54
  • r_gg = 0.93 ± 0.51
The contribution from the MC to the errors on the parameters should be the same as in the 500 fb^-1 case, because nothing changed on the MC side. The pull distributions for pseudo-experiments with MC only give a 46% contribution to the estimated error from the MC finiteness, yielding exactly the same MC errors as before.
  • r_bb = 1.012 ± 0.039 (data) ± 0.020 (MC)
  • r_cc = 0.84 ± 0.48 (data) ± 0.25 (MC)
  • r_gg = 0.95 ± 0.45 (data) ± 0.23 (MC)
The statistical errors from the data in the chi2 fit are very similar to the ones from the likelihood.

It seems that with 250 fb^-1 of luminosity, even combining the different channels, the measurement of the branching ratio of the Higgs boson will be quite poor.

Actions from Meeting

It was agreed that the method was now in place, and the muon numbers look good for 500 fb-1.
The actions required to complete our contribution to the LOI:
  • Re-run muon numbers for 250 fb-1 (Roberval)
  • Re-make figure with flavour-likelihood distributions with axes labels swapped and all text enlarged (Hajrah).
  • Run final electron numbers for 250 and 500 fb-1 (Mark).
  • Final edit and combination of numbers (Joel).

Results for muon channel - Update

Fit with Poisson Likelihood

  • r_bb = 1.013 ± 0.027
  • r_cc = 0.81 ± 0.33
  • r_gg = 0.98 ± 0.32
With the likelihood method contributions from a finite Monte Carlo sample are not considered.
Generating pseudo-experiments of the data, assuming the data is Poisson distributed, yields pull distributions of Gaussian shape with mean O(10^-2) and rms ~1.
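As an illustration of that check (a hypothetical one-parameter version, not the actual multi-template fit), Poisson pseudo-experiments of a simple expected histogram give pulls with mean of order 10^-2 and rms close to 1:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical expected bin contents standing in for the fitted model.
expected = np.array([40., 90., 250., 90., 40.])

pulls = []
for _ in range(5000):
    toy = rng.poisson(expected)
    # One-parameter Poisson ML fit of toy = r * expected (closed form):
    r_hat = toy.sum() / expected.sum()
    err = np.sqrt(toy.sum()) / expected.sum()
    pulls.append((r_hat - 1.0) / err)

pulls = np.array(pulls)
print(f"pull mean = {pulls.mean():+.3f}, rms = {pulls.std(ddof=1):.3f}")
```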

Fit with Chi2
  • r_bb = 1.012 ± 0.033
  • r_cc = 0.84 ± 0.41
  • r_gg = 0.95 ± 0.39
In this case, statistical fluctuations arising from the finite MC sample are taken into account. The Poisson-to-Gaussian approximation is only valid if the number of events in a bin is larger than 5. We therefore considered in the fit only bins with at least 7 entries in the data.
To check that this method is valid and consistent with the likelihood method, pseudo-experiments of the data and the MC were generated and the pull distributions of the fit parameters were calculated considering that the "true" value of the parameters are the ones obtained in the likelihood.
The rms (mean) of the pull distributions are 1.0 (-0.08), 0.99 (-0.08) and 1.0 (-0.16) for r_bb, r_cc and r_gg, respectively. Notice that r_gg is slightly biased.

In order to estimate the contribution of the finite MC sample to the error of the fit, the pull distributions of the fit parameters were obtained using pseudo-experiments generated for the MC only. The pull distributions are then Gaussians with width less than one; that width is the fraction of the fit error that is purely due to MC statistical fluctuations. We obtained widths of 0.6 for all three parameters. Splitting the error contributions from the data and from the MC we obtained:
  • r_bb = 1.012 ± 0.027 (data) ± 0.020 (MC)
  • r_cc = 0.84 ± 0.34 (data) ± 0.25 (MC)
  • r_gg = 0.95 ± 0.31 (data) ± 0.23 (MC)
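A small sketch of the error splitting described above: if the MC-only pull width is w, then sigma_MC = w * sigma_tot and the data part is what remains in quadrature.

```python
import math

# Split a total fit error into data and MC parts using the MC-only
# pull width w: sigma_MC = w * sigma_tot, and the contributions add in
# quadrature, so sigma_data = sqrt(sigma_tot^2 - sigma_MC^2).
def split_error(sigma_tot, w):
    sigma_mc = w * sigma_tot
    sigma_data = math.sqrt(sigma_tot**2 - sigma_mc**2)
    return sigma_data, sigma_mc

# r_bb: total chi2 error 0.033, MC-only pull width 0.6
sigma_data, sigma_mc = split_error(0.033, 0.6)
print(f"r_bb: +/- {sigma_data:.3f} (data) +/- {sigma_mc:.3f} (MC)")
```

This reproduces the quoted r_bb split to within rounding (0.026 vs 0.027 for the data part).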


Ignoring small bins

The pulls from the toy study:
r_bb
r_cc
r_gg

Results from the full data:
r_bb
r_cc
r_gg

Tuesday, March 10, 2009

Likelihood cut variables

Some plots for likelihood variables I'm only just getting around to putting up:

Jet energy difference
recoil mass
Z candidate cos theta
thrust cos theta

Results for muon channel

Results from the fits:

The fit was done by minimising a chi^2 function in which bins containing fewer than 10 events were discarded. The fit procedure considered effects from statistical fluctuations of the MC samples used in the fit. The results are shown below: the first error is the statistical error from the data, and the second is the contribution from the MC, estimated (about 60%) using toy MC.

r_bb = 1.021 ± 0.027 ± 0.020
r_cc = 0.89 ± 0.34 ± 0.25
r_gg = 0.91 ± 0.31 ± 0.23

More details here.

Event reconstruction and selection
  • Two muon candidates with opposite charges, identified using neural networks, with momentum > 20 GeV and isolated (no track in a cone of 5° around the candidate), were combined into a Z boson candidate. If more than one Z candidate is found, the one with mass closest to the nominal (PDG) mass is used.
  • After the di-lepton from the Z decay is identified, the remaining particles are forced into two jets. The di-jet system is combined into a Higgs candidate.
  • Only events with at least 25 reconstructed particles are taken.
  • 70 < M_ll < 110 GeV
  • 117 < M_rec < 150 GeV
  • 100 < M_jj < 140 GeV
  • | cos theta_ll | < 0.9
where M_ll is the mass of the di-lepton, M_jj is the mass of the di-jet, M_rec is the recoil mass and theta_ll is the polar angle of the di-lepton.
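The selection above can be written as a simple filter; the event-field names here are hypothetical stand-ins for the reconstructed quantities defined in the text:

```python
# A minimal sketch of the event selection; field names are hypothetical.
def passes_selection(event):
    return (
        event["n_particles"] >= 25
        and 70.0 < event["m_ll"] < 110.0
        and 117.0 < event["m_rec"] < 150.0
        and 100.0 < event["m_jj"] < 140.0
        and abs(event["cos_theta_ll"]) < 0.9
    )

# An event near the Z and Higgs mass peaks passes all cuts:
event = {"n_particles": 40, "m_ll": 91.2, "m_rec": 125.0,
         "m_jj": 120.0, "cos_theta_ll": 0.3}
print(passes_selection(event))
```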

less than 10 data ignored

Same as before but with bins containing fewer than 10 data entries ignored. Only 500 tests this time though.

Fitting two separately poisson generated samples together:
r_bb
r_cc
r_gg

Fitting a poisson generated sample to its source:
r_bb
r_cc
r_gg

Sunday, March 8, 2009

Results for different bin ranges

The fits for 2x2 to 20x20 bins are done; pull means and sigmas are shown below.

Fitting two separately poisson generated samples together:
r_bb
r_cc
r_gg

Fitting a poisson generated sample to its source:
r_bb
r_cc
r_gg

The chi2 works well for the first step, but then overestimates the error in step 2. That's not surprising since it's still using the errors from the templates; if you don't (red line), the mean has an increasing bias as the binning increases.
The likelihood, on the other hand, underestimates the errors for step 1 but works well for step 2.

More fit studies

I had a look at the pulls for 5000 tests of fitting together two toy samples at 10x10 binning. Results are below; I'm also running it over 2-20 bins now, but it's taking a while.

Note that I defined the pull differently to the normal convention so a positive mean means that the results are too low. I'll correct that in any further plots.

r_bb, different chi2 variants
r_bb, different likelihood variants
r_cc, different chi2 variants
r_cc, different likelihood variants
r_gg, different chi2 variants
r_gg, different likelihood variants

Barlow-Beeston has a weird double hump which I don't understand; again, this is probably my implementation, or the Poisson approximation breaking down. I wouldn't have thought it's the approximation, because it's not as bad for r_bb, where the number of entries in the peak bin is not much less than the total entries.

The only chi2 fit without a bias is the one that uses the r_xx values in the error calculation, which "gives nonsense results". We could potentially do the fit recursively using a constant r_xx from the previous iteration, as Klaus suggested. Is there time to implement it though?

The likelihood gives reliable central values, but underestimates the error (the pull sigma is high). I thought about trying the equation Roberval posted last week, but the only difference is the d*Log[d]-d term, which will just give a constant shift - hence the fit values and errors will be the same. I could modify the change for which Minuit calculates the errors (the "Up()" method) using (another) toy study, but without a mathematical reason for changing it, is that justified?

Friday, March 6, 2009

Fitter comparison

Here's a comparison of the fitters that I've currently got coded up. The fitters that are used are:

  • "chi2" - chi2 using the template errors and the r_xx values in the error calculation
  • "chi2NoRValues" - same as above but without the r_xx values in the error as pointed out by Klaus
  • "chi2NoErrorFromTemplates" - only using the error from the data in the chi2 error
  • "likelihood" - simple likelihood fit, but not quite the same as Roberval was using from the Barlow-Beeston paper. It's [d*ln(f)-f - d*ln(d)+d] which I got from the NIM paper I mentioned a while ago (NIM 221 (1984) 437-442).
  • "BarlowBeeston" - The Barlow Beeston algorithm, as best as I've managed to implement it.
I've also tried it ignoring bins where the data is less than 10. I've not had a proper think about whether this is a fair thing to do yet, but the results seem good.
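For concreteness, the [d*ln(f)-f - d*ln(d)+d] objective from the NIM paper can be sketched as below. It is a likelihood-ratio form: it vanishes when the model matches the data exactly and is positive otherwise, which is what makes it behave like a chi2.

```python
import numpy as np

def poisson_likelihood_ratio(data, model):
    """-2 * sum_i [ d_i*ln(f_i) - f_i - d_i*ln(d_i) + d_i ], with the
    convention d*ln(d) = 0 for d = 0; model bins must be positive."""
    d = np.asarray(data, dtype=float)
    f = np.asarray(model, dtype=float)
    sat = np.where(d > 0, d * np.log(np.where(d > 0, d, 1.0)) - d, 0.0)
    return -2.0 * np.sum(d * np.log(f) - f - sat)

d = np.array([12., 7., 3., 30.])
print(abs(poisson_likelihood_ratio(d, d)))   # zero when f_i = d_i
print(poisson_likelihood_ratio(d, 1.1 * d))  # positive for any mismatch
```

Since the d*ln(d) - d piece does not depend on the fit parameters, adding or removing it only shifts the objective by a constant, which matches the constant-shift argument made above.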

r_bb
r_bb ignoring bins where the data is less than 10

r_cc
r_cc ignoring bins where the data is less than 10

r_gg
r_gg ignoring bins where the data is less than 10

I also had a look at what they're like when cheating the background contribution, and it's pretty much a small constant shift towards the true value. It's not huge though so I haven't posted the plots - the efficiency of the current selection is quite reliable.

I'm not that keen on using the chi2 anymore because the results aren't very good without the r_xx values in the errors. Barlow-Beeston is still giving some strange results - not that visible here, but the pulls in the toy study look strange (I'll post them once I've done a bit more work on that). I think it's more to do with my (currently) buggy implementation than with the method. As Roberval pointed out though, there is the approximation of Binomial errors to Poisson, which is only valid when the entries in a bin are much less than the total entries.
The likelihood looks okay, so I'm edging towards that. All of the r_bb results have a bias though, which I don't understand.

Everything is improved when cutting out bins with fewer than 10 data entries, so I guess a final decision depends on whether we agree that's a fair thing to do. I'll also try each method with the toy study - multiple tests might shed a bit more light.

Monday, March 2, 2009

N-1 Plots

Some N-1 plots from the electron sample when using the cuts Hajrah suggested on Thursday. The signal over sqrt(signal+background) is overlaid, with the "star" series showing the s/sqrt(s+b) if you take everything to the right of it and the "dot" everything to the left. Both s/sqrt(s+b) use the right axis while the histograms use the left. It should make sense if you look at the plots.

Di-electron mass
Di-jet mass
Recoil mass
cos(theta) of the Z

I didn't bother with a plot for "number of electrons" because it doesn't show anything.

It looks like the lower di-electron mass could go up a bit, everything else looks spot on. I'm not suggesting we should necessarily change it, keeping the muon and electron channel cuts is probably more important for now and the likelihood cut should take up the slack. As Roberval mentioned though, the 2 fermion background might require us to tighten the di-lepton mass cut.

For reference here are the cuts:
number of electrons >= 2
70 < di-electron mass < 110
100 < di-jet mass < 140
cos(theta) of the Z candidate < 0.9
117 < recoil mass < 150

Thursday, February 26, 2009

Cuts used

The events used for the previous post were required to have the following:

Number of electrons >= 2
84 < di-electron mass < 105 GeV
100 < di-jet mass < 130 GeV
cos(theta) thrust axis< 0.9
115 < recoil mass < 160 GeV
Number of reconstructed particles >= 45

I took the 'number of reconstructed particles' cut quite high; my signal/sqrt(signal+background) was quite low and I wanted to improve it. Around 17 can be used without losing a single partonic Higgs event; maybe I should have used that and relied on the likelihood cut to get rid of everything else.

Anyway, I'll try Hajrah's cuts and see what I get.

The plots I posted this morning have also been changed to have titles, and the data is split down into the constituents for clarity.
I've been playing around with the cuts, which has improved signal to background but the fit still isn't very good for the c-tag here.
The templates are here:
data
reference
I think it's just that statistical fluctuations mean the c-tags of the test and reference samples don't look particularly similar. The previous plot that Roberval commented on as being particularly good was for the whole sample; with these new cuts the combined distribution looks similar (but not quite as good).

I've loosened the lower di-jet mass cut, which seems to have let more b from (presumably) Z decays through, as can be seen from the higher b peak in the background than before. I thought this might mean the 3D fit (with the bc-tag added in too) would be better, because the background would be distinguishable from the ccbar and gg templates. The fit is below: marginally better, but nothing to shout about.
here.

The b vs. bc templates are below. Ignore the axis labels, the c-likeness is actually the bc-likeness.
data
reference

LOI Documents

Just a reminder that the LOI documents for the analysis (draft notes etc.) can be found on the ILCILD wiki.

Wednesday, February 25, 2009

Monday, February 23, 2009

Chi2 fit

Results:
Removed the bins with less than 10 events from the chi^2 function, included statistical fluctuation effects from the Monte Carlo samples, histograms with 10x10 bins, signal samples only.
r_bb = 0.981 +- 0.031
r_cc = 1.13 +- 0.28
r_gg = 1.07 +- 0.22
r_bkg = 1.0 (fixed)

Possibility to fit with chi2 function

I was studying the method to fit with finite MC samples as in Barlow & Beeston, Comp Phys Comm 77 (1993) 219, and I face the following problems:
  • Treatment of errors from the fit;
  • Bins with zero entries;
  • Method is valid only when the number of events in a bin of a template is much smaller than the total number of events in the histogram.
The first two seem not straightforward, but feasible. The last one is the most difficult: it is a strong requirement if one wants to use the method, and in our case the gg, background and bb templates do not fulfill it, in particular the bb template, which has essentially all events in one bin.

Facing these difficulties I started to think about the possibility of using the chi^2 fit. Our data sample can have bins with a small number of events, but are those bins important for the fit? Ignoring the effects of statistical fluctuations, I checked how much the likelihood function changes if data bins with fewer than 10 events are removed (in both data and Monte Carlo, only bins with non-zero entries are considered). The likelihood function to be maximised (omitting the constant factorials) is
ln L = sum_i [ d_i * ln(f_i) - f_i ]
where d_i is the number of events in bin i of the data sample and f_i is the number of events in bin i from the fit "function".
The plot below shows the likelihood function divided by the likelihood function for d_i >= 10, as a function of the fit parameters r_xx. The ratio is very stable over a range of the parameters of the size of, or larger than, the "expected" errors. This means that the likelihood function considering only bins with 10 or more events is essentially the original likelihood times 1.011; the maxima and the errors will be the same.
We can use the chi^2 approximation if we neglect the bins with less than 10 events in the data sample.
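A toy version of this check, with a hypothetical 1D histogram and the template set equal to the data (so the fit model is f_i = r * d_i), shows the same behaviour: the log-likelihood difference between keeping all bins and keeping only d_i >= 10 is nearly independent of r.

```python
import numpy as np

# Hypothetical 1D histogram with a few low-count bins.
data = np.array([2., 5., 40., 200., 400., 150., 30., 8., 3., 1.])

def log_likelihood(r, d):
    f = r * d                            # model prediction per bin
    return np.sum(d * np.log(f) - f)     # Poisson lnL, factorials omitted

keep = data >= 10                        # the d_i >= 10 cut from the text

diffs = [log_likelihood(r, data) - log_likelihood(r, data[keep])
         for r in (0.9, 1.0, 1.1)]
# The difference is almost constant in r: dropping the low bins
# multiplies the likelihood by a nearly constant factor, so the maxima
# and the errors are essentially unchanged.
print(["%.3f" % x for x in diffs])
```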

What is not clear to me is the case where a bin from the MC samples has a small number of events. If I remove the bins of the MC sample with fewer than 10 events, all the curves above are still flat but the ratio goes up to 1.014.

Sunday, February 15, 2009

Seoul Talk

corrected talk is here.

Thursday, February 12, 2009

Fitting problems

I tried the extended likelihood fit in RooFit but couldn't get it to minimise, although I didn't spend huge amounts of time on it. I also tried implementing the method myself but it gets unstable for large binning.
As a fall back I decided to get results with the chi2 fit using the combined errors from the templates and data, but the results aren't good for ILD_00. The templates look fairly good though; template is here and data is here. That's for polarisations of 80% (electrons, -1) and 30% (positrons, +1).
These are the results I get. I'm currently playing around to see if changing the polarisation or luminosity changes anything.

Seoul Talk

Talk for Seoul ILD meeting is here.

Efficiency Plots for Event Selection

Some plots to get efficiency are here.

Wednesday, February 11, 2009

Event selection with TMVA

I checked that the input variables are not affected by the initial polarisation. I also checked the correlations, and they remain the same. So one can use the samples with different polarisations with the set of variables I am using for training, without much to worry about.

I played a bit more with the parameters in TMVA and was finally able to use the recoil mass as a discriminating variable, replacing the energy of the dilepton (Z). I also added the remaining samples that I did not use last time for the training (now there are 2x more).

I got slightly better results, mostly from optimising parameters and using the recoil mass rather than from using more samples in the training. Again, the likelihood gives the best result, compared with boosted decision trees and neural networks. The neural network was also better than before, simply from adjusting some parameters, namely the number of nodes in the hidden layers. The boosted decision trees method is sinister: I don't understand it well, but it seems one needs lots of events and lots of input variables as well. At least now, after adjusting the pruning of the trees, its output from the training and test samples is not as discrepant as before.

Previously with old likelihood (in number of events),
  • signal    :  3327 -> 2681 (efficiency 80.6%)
  • background: 31132 -> 2722 (efficiency  8.7%)
Now with new likelihood + tuning (in number of events),
  • signal    :  3327 -> 2725 (efficiency 81.9%)
  • background: 31132 -> 2386 (efficiency  7.7%)
I will soon try with Mark's variables to see if that improves.
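The efficiencies above follow directly from the cut-flow numbers; a quick sketch, with s/sqrt(s+b) added as a naive figure of merit:

```python
import math

# Efficiencies and a naive s/sqrt(s+b) figure of merit from the cut
# flow quoted above.
def cut_flow_summary(s_before, s_after, b_before, b_after):
    eff_s = s_after / s_before
    eff_b = b_after / b_before
    signif = s_after / math.sqrt(s_after + b_after)
    return eff_s, eff_b, signif

eff_s, eff_b, signif = cut_flow_summary(3327, 2725, 31132, 2386)
print(f"signal eff = {eff_s:.1%}, background eff = {eff_b:.1%}, "
      f"s/sqrt(s+b) = {signif:.1f}")
```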

Fit with finite MC - first attempt

My first attempt to fit with finite MC samples. Results here .
Still to check:

  • How to treat bins with zero entries.
  • Are errors calculated correctly?
  • How to fix one parameter.

Thursday, January 29, 2009

Fit with finite statistics

Joel found this paper, which I think describes exactly what we need. I'm going through it at the moment trying to figure out how to apply it.

There is a root implementation, TFractionFitter, but in one dimension. That could be used by just putting everything in one dimension e.g. instead of a 10x10 bin histogram have a 100 bin histogram. I also found this page which says there are problems with the implementation (although it's unclear for which root version). One of the comments says that there is an implementation in RooFit that works properly though, so I'll have a look at that.
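The flattening trick is straightforward; a sketch with numpy arrays standing in for the ROOT histograms:

```python
import numpy as np

# A 10x10 2D histogram becomes a 100-bin 1D histogram, so a 1D fitter
# such as TFractionFitter can be used. Bin (i, j) maps to flat bin
# i*10 + j under row-major flattening.
hist2d = np.arange(100).reshape(10, 10)   # stand-in for bin contents

flat = hist2d.ravel()                     # row-major flattening
assert flat.shape == (100,)

# Any bin can be recovered from its flat index:
i, j = 3, 7
assert flat[i * 10 + j] == hist2d[i, j]
print("2D bin (3, 7) -> flat bin", i * 10 + j)
```

The fit itself is unchanged by this relabelling, since a binned fit only sums over bins and never uses their geometric arrangement.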

Event selection with TMVA

Link here .

Leptonic event selection update

Today's talk is here.

Wednesday, January 28, 2009

Thursday, January 22, 2009

Correct fits and proposal for more samples

I applied the corrections in the re-scaling of the errors from the templates in the chi^2 function. You can find the results here.

The results in the last column were obtained without taking into account the statistical errors from the templates. We can see that as the number of events increases, the absolute values of the errors tend to the case where the errors from the templates are zero. The central values fluctuate a lot. An extra sample with L = 2000 fb^-1 (34000 events of mu-mu-h) would be the minimum (Desch-Kuhl used 5000 fb^-1, split into data and Monte Carlo). Presently there is 1000 fb^-1 reconstructed. We can use 500 fb^-1 (which is what one expects in reality) for the data, and the other 500 fb^-1 plus the extra 2000 fb^-1 that we request for the templates. I would prefer more, but we have to remember that the time scale is short.

Besides that we should try to get an extra 1000 fb^-1 for likelihood or other method for signal-background separation.

What do you think?

Fit studies

A slide can be found here .

Likelihood fit for Poisson errors

Talk for today is here.

Sorry it doesn't look pretty. The error bar colours are particularly garish.

Thursday, January 15, 2009

Muon selection with multivariate analysis

I have the steps to do the muon selection with TMVA more or less documented in my wiki page .

Meeting Minutes and Plans

Today's Meeting Minutes

Attending: Roberval, Hajrah, Mark, Joel
  • Mark has been working on the final fitter and showed some plots from pseudo experiments. There are clearly still a few issues to be worked out.
  • Roberval and Hajrah have been making good progress on the muon channel. They now use the ROOT package TMVA to perform a multivariate muon selection. They have also been playing around with the fit, including fitting background+gg as a single template.
Preparations for the LOI were discussed. The timescale is short, although it is uncertain and needs to be checked. The plan is:
  1. Roberval will re-visit the event selection, and a common one should be used for the two analyses
  2. Mark will continue his fit studies, and the fits from the two channels will be done independently and combined
  3. Joel will look at whether the Z-fusion eeH process is an issue.
  4. It is uncertain whether combining background+gg templates is the best thing to do, so it will need to be studied
We will have extra meetings in the run up to the LOI, starting next week (at 11:30)

Status of the analysis

Talk can be downloaded from here .

Toy MC fit

I had a quick go at a toy MC fit, needs a bit more work though.

First I created two toy data sets from the Poisson distribution of some input histograms, and then tried fitting them together using the error on both to calculate the chi2. First results are here. The gluon fit seems to be a bit erratic for some reason.

I then tried fitting one of the toy sets to the input using only the error on one of them. The results are here. Something's clearly going wrong; I'm looking into it, but probably won't have time before the meeting at 11. The results are so bad it's got to be something obvious though.

In both sets of plots the black line is the true value of the ratio.
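The first step can be sketched as follows, with a hypothetical 1D source histogram and a one-parameter closed-form version of the chi2 that uses the Poisson errors of both toy samples:

```python
import numpy as np

rng = np.random.default_rng(11)

# Two independent Poisson toy samples drawn from the same source
# histogram (hypothetical contents), fitted together with a chi2 whose
# per-bin variance includes both samples: var_i = d_i + t_i.
source = np.array([60., 150., 400., 150., 60.])

def fit_r(data, template):
    var = data + template                # Poisson variances of both toys
    r = np.sum(template * data / var) / np.sum(template**2 / var)
    err = 1.0 / np.sqrt(np.sum(template**2 / var))
    return r, err

toy_data = rng.poisson(source).astype(float)
toy_template = rng.poisson(source).astype(float)
r, err = fit_r(toy_data, toy_template)
print(f"r = {r:.3f} +/- {err:.3f}")
```

The real study fits several templates at once with Minuit; this closed form is just the one-parameter special case.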