Friday, March 13, 2009

Electron sample with 250fb-1

Fit results for 10x10 bins:
r_bb=0.95 +/- 0.056 (5.93%)
r_cc=1.3 +/- 0.55 (41.82%)
r_gg=1.2 +/- 0.52 (44.51%)

Errors from the toy study using the chi2, ignoring bins with fewer than 7 entries:
r_bb 0.057 +/- 1.7×10^-3
r_cc 0.49 +/- 6.4×10^-2
r_gg 0.52 +/- 4.9×10^-2
The quoted error is the mean of a gaussian fit to the distribution of the fit errors, and the error on the error is the sigma of the same fit (if you see what I mean - no pun intended).
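For reference, a rough sketch of how that extraction works, assuming an unbinned gaussian fit to the collection of per-toy errors stands in for the fit to the plotted histogram; the toy_errors numbers here are placeholders, not the analysis output:

import numpy as np
from scipy.stats import norm

# Placeholder stand-in for the 5000 per-toy fitted errors on r_bb.
rng = np.random.default_rng(1)
toy_errors = rng.normal(loc=0.057, scale=0.0017, size=5000)

# Gaussian fit to the distribution of errors: the mean gives the quoted
# error, the sigma gives the "error on the error".
mean, sigma = norm.fit(toy_errors)
print(f"r_bb error = {mean:.3f} +/- {sigma:.1e}")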

Errors from the toy study fitting a toy sample to its source using the likelihood function:
r_bb 0.038 +/- 5.8×10^-4
r_cc 0.36 +/- 1.4×10^-2
r_gg 0.38 +/- 6.9×10^-3


The numbers mostly match Roberval's well, although the likelihood r_cc and r_gg are a little lower here.

Plots, same as yesterday:
r_bb
r_cc
r_gg

Pulls:
chi2
likelihood
The chi2 is off slightly. I wasn't going to look into it too much - unless anyone thinks it's a cause for concern?

Also, the fit results:
chi2

Thursday, March 12, 2009

Electron sample with 500fb-1

I'm running the 250fb-1 numbers at the moment, here are the results for 500fb-1.

The "Toy study using limited templates" is the chi2 fit ignoring less than 7 entries using 5000 tests. I plotted the error, fitted a gaussian to it and that gives the value and the error bars.
The "Toy study error using infinite statistic templates" is the likelihood fit of a toy sample to its source. Again 5000 tests, with a gaussian fitted to the errors.
The "Error from fully simulated data" is the error from the chi2 fit ignoring less than 7 entries.
r_bb
r_cc
r_gg

The fully simulated data matches the toy study with limited statistics, and the values for 10x10 bins match Roberval's 500fb-1 numbers quite well. When I post the results for 250fb-1 I'll add the numbers for 10x10 bins too.

The pulls for both of the toy studies are below. There is a slight low bias for the chi2, but other than that they look good.

Likelihood
Chi2

Results scaled to 250 fb^-1 - muon channel

By simply scaling the data histograms, the central values did not change and the errors became larger, as expected.

Likelihood fit
  • r_bb = 1.013 ± 0.038
  • r_cc = 0.81 ± 0.46
  • r_gg = 0.98 ± 0.45
Chi2 fit
  • r_bb = 1.012 ± 0.044
  • r_cc = 0.87 ± 0.54
  • r_gg = 0.93 ± 0.51
The contribution from the MC to the errors on the parameters should be the same as in the 500fb^-1 case, because nothing changed on the MC side. The pull distributions for pseudo-experiments with MC fluctuations only give a contribution of 46% of the estimated error arising from the finite MC statistics, yielding exactly the same MC errors as before.
  • r_bb = 1.012 ± 0.039 (data) ± 0.020 (MC)
  • r_cc = 0.84 ± 0.48 (data) ± 0.25 (MC)
  • r_gg = 0.95 ± 0.45 (data) ± 0.23 (MC)
The statistical errors from the data in the chi2 fit are very similar to the ones from the likelihood.

It seems that with 250fb^-1 of luminosity, even combining the different channels, the measurement of the branching ratios of the Higgs boson will be quite poor.

Actions from Meeting

It was agreed that the method was now in place, and the muon numbers look good for 500 fb-1.
The actions required to complete our contribution to the LOI:
  • Re-run muon numbers for 250 fb-1 (Roberval)
  • Re-make figure with flavour-likelihood distributions with axes labels swapped and all text enlarged (Hajrah).
  • Run final electron numbers for 250 and 500 fb-1 (Mark).
  • Final edit and combination of numbers (Joel).

Results for muon channel - Update

Fit with Poisson Likelihood

  • r_bb = 1.013 ± 0.027
  • r_cc = 0.81 ± 0.33
  • r_gg = 0.98 ± 0.32
With the likelihood method, contributions from the finite Monte Carlo sample are not considered.
Generating pseudo-experiments of the data, assuming the data is poisson distributed, yields pull distributions of gaussian shape with mean O(10^-2) and rms ~1.
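To make the pseudo-experiment procedure concrete, here is a minimal sketch where a single scaled template stands in for the full (r_bb, r_cc, r_gg) fit; the template, its statistics and the closed-form one-parameter fit are illustrative assumptions, not the actual analysis:

import numpy as np

rng = np.random.default_rng(2)

# One scaled template stands in for the full model; shape and statistics are made up.
template = rng.uniform(5.0, 50.0, size=100)   # expected counts per bin at r = 1
data = rng.poisson(template).astype(float)    # the "observed" histogram

def fit_scale(d, t):
    """Poisson maximum-likelihood fit of a single scale factor r in f = r*t.
    For this one-parameter model r_hat = sum(d)/sum(t), and the error follows
    from the curvature of -log L (second derivative = sum(d)/r^2)."""
    r_hat = d.sum() / t.sum()
    r_err = r_hat / np.sqrt(d.sum())
    return r_hat, r_err

# The fit to the data itself plays the role of the "true" value for the pulls.
r_ref, _ = fit_scale(data, template)

pulls = []
for _ in range(5000):
    toy = rng.poisson(data).astype(float)     # fluctuate each bin assuming poisson statistics
    r_fit, r_err = fit_scale(toy, template)
    pulls.append((r_fit - r_ref) / r_err)

print(np.mean(pulls), np.std(pulls))          # expect mean ~0 and rms ~1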

Fit with Chi2
  • r_bb = 1.012 ± 0.033
  • r_cc = 0.84 ± 0.41
  • r_gg = 0.95 ± 0.39
In this case, statistical fluctuations arising from the finite MC sample are taken into account. The poisson-to-gaussian approximation is only valid if the number of events in a bin is larger than 5, so we considered in the fit only bins with at least 7 entries in the data.
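To sketch what such a chi2 can look like (data and template statistical errors added in quadrature, low-count bins dropped), here is a minimal illustration; the array names and the assumption that the MC templates match the data luminosity are mine, not from the analysis code:

import numpy as np

def chi2(r_values, data, templates, min_entries=7):
    """Binned chi2 including MC statistical fluctuations in the error.

    r_values  : scale factors (e.g. r_bb, r_cc, r_gg)
    data      : observed counts per bin (the 2D histogram flattened)
    templates : one MC histogram per component; for simplicity the MC is
                assumed to match the data luminosity, so the statistical
                variance of a template bin is just its content
    Bins with fewer than min_entries data entries are ignored, as above.
    """
    data = np.asarray(data, dtype=float)
    fitted = sum(r * t for r, t in zip(r_values, templates))
    # poisson variance of the data plus the scaled template variances
    # (this is the variant that includes the r values in the error term)
    variance = data + sum((r ** 2) * t for r, t in zip(r_values, templates))
    mask = data >= min_entries
    return np.sum((data[mask] - fitted[mask]) ** 2 / variance[mask])

This could then be handed to a minimiser (e.g. scipy.optimize.minimize) over (r_bb, r_cc, r_gg).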
To check that this method is valid and consistent with the likelihood method, pseudo-experiments of the data and the MC were generated and the pull distributions of the fit parameters were calculated, taking the "true" values of the parameters to be the ones obtained in the likelihood fit.
The rms (mean) of the pull distributions are 1.0 (-0.08), 0.99 (-0.08) and 1.0 (-0.16) for r_bb, r_cc and r_gg, respectively. Notice that r_gg is slightly biased.

In order to estimate the contribution of the finite MC sample to the error of the fit, the pull distributions of the fit parameters were obtained using pseudo-experiments generated for the MC only. The pull distributions are then gaussians with width less than one; that width is the fraction of the fit error that is purely due to the MC statistical fluctuations. We obtained widths of 0.6 for the three parameters. Splitting the error contributions from the data and from the MC (a short worked example follows the list) we obtained:
  • r_bb = 1.012 ± 0.027 (data) ± 0.020 (MC)
  • r_cc = 0.84 ± 0.34 (data) ± 0.25 (MC)
  • r_gg = 0.95 ± 0.31 (data) ± 0.23 (MC)
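A short worked version of the split for r_bb, assuming the MC part is the pull width times the total error and the data part is the quadrature remainder; this reproduces the quoted numbers to within rounding, though the exact procedure may differ slightly:

import math

total_err = 0.033  # total r_bb error from the chi2 fit (quoted above)
w = 0.6            # width of the pull distribution from MC-only pseudo-experiments

mc_err = w * total_err                          # ~0.020, the MC contribution
data_err = math.sqrt(total_err**2 - mc_err**2)  # ~0.026-0.027, the data contribution
print(f"r_bb = 1.012 +/- {data_err:.3f} (data) +/- {mc_err:.3f} (MC)")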


Ignoring small bins

The pulls from the toy study:
r_bb
r_cc
r_gg

Results from the full data:
r_bb
r_cc
r_gg

Tuesday, March 10, 2009

Likelihood cut variables

Some plots for likelihood variables I'm only just getting around to putting up:

Jet energy difference
recoil mass
Z candidate cos theta
thrust cos theta

Results for muon channel

Results from the fits:

The fit was done by minimising a chi^2 function in which bins containing fewer than 10 events were discarded. The fit procedure considered effects from statistical fluctuations of the MC samples used in the fit. The results are shown below. The first error is the statistical error from the data, and the second error is the contribution from the MC, which was estimated using toy MC (about 60% of the fit error).

r_bb = 1.021 ± 0.027 ± 0.020
r_cc = 0.89 ± 0.34 ± 0.25
r_gg = 0.91 ± 0.31 ± 0.23

More details here.

Event reconstruction and selection
  • Two muon candidates with opposite charges, identified using neural networks, with momentum > 20 GeV and isolated (no track in a cone of 5° around the candidate), were combined into a Z boson candidate. If more than one Z candidate is found, the one with mass closest to the nominal (PDG) mass is considered.
  • After the di-lepton from the Z decay is identified, the remaining particles are forced into two jets. The di-jet system is combined into a Higgs candidate.
  • Only events with at least 25 reconstructed particles are taken.
  • 70 < M_ll < 110 GeV
  • 117 < M_rec < 150 GeV
  • 100 < M_jj < 140 GeV
  • | cos theta_ll | < 0.9
where M_ll is the mass of the di-lepton, M_jj is the mass of the di-jet, M_rec is the recoil mass and theta_ll is the polar angle of the di-lepton.
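For concreteness, a sketch of this selection applied to a reconstructed event; the attribute names on the event object are made up for illustration and are not from the actual analysis code:

def passes_selection(ev):
    """Selection cuts listed above; the ev attribute names are illustrative."""
    return (ev.n_particles >= 25                # at least 25 reconstructed particles
            and 70.0 < ev.m_ll < 110.0          # di-lepton mass window (GeV)
            and 117.0 < ev.m_rec < 150.0        # recoil mass window (GeV)
            and 100.0 < ev.m_jj < 140.0         # di-jet mass window (GeV)
            and abs(ev.cos_theta_ll) < 0.9)     # di-lepton polar angle cut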

less than 10 data ignored

Same as before but with bins containing fewer than 10 data entries ignored. Only 500 tests this time though.

Fitting two separately poisson generated samples together:
r_bb
r_cc
r_gg

Fitting a poisson generated sample to its source:
r_bb
r_cc
r_gg

Sunday, March 8, 2009

Results for different bin ranges

The fits for 2x2 to 20x20 bins are done; pull means and sigmas are shown below.

Fitting two separately poisson generated samples together:
r_bb
r_cc
r_gg

Fitting a poisson generated sample to its source:
r_bb
r_cc
r_gg

The chi2 works well for the first step, but then overestimates the error in step 2. That's not surprising since it's still using the errors from the templates, but if you don't include them (red line) the mean has an increasing bias as the binning increases.
The likelihood, on the other hand, underestimates the errors for step 1 but works well for step 2.

More fit studies

I had a look at the pulls for 5000 tests of fitting together two toy samples at 10x10 binning. Results are below; I'm also running it over 2-20 bins now, but it's taking a while.

Note that I defined the pull with the opposite sign to the normal convention, so a positive mean means that the fitted results are too low. I'll correct that in any further plots.
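To spell out the sign flip (variable names are hypothetical):

def pull(r_fit, r_true, r_err, flipped=False):
    """Standard pull is (fit - true)/error; the flipped version used in the
    plots above changes the sign, so a positive mean means the fit is low."""
    p = (r_fit - r_true) / r_err
    return -p if flipped else p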

r_bb, different chi2 variants
r_bb, different likelihood variants
r_cc, different chi2 variants
r_cc, different likelihood variants
r_gg, different chi2 variants
r_gg, different likelihood variants

Barlow-Beeston has a weird double hump which I don't understand; again, this is probably my implementation, or the Poisson approximation breaking down. I wouldn't have thought it's the approximation, because it's not as bad for r_bb, where the number of entries in the peak bins is not much less than the total entries.

The only chi2 fit without a bias is the one that uses the r_xx values in the error calculation, which "gives nonsense results". We could potentially do the fit recursively using a constant r_xx from the previous iteration, as Klaus suggested. Is there time to implement it though?
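A rough sketch of how that recursion could look, assuming a chi2 whose error term uses frozen r values from the previous iteration; the fixed iteration count and the choice of minimiser are illustrative only, not a worked-out implementation:

import numpy as np
from scipy.optimize import minimize

def chi2_fixed_r_errors(r, data, templates, r_for_errors, min_entries=10):
    """Chi2 whose error term uses frozen r values instead of the floated ones."""
    data = np.asarray(data, dtype=float)
    fitted = sum(ri * ti for ri, ti in zip(r, templates))
    variance = data + sum((re ** 2) * ti for re, ti in zip(r_for_errors, templates))
    mask = data >= min_entries
    return np.sum((data[mask] - fitted[mask]) ** 2 / variance[mask])

def iterative_fit(data, templates, n_iter=5):
    r_fixed = np.ones(len(templates))           # start from r_xx = 1
    for _ in range(n_iter):
        res = minimize(chi2_fixed_r_errors, x0=r_fixed,
                       args=(data, templates, r_fixed), method="Nelder-Mead")
        r_fixed = res.x                         # feed the result into the next error term
    return r_fixed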

The likelihood gives reliable central values, but underestimates the error (the pull sigma is high). I thought about trying the equation Roberval posted last week, but the only difference is the d*ln(d) - d term, which will just give a constant shift - hence the fit values and errors will be the same. I could tune the change in the function value at which Minuit calculates the errors (the "Up()" method) using (another) toy study, but without a mathematical reason for changing it, is that justified?

Friday, March 6, 2009

Fitter comparison

Here's a comparison of the fitters that I've currently got coded up. The fitters that are used are:

  • "chi2" - chi2 using the template errors and the r_xx values in the error calculation
  • "chi2NoRValues" - same as above but without the r_xx values in the error as pointed out by Klaus
  • "chi2NoErrorFromTemplates" - only using the error from the data in the chi2 error
  • "likelihood" - simple likelihood fit, but not quite the same as Roberval was using from the Barlow-Beeston paper. It's [d*ln(f)-f - d*ln(d)+d] which I got from the NIM paper I mentioned a while ago (NIM 221 (1984) 437-442).
  • "BarlowBeeston" - The Barlow Beeston algorithm, as best as I've managed to implement it.
I've also tried it ignoring bins where the data is less than 10. I've not had a proper think about whether this is a fair thing to do yet, but the results seem good.
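For reference, a minimal sketch of that likelihood summed over bins, with the option to drop low-count bins; the function and argument names are illustrative, not the analysis code:

import numpy as np

def neg_log_likelihood(f, d, min_entries=0):
    """Negative of the sum over bins of [d*ln(f) - f - d*ln(d) + d].

    f : fitted expectation per bin (e.g. r_bb*T_bb + r_cc*T_cc + r_gg*T_gg + bkg)
    d : observed data counts per bin
    Bins with fewer than min_entries data entries are dropped if requested.
    """
    f = np.asarray(f, dtype=float)
    d = np.asarray(d, dtype=float)
    mask = d >= min_entries
    f, d = f[mask], d[mask]
    # d*ln(d) is taken as 0 for empty bins (its limit as d -> 0)
    dlnd = np.where(d > 0, d * np.log(np.where(d > 0, d, 1.0)), 0.0)
    return -np.sum(d * np.log(f) - f - dlnd + d)

Since the d*ln(d) - d part doesn't depend on the fit parameters, dropping it only shifts the function by a constant, consistent with the comment above that the fitted values and errors are unchanged.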

r_bb
r_bb ignoring bins where the data is less than 10

r_cc
r_cc ignoring bins where the data is less than 10

r_gg
r_gg ignoring bins where the data is less than 10

I also had a look at what they're like when cheating the background contribution, and it's pretty much a small constant shift towards the true value. It's not huge though so I haven't posted the plots - the efficiency of the current selection is quite reliable.

I'm not that keen on using the chi2 anymore because the results aren't very good without the r_xx values in the errors. Barlow-Beeston is still giving some strange results - not that visible here, but the pulls in the toy study look strange (I'll post them once I've done a bit more work on that). I think it's more to do with my (currently) buggy implementation than with the method. As Roberval pointed out though, there is the approximation of binomial errors by Poisson, which is only valid when the entries in a bin are much less than the total entries.
The likelihood looks okay, so I'm edging towards that. All of the r_bb results have a bias though, which I don't understand.

Everything is improved when cutting out bins with fewer than 10 data entries, so I guess a final decision depends on whether we agree that is a fair thing to do. I'll also try each method with the toy study - multiple tests might shed a bit more light.

Monday, March 2, 2009

N-1 Plots

Some N-1 plots from the electron sample when using the cuts Hajrah suggested on Thursday. The signal over sqrt(signal+background) is overlaid, with the "star" series showing the s/sqrt(s+b) if you keep everything to the right of that point and the "dot" series everything to the left. Both s/sqrt(s+b) series use the right axis while the histograms use the left. It should make sense if you look at the plots.
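As a side note, a sketch of how those two s/sqrt(s+b) series can be built from the binned signal and background; the function and argument names are made up for illustration:

import numpy as np

def cumulative_significance(sig, bkg):
    """s/sqrt(s+b) for a cut placed at each bin, scanning from both sides.

    sig, bkg : per-bin signal and background counts of the N-1 distribution.
    Returns (from_right, from_left): the "star" series keeps everything to the
    right of the bin, the "dot" series everything to the left.
    """
    sig = np.asarray(sig, dtype=float)
    bkg = np.asarray(bkg, dtype=float)
    s_right = np.cumsum(sig[::-1])[::-1]   # signal summed from this bin rightwards
    b_right = np.cumsum(bkg[::-1])[::-1]
    s_left = np.cumsum(sig)                # signal summed up to and including this bin
    b_left = np.cumsum(bkg)
    denom_r = np.sqrt(s_right + b_right)
    denom_l = np.sqrt(s_left + b_left)
    # empty regions give 0 rather than a division-by-zero warning
    from_right = np.divide(s_right, denom_r, out=np.zeros_like(s_right), where=denom_r > 0)
    from_left = np.divide(s_left, denom_l, out=np.zeros_like(s_left), where=denom_l > 0)
    return from_right, from_left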

Di-electron mass
Di-jet mass
Recoil mass
cos(theta) of the Z

I didn't bother with a plot for "number of electrons" because it doesn't show anything.

It looks like the lower di-electron mass cut could go up a bit; everything else looks spot on. I'm not suggesting we should necessarily change it - keeping the muon and electron channel cuts the same is probably more important for now, and the likelihood cut should take up the slack. As Roberval mentioned though, the 2 fermion background might require us to tighten the di-lepton mass cut.

For reference here are the cuts:
number of electrons >= 2
70 < di-electron mass < 110 GeV
100 < di-jet mass < 140 GeV
cos(theta) of the Z candidate < 0.9
117 < recoil mass < 150 GeV