Monday, February 23, 2009

Possibility to fit with chi2 function

I have been studying the method for fitting with finite Monte Carlo (MC) samples described in Barlow & Beeston, Comp. Phys. Comm. 77 (1993) 219, and I face the following problems:
  • Treatment of errors from the fit;
  • Bins with zero entries;
  • The method is valid only when the number of events in a bin of a template is much smaller than the total number of events in the histogram.
The first two are not straightforward but seem feasible. The last one is the most difficult, and it is a strong requirement if one wants to use the method: in our case the gg, background and bb templates do not fulfill it, in particular the bb template, which has essentially all of its events in one bin.
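To make the last condition concrete, here is a minimal Python sketch that measures how concentrated a template is. The helper names and the 0.1 threshold standing in for "much smaller than" are hypothetical choices for illustration, not part of Barlow & Beeston's method.

```python
def max_bin_fraction(template):
    """Largest fraction of a template's total entries that sits in a single bin."""
    total = sum(template)
    if total <= 0:
        raise ValueError("template has no entries")
    return max(template) / total


def satisfies_small_bin_condition(template, threshold=0.1):
    """Hypothetical check that no bin holds more than `threshold` of the total.

    The 0.1 threshold is an arbitrary stand-in for "much smaller than".
    """
    return max_bin_fraction(template) < threshold


# A bb-like template with essentially all events in one bin fails the check,
# while a template spread evenly over 20 bins passes it.
print(satisfies_small_bin_condition([0, 2, 995, 3, 0]))  # False
print(satisfies_small_bin_condition([100] * 20))         # True
```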

Facing these difficulties, I started to think about the possibility of using a chi^2 fit. Our data sample can have bins with a small number of events, but are those bins important for the fit? Ignoring the effects of statistical fluctuations, I checked how much the likelihood function changes if the data bins with fewer than 10 events are removed (in both data and Monte Carlo, only bins with non-zero entries are considered). The likelihood function to be maximised (omitting the constant factorials) is, in logarithmic form,
$$\ln L = \sum_i \left( d_i \ln f_i - f_i \right)$$
where d_i is the number of events in bin i of the data sample and f_i is the number of events in bin i predicted by the fit "function".
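For completeness, dropping the factorials follows from assuming Poisson-distributed bin contents (an assumption consistent with "omitting the constant factorials"). With P(d_i | f_i) the probability of observing d_i events in bin i when f_i are expected:
$$P(d_i \mid f_i) = \frac{f_i^{d_i} e^{-f_i}}{d_i!}, \qquad \ln L = \sum_i \ln P(d_i \mid f_i) = \sum_i \left( d_i \ln f_i - f_i \right) - \sum_i \ln d_i!$$
The last sum depends only on the data, not on the fit parameters, so omitting it does not move the maximum.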
The plot below shows the likelihood function divided by the likelihood function computed with only the bins where d_i >= 10, as a function of the fit parameters r_xx. The ratio is very stable over a range of parameter values comparable to, or larger than, their "expected" errors. This means that the two likelihood functions differ essentially only by a constant factor of 1.011, so the maxima will be the same and the errors essentially the same.
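A minimal Python sketch of this check. Everything here is hypothetical: a single parameter r in a two-template fit function f_i = r a_i + (1 - r) b_i stands in for the real r_xx fit, the toy numbers are invented, and the ratio is taken on ln L (the post does not say whether the plot uses L or ln L).

```python
import math


def log_likelihood(data, expected, min_count=0):
    """Poisson ln L with the constant ln(d_i!) terms dropped.

    Only bins with non-zero data and expectation are used, and among
    those only bins with d_i >= min_count.
    """
    total = 0.0
    for d, f in zip(data, expected):
        if d > 0 and f > 0 and d >= min_count:
            total += d * math.log(f) - f
    return total


def model(r, template_a, template_b):
    """Hypothetical fit function: f_i = r * a_i + (1 - r) * b_i."""
    return [r * a + (1.0 - r) * b for a, b in zip(template_a, template_b)]


def ratio_scan(data, template_a, template_b, r_values, min_count=10):
    """ln L(all bins) / ln L(bins with d_i >= min_count) at each value of r."""
    ratios = []
    for r in r_values:
        f = model(r, template_a, template_b)
        ratios.append(log_likelihood(data, f) / log_likelihood(data, f, min_count))
    return ratios


data = [50, 30, 12, 4, 2]
template_a = [40, 30, 15, 5, 2]
template_b = [60, 30, 10, 3, 1]
print(ratio_scan(data, template_a, template_b, [0.4, 0.5, 0.6]))
```

If the printed ratios stay nearly constant as r varies, removing the low-count bins only rescales the likelihood and does not move its maximum.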
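Since for d_i >= 10 the Poisson distribution of each bin content is reasonably close to a Gaussian, we can use the chi^2 approximation if we neglect the bins with fewer than 10 events in the data sample.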
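One possible form of the chi^2 fit, sketched in Python under stated assumptions: the Neyman variant with the data count d_i as the variance (a Pearson variant would divide by f_i instead), the same hypothetical two-template model as a stand-in for the r_xx parameters, and a brute-force grid scan instead of a real minimiser such as MINUIT.

```python
def neyman_chi2(data, expected, min_count=10):
    """chi^2 = sum_i (d_i - f_i)^2 / d_i over bins with d_i >= min_count.

    Assumption: Neyman form, i.e. the data count d_i is used as the variance.
    """
    chi2 = 0.0
    for d, f in zip(data, expected):
        if d >= min_count:
            chi2 += (d - f) ** 2 / d
    return chi2


def scan_chi2(data, template_a, template_b, steps=1000, min_count=10):
    """Grid scan of the hypothetical model f_i = r*a_i + (1 - r)*b_i for r in [0, 1].

    Returns the grid point with the smallest chi^2 and that chi^2 value.
    """
    best_r, best_chi2 = None, float("inf")
    for k in range(steps + 1):
        r = k / steps
        f = [r * a + (1.0 - r) * b for a, b in zip(template_a, template_b)]
        chi2 = neyman_chi2(data, f, min_count)
        if chi2 < best_chi2:
            best_r, best_chi2 = r, chi2
    return best_r, best_chi2
```

In a real fit one would minimise with MINUIT and, for a single parameter, take the error from the change Δχ² = 1.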

What is not clear to me is the case where a bin of the MC samples has a small number of events. If I remove the bins of the MC samples with fewer than 10 events, all the curves above are still flat, but their value goes up to 1.014.
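A sketch of how the bin selection could be extended to the MC side, assuming the raw MC template counts are available and that a bin is dropped when any template has fewer than 10 entries in it. The post does not specify the exact criterion, and both helper names are hypothetical.

```python
import math


def kept_bins(data, templates, min_count=10, cut_on_mc=True):
    """Indices of bins with d_i >= min_count and, if cut_on_mc is set,
    with every MC template bin also >= min_count (hypothetical criterion)."""
    keep = []
    for i, d in enumerate(data):
        if d < min_count:
            continue
        if cut_on_mc and any(t[i] < min_count for t in templates):
            continue
        keep.append(i)
    return keep


def log_likelihood_on(bins, data, expected):
    """Poisson ln L (factorials dropped) restricted to the given bin indices."""
    return sum(data[i] * math.log(expected[i]) - expected[i]
               for i in bins if data[i] > 0 and expected[i] > 0)


data = [50, 30, 12, 4, 2]
template_a = [40, 30, 15, 5, 2]
template_b = [60, 30, 8, 3, 1]
print(kept_bins(data, [template_a, template_b], cut_on_mc=False))  # [0, 1, 2]
print(kept_bins(data, [template_a, template_b]))                   # [0, 1]
```

The ratio discussed above would then be `log_likelihood_on` evaluated over all non-zero bins divided by the same function evaluated over `kept_bins(...)`.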
