Friday, November 28, 2008

Neural networks

I've been trying to understand the differences that Mark found between the results using the SGV-Tesla and Mokka-LDCPrime_02Sc neural networks. I am pretty confident that the new neural networks are better than the old ones, not only because other plots, such as purity vs efficiency, show better performance, but also because the old ones are not correct: some overtraining can be spotted.
I wrote my conclusions on the neural networks down on the web page here.

Thursday, November 27, 2008

Purity x Efficiency (Z-Higgs)

Here is the flavour tag purity-efficiency plot with Z-Higgs events comparing Mokka-LDCPrime_02Sc neural networks (open symbol) and SGV-Tesla neural networks (solid symbol).
The c-tag performance is similar, but slightly better for Mokka-LDCPrime_02Sc, for mid to high efficiencies. Mokka-LDCPrime_02Sc is much better at low efficiencies.

c-likeness

Get the plots here.

Flavour Tagging results (so far)

I've created a new sample for LDCPrime_02Sc_p02 from the generator files in the ILC database (50,000 ZH->eeXX and about 62,000 ZZ->eeqq). I was looking at the flavour tag results using the LDCPrime_02Sc nets (note: not "_p02") that Roberval sent me, as well as his tuned JProb parameters and RPCutProcessor parameters. I didn't get very good results:

New parameters and new nets

To see if it was something to do with the new samples I then tried the old tagging:

Old parameters and old (SGV trained) nets

That was much improved, so to see if it was the new nets or parameters that were causing the problem, I tried redoing the new tag with the old nets (because it was the easiest thing to do):

New parameters and old nets

That was much better than both (although slightly smaller peak for the b sample). It looked like the b tag was okay using the new nets, so I wondered if it was any good using the new b-tag nets and the old c-tag nets, to see if it was actually just the c-tag that's a bit off:

New parameters and new b nets, old c nets

Although much better than everything done with the new nets, the b tag is still not as good as with the old nets.

Wednesday, November 26, 2008

Branching Ratios

We aim to investigate the BR of the Higgs decaying into bb, cc and gg for a SM Higgs (120 GeV).
In our case all other decays are the background.
We separated all flavours (the muon separation is done first) and made plots of BTag, CTag and BCTag, shown in the three plots at the bottom (red for bb events, blue for cc events and black for light quark events).

Using the method in the Kuhl-Desch paper, we split our 20,000 events into Data and Sample events in a 1:1 ratio.
As an exercise, we tried to find the parameters for bb/gg and cc events by fixing the parameters for gg/bb and the background to one. The likelihood function is then plotted as shown below (two plots on top: the first is cc vs gg and the second gg vs cc).

In order to get the fitted values of the parameters, we will minimise the likelihood function, and then use the fitted values to compute the branching ratios.
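The minimisation step itself is standard. Just to illustrate the idea (a toy sketch with made-up template shapes and scipy, not our actual fit code), fitting two scale parameters to a binned Poisson likelihood looks like:

```python
import numpy as np
from scipy.optimize import minimize

# Toy templates: made-up shapes standing in for the b/c/background
# tag-likeness histograms (in the real analysis these come from the
# "Sample" half of the events).
t_b   = np.array([5., 10., 30., 80.])   # b-like template
t_c   = np.array([40., 50., 20., 10.])  # c-like template
t_bkg = np.array([30., 20., 10., 5.])   # everything else, fixed to one

# Pseudo-data built with known scale factors r_b=1.2, r_c=0.8
data = 1.2 * t_b + 0.8 * t_c + t_bkg

def neg_log_likelihood(r):
    r_b, r_c = r
    mu = r_b * t_b + r_c * t_c + t_bkg       # expected bin contents
    return np.sum(mu - data * np.log(mu))    # Poisson -lnL (up to a constant)

fit = minimize(neg_log_likelihood, x0=[1.0, 1.0],
               bounds=[(0.01, 5.0), (0.01, 5.0)])
r_b_fit, r_c_fit = fit.x                     # should recover ~1.2 and ~0.8
```

The fitted scale factors, times the input BRs, then give the measured branching ratios.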



Thursday, November 6, 2008

Software issues

I've come across two software issues recently, both of which are fairly old but here's a bit more information:

  1. Mokka crashing when reading stdhep files - I've managed to fix this by changing a few things, but I'm now having trouble breaking it again to see exactly which one of those things fixed it. Mokka uses the stdhep reader from LCIO, but I'm not entirely sure if that uses CLHEP or not (I don't think it does). I used the HEAD version of CLHEP, Mokka and LCIO; and the most recent tag of Geant4. I also compiled LCIO with CLHEP, just in case the reader does use CLHEP. After all of that, Mokka can read stdhep.
  2. LCIO can't read files with Pandora clusters and CalorimeterHits in them - I came across this ages ago, but thought it was because the file size was too large. If you do try and read one of these files, you get the error
    "*** glibc detected *** free(): invalid pointer: 0x0000000000b629e0 ***"
    After reading in the first event I think LCIO is trying to delete the CalorimeterHits twice, once for the normal collection and once for the hits associated to the Pandora Clusters. I don't know if this is a problem with LCIO or with Pandora, I'll put a post on the forums.


Update 12:15
When going to post about the second point, I saw there was a post in the LCIO forum, "stdhepjob fails on 64 bit". Basically, Roman Poeschl tracked down exactly where the error occurs and Jan Engels fixed it in the LCIO cvs on the 29th of September.

So to fix Mokka crashing when reading stdhep on a 64bit machine, you need the HEAD version of LCIO or a tag made after the 29th of September.

Thursday, October 23, 2008

Muon identification with TMVA

As I said in the meeting this morning, I am learning how to use the TMVA (Toolkit for Multivariate Data Analysis with ROOT) package to improve the signal-background separation.

My learning process started using the single-particle samples that Hajrah used to obtain selection cuts for the muon identification. I used the same distributions as the input discriminating variables into four different methods: CutsGA (rectangular cuts), Likelihood, MLP (neural networks) and BDT (boosted decision trees).

All methods can give about 99.6% efficiency at the maximum S/√(S+B), i.e. at the optimal background rejection. Previous cuts give 97.5% efficiency with similar purity.
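For the record, the quoted working point is just the cut that maximises S/√(S+B). A toy scan to illustrate (random numbers standing in for the classifier response, not actual TMVA output):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy discriminator outputs: signal peaks high, background low
# (placeholders for the MLP/BDT response distributions).
sig = rng.normal(0.8, 0.15, 10000)
bkg = rng.normal(0.3, 0.15, 10000)

best_cut, best_signif = 0.0, -1.0
for cut in np.linspace(0.0, 1.0, 101):
    s = np.sum(sig > cut)           # signal surviving the cut
    b = np.sum(bkg > cut)           # background surviving the cut
    if s + b == 0:
        continue
    signif = s / np.sqrt(s + b)     # significance estimate
    if signif > best_signif:
        best_cut, best_signif = cut, signif

efficiency = np.sum(sig > best_cut) / len(sig)
```

TMVA does essentially this scan internally for each method when it reports the optimal cut.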

Some nice output plots from the package can be seen in my wiki page .

Thursday, October 2, 2008

Likelihood cut

Current status of the likelihood cut:

plot

Z-Higgs samples

Some numbers and comments for the discussion of the samples we need can be found here.

Thursday, September 25, 2008

pandora-pythia FSR

FYI
I just found out that pandora-pythia does not consider QED final-state radiation, i.e., final state electrons and muons do not emit photons.

Thursday, September 18, 2008

Background samples

According to the Kuhl and Desch paper the following processes contribute to the background of ZH:

  • e+e- → Z/γ → qq(γ)
  • e+e- → WW → qqqq
  • e+e- → WW → qqlν
  • e+e- → ZZ → qqqq
  • e+e- → ZZ → qql+l-
  • e+e- → ZZ → qqνν
  • e+e- → Weν → qqeν
  • e+e- → Ze+e- → qqe+e-
But only some of them will contribute to the leptonic channel, as one can see on page 12. After the channel classification, i.e. 2 electrons or 2 muons with momentum > 15 GeV, we essentially need the processes:
  1. e+e- → ZZ → qql+l-
  2. e+e- → Ze+e- → qqe+e-
  3. e+e- → WW → qqlν
The list is ordered according to the contribution in the Kuhl-Desch paper. Process number 3 contributes the least but has the largest cross section. If we need that sample (the cuts essentially remove this contribution) we should generate it for the nominal luminosity only, to save disk space, and compare with the scaled signal and the Z background.
Concerning sample 2, I was wondering why the process e+e- → Zμ+μ- → qqμ+μ- was not considered.
For samples 1 and 2, as well as for the signal, we may need samples generated with luminosity of about 4 ab-1.

Another thing in the Kuhl-Desch paper that was not clear to me is the meaning of the symbol q. On page 12, table 7, the column with qq says 2 fermions. Could these qq pairs be tau pairs, for example?


Wednesday, August 20, 2008

Neutrino effect on dijet mass

I still haven't cross-checked the effect of neutrinos on the dijet mass. If it is confirmed, we can take advantage of it to separate the ZH events from the ZZ background. The ZH events with dijet mass below about 110 GeV should have more missing energy than the ZZ events with dijet mass above about 92 GeV. This range, 92 < dijet mass < 110, contains most of the overlap between the two processes. Then, instead of cutting on the dijet mass at 96 GeV, we cut in the missing ET-dijet mass plane. By applying the cut shown in the plot, obtained "by eye", 10% of the signal was recovered while 10% more background events were cut, compared with the case where just a dijet mass cut is applied (see previous post).
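A "by eye" cut like this is just a line in the missing-ET vs dijet-mass plane. A sketch of the logic (the slope, intercept and overlap boundaries below are placeholders for illustration, not the values actually used):

```python
def passes_2d_cut(dijet_mass, missing_et,
                  mass_cut=96.0, slope=-0.5, intercept=60.0):
    """Keep an event if it clears the plain dijet-mass cut, OR if it
    sits in the 92-110 GeV overlap region with enough missing ET.
    The slope/intercept/boundaries are illustrative placeholders."""
    if dijet_mass > mass_cut:
        return True
    in_overlap = 92.0 < dijet_mass < 110.0
    # linear boundary in the missing-ET vs dijet-mass plane
    return in_overlap and missing_et > slope * (dijet_mass - 92.0) + intercept
```

The real boundary would be tuned (or fitted) on the 2D distributions of signal and background rather than guessed.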

If time allows I will think of a smart way to perform the background separation using the information from the plot below. But the idea is there.

Monday, August 18, 2008

Neutrino effect on the di-jet mass

Cheated assignment with and without Monte Carlo neutrinos
This plot shows the reconstructed di-jet mass, but using the reconstructed particles chosen from Monte Carlo. There's also the same thing but with the Monte Carlo neutrinos taken into account, which shows the low Higgs tails are clearly due to neutrinos.

Friday, August 15, 2008

Minutes of Meeting 14/8/08

Present: Roberval, Clare, Mark, Joel
  • Roberval showed plots of his studies on the 230 GeV sample of Z->µµ+H, some of which have been discussed before. There was discussion of a dijet mass cut vs the recoil mass cut, but the dijet mass from the Higgs shows a long tail at lower values. Using a dijet mass plus Z energy cut does give very good background rejection for a signal efficiency of about 1/3.


  • Mark's plots from Monday for the 250 GeV Z->ee samples are similar, but have been showing a high tail in the dijet mass spectrum (particularly in the ZZ).


  • After some further investigation and discussion in the meeting, it was decided that:
    • The long low-mass tail in the Higgs dijet mass plots is probably neutrinos. Mark and Roberval should check this by adding in neutrinos from the MC.


    • The high-mass tail in the electron samples is mainly due to photons (primarily FSR and bremsstrahlung) being included in the jets. Mark is looking at using a smaller Ycut to isolate photons (and taus) and then vetoing any jet with fewer than two tracks, as shown in his latest plots. He has also looked at picking out photons close to primary electrons.


  • There was some confusion about whether we are expected to give a status report in the ILD meeting next week [it's actually the week after]. Since Victoria gave the last talk, it was decided that someone from Bristol (Mark) should give the next one.

Questions about beamstrahlung and ISR

I've been reading about initial state radiation (ISR) and beamstrahlung, and I am not sure whether one can measure the photon or not. Some say it is possible for ISR, but I understand that in most events the photon(s) would be outside the detector acceptance. Besides that, the ISR calculations I know of are done considering the electrons as having some sort of structure.

For beamstrahlung it is said that it could be possible to measure its effects, using for example the lumi detector, treating it as a collective effect of the particles in the beam. But the nature of the effect is statistical. What if the particles participating in the hard scattering were affected? Moreover, it should not be possible to measure any photon, because the radiation is emitted as the beam moves along the tunnel.

Does anybody understand these issues better? I believe that by understanding these things better we could try to do something to improve the measurements in the long term.

Thursday, August 14, 2008

Plots for meeting

Yet again I've left it a bit late to describe these here, I'll talk about them in the meeting and then write something here later.

Mass plot for ycut jet finding
We had a go at selecting the hadronic stuff by using ycut jet finding with a low ycut and requiring the jets to have at least 2 tracks. This is the mass plot for the best peak I get.


Comparison
This is that plot overlaid on the perfect particle assignment from Monte Carlo, and on the plot from removing photons close to the electrons. The ycut method has a slightly smaller peak than the brem removal method, but a smaller high tail, which is the important thing.

Status of the analysis

A presentation with the status of what I am doing for the analysis can be found here.

Monday, August 11, 2008

Dijet mass tail: Pandora x Pythia

I got a sample generated with Pythia from the grid to compare with my Pandora-Pythia samples. Both show long tails towards low dijet mass values. I used 2000 events from each sample.


Improving the mass plots

When checking the performance of the precuts we found that the jet mass cut was cutting very little of the ZZ background out; there should be similar numbers of ZH and ZZ going into the likelihood cut but we had about 4 times as much ZZ. The mass plot (here, for cheated electron identification) is quite wide and has a long high tail on the ZZ.
We had a look at several things, like trying ycut jet finding in case initial state radiation photons were being lumped in as well when forcing to 2 jets. Nothing conclusive came up, so I looked directly at a few events with di-jet mass over 200 GeV. They all seemed to have highly energetic photons, from the Z (the one that goes to electrons) or from the electrons themselves, lumped into the jets. I had a go at cutting out any photons (as identified by Pandora) that are within 4 degrees of the primary electrons and came up with this. Here's a plot of the energy and number of these photons.
After checking that, I thought I'd try to find out if there was anything else being thrown in that shouldn't be. From the Monte Carlo to reconstructed particle relations I listed all the reconstructed particles that come from each Z (for ZZ; or Z and H for ZH) separately and plotted the invariant mass for these. That's basically the best you could ever get without improving the particle flow/tracking/etcetera. Here's a plot of the 3 methods overlaid for the ZZ sample: original cheated electron ID, bremsstrahlung removed, and completely cheated jet association. Removing the bremsstrahlung photons does a pretty good job, but there's still something going into the jets that shouldn't. Here are some details about the particles that are left over when assigning to the Z or the H (or the Zs for the ZZ sample) from Monte Carlo.

Next steps:
  • Find out what these left over particles are.
  • See how the mass plots look using realistic electron finding and removing the bremsstrahlung.
  • See how many ZZ and ZH get through the precuts now.

Friday, August 1, 2008

Minutes of Meeting 28/7/08

Present: Hajrah, Roberval, Victoria, Clare, Mark, Joel (i.e. everyone)
  • Clare and Mark have been having another look at electron ID. When Mark looked a few months ago, the default efficiencies looked rather poor due to close bremsstrahlung and separated parts of the EM shower, so he has been using MC information to cheat since. Now things look much better: using either Pandora defaults or a Kuhl-style simple ID (one cluster matched to one track, with cuts on track isolation and EM vs hadronic energy), efficiencies above 90% are reached. Further tuning will be done by Bristol MSci students.
  • Roberval has been looking at samples of ZH with Z->µµ and H->anything at √s = 230 and 250 GeV (20,000 events in each). He has identified cutting on the Z recoil mass as an extremely promising way of reducing the ZZ background.
  • It seems clear that the best way to perform both electron and muon analyses is to identify the leptons and then remove them before jet clustering.
  • Victoria has been looking at the B and C-tagging performance in jets. The C-tag does not look very good and probably needs re-tuning. The B/C tag is not relevant for this analysis.
  • Hajrah checked the electron ID cuts in the new detector model. The EM/total and E/p cuts need to be changed. This was done in single particle simulations, so she will next check in physics events.
  • Mark had some technical problems getting all of his plots on the web and will put them up after the meeting. The basic summary was that the flavour tag likeness plots look good and we are probably ready to fit the templates, the last step in the analysis chain.
It was decided that we have enough to show in Wednesday's ILD meeting, and that Victoria will put together a talk.

ycut versus njet

I had a quick look into ycut versus njet. You can see plots and conclusions here. It seems fine to use the njet mode instead of ycut to reconstruct the Higgs from jets, if necessary.

What I am not sure about is whether it is fine to use the 2-jet mode for vertexing and flavour tagging. At least the ZVKIN algorithm uses the jet axis as the starting direction for the ghost track. How the algorithm behaves when the starting direction is far off needs to be investigated.

Thursday, July 31, 2008

Dijet mass

Below is the dijet mass distribution. No cut on the energy of the Z was applied, and the reconstructed Z from muons is required to have mass between 80 GeV and 102 GeV.
The jets were obtained by reclustering after muon removal from the list of particles, forcing into 2 jets. I just don't understand why there are such long tails at low mass values. Still investigating. Improving the dijet mass resolution (by improving jet reconstruction) could save some events from the dijet mass cut.

Event selection tuning and Luminosity

The plots I've been showing must be looked at carefully. The event selection I am doing up to now is "non-standard" and not properly tuned. I still have to check the standard cuts for ZH event selection, together with my cuts, in order to maximise the statistics.

One also has to bear in mind that the number of events in the samples corresponds to 6.6 times the nominal luminosity (L), where L=500fb-1. So, divide the Y axis by 6.6 to get the "correct" number of events. A fine tuning of the selection is very important to get enough statistics for H into c cbar.
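To avoid confusion when quoting numbers, the rescaling to nominal luminosity is just a constant factor (6.6 and 500 fb-1 being the values above):

```python
# Scale raw sample counts (generated at 6.6 x nominal luminosity)
# down to the nominal L = 500 fb^-1.
LUMI_FACTOR = 6.6

def to_nominal(n_events):
    """Raw event count -> expected count at nominal luminosity."""
    return n_events / LUMI_FACTOR

# e.g. 20000 generated events correspond to about 3030 at nominal L
n_nominal = to_nominal(20000)
```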

Removing further ZZ background

As I showed before, the cut on the energy of the Z can eliminate a fraction of the ZZ background (see the plot in the previous post below). I am now trying other cuts that could further improve the removal of the ZZ background.
I am applying the same cuts as before:
  • M_Z between 88 GeV and 94 GeV
  • E_Z between 100 GeV and 103 GeV
The plot below shows the dijet mass (jets were reclustered into 2 jets after the muons coming from a Z boson are removed). One can see that the signal and the background are quite distinct.

Applying a cut on the dijet mass, Mjj > 96 GeV, the ZZ background was further reduced.


Wednesday, July 30, 2008

Talk at ILD Optimisation Meeting!

Thanks for everyone's help! Here's a link to the talk.

ZZ versus ZH kinematics

I generated+simulated+reconstructed some ZZ samples with centre of mass energy = 230 GeV. The files are in DST format. Soon I will upload them to the grid.

From kinematics, the energy of the Z bosons in the ZZ process in the centre of mass frame is sqrt(s)/2. But because of the different masses of the Z and the Higgs bosons, in ZH the energy of the Z boson should be (s + mZ^2 - mH^2)/(2*sqrt(s)). We can see that in the plot of the energy of the Z boson below. The blue line was obtained using ZZ+ZH samples whereas the dashed black is from ZH only. Cuts on the energy of the Z may be worth doing to also remove the ZZ background.
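The two-body kinematics can be checked numerically (standard formula E_Z = (s + mZ² - mH²)/(2√s) for e+e- → ZH; the masses below are nominal values):

```python
import math

M_Z, M_H = 91.19, 120.0  # GeV, nominal values

def z_energy_zh(sqrt_s):
    """Z energy in e+e- -> ZH in the centre-of-mass frame."""
    s = sqrt_s ** 2
    return (s + M_Z**2 - M_H**2) / (2.0 * sqrt_s)

def z_energy_zz(sqrt_s):
    """Z energy in e+e- -> ZZ: each Z carries sqrt(s)/2."""
    return sqrt_s / 2.0

# At 230 GeV the ZH Z sits near 102 GeV versus 115 GeV for ZZ,
# which is what makes the E_Z cut useful.
```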


The next plot shows the recoil mass. The different cuts are in the plot (problems using math symbols here).


Monday, July 28, 2008

Z-Higgs analysis

Link to the talk here.
A few questions in my head: we have four jets and need to identify two as muons. How do we use the muon ID cuts on the jets, as opposed to just on a RecoParticle? Then how do we use the NN cuts to ID the b and c jets?

Some plots

What I've been doing is ID-ing the electrons and then forcing everything else to 2 jets. This means I'd have to do my own jet finding and flavour tagging if using mass reconstruction files in the future.

Variables used in the likelihood cut
The thrust variables don't seem to do much, and the jet energy difference doesn't offer great discrimination. We still need to add the di-jet mass after the 5-constraint fit to this.

Flavour tag likeness for the Higgs sample (using the c-tag)
These plots look fairly good; there is a definite difference between each plot that should be extractable from the branching ratio fit.

Flavour tag likeness for the ZZ sample (both c-tag and bc-tag)
Same here for the ZZ. I've also added the same plot but using the c-with-only-b-background tag, although this doesn't show much (see the next set of plots). There also doesn't seem to be much tagged as c in the c-tag version. I'll have to look into that.

Flavour tag likeness for the Higgs sample (using the bc-tag)
The bc-tag just seems to tag anything that's not a b as a c (not really very surprising since that's what it's trained to do). All the templates (except the b) look pretty much the same, so not very good for fitting to.

Electron ID comparison
The next four plots compare the different electron identification methods. Everything is using a Z Higgs sample with the Z going to two electrons and the Higgs doing whatever it fancies, so there is some neutrino stuff in there as well from e.g. Higgs->WW. There's a total of 43888 events split according to standard model 120GeV Higgs branching ratios.
Note that anything labelled as "Pandora" is using a head version of PandoraPFA from a couple of weeks ago. A new tag has recently been released which claims to have improved electron ID, I'll try and redo the plots with the new version soon.

Di-jet mass of everything not identified as an electron forced to 2 jets
There are no cuts on any of these plots, so some of the high tails should be reduced if I cut out anything where less than 2 electrons are found (see below).

Di-electron mass of everything identified as an electron
I've called this "di-electron" mass but quite often there are more than 2 electrons involved, see the next plot for the numbers. Note that I forgot to put a cut on the number found, so the spikes at zero are when 1 or no electrons are identified.

Numbers of electrons found
The cheated electron code takes the Monte Carlo Z daughters and uses the LCRelation collection to match them to reconstructed particles. If more than one reconstructed particle is related to a particular electron then they are all lumped into one "jet" and the combined four momentum used for the cheated electron. As such I have no idea how it can find more than 2 (let alone 6?). I'll have to look into it when I get time. Addendum 06/Aug/08 - The Higgs sample has Higgs to ZZ, hence the possibility of 4 or 6 electrons from Zs. Doh!
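Lumping several related reconstructed particles into one cheated "electron" just means summing four-momenta. A minimal sketch with plain (E, px, py, pz) tuples (not the actual LCIO API):

```python
def combine(particles):
    """Sum the (E, px, py, pz) four-momenta of all reconstructed
    particles related to one MC electron into a single cheated-electron
    four-vector."""
    e  = sum(p[0] for p in particles)
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    return (e, px, py, pz)
```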

Recoil mass using the electrons found
For this the initial four-momentum was assumed to be a constant (250,0,0,0), so you can clearly see the beam effects screwing up the plot. Have a look at one of my earlier posts to see how I simulated beamstrahlung; basically it's just a rough (over)estimate using 500GeV beam parameters.

Tag plots for B-jets



Tagging plots for true C jets




Sorry about the sideways plots!
Questions: why is the C-tag bad? Do we use the BC tag for these events?

Tagging plots for the ZHiggs sample






These plots are found using the Durham_4Jets collection. So two of the jets represent the muons.

Electron cut efficiency


The change due to the new detector model is seen in the ratio of the Ecal energy to the total energy, so we have changed our electron cuts. Now they are:
1) Ecal/ETot = 0.9
2) ETot/p = 0.7

So with the new detector model, the above cuts and more statistics, the efficiency versus momentum and cos theta is here

Monday, July 21, 2008

Useful LCIO commands

Some time ago, Mark had problems with a large lcio file. Now I have the same problem. I found on the web the command

lcio split -i file.slcio -n 1000

that splits the file into chunks of 1000 events.

For more LCIO commands, see here

Monday, July 14, 2008

z-higgs preliminary results (2)

Efficiencies:
  • Ecm = 230 GeV: 17958 Z boson candidates reconstructed out of 20000 generated. (90%)
  • Ecm = 250 GeV: 18191 Z boson candidates reconstructed out of 20000 generated. (91%)

z-higgs preliminary results

Here are some plots produced using the muon selection from Hajrah to tag the muons in the PandoraPFOs collection.
Additional cuts: momentum of the lepton from Z greater than 20 GeV; mass of the Z boson between 70 and 110 GeV.
Recoil mass reconstructed using the nominal centre of mass energy.
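The recoil mass follows from four-momentum conservation against the dimuon system. A minimal sketch, assuming the initial state is exactly (√s, 0, 0, 0):

```python
import math

def recoil_mass(sqrt_s, e_mumu, px, py, pz):
    """Recoil mass against the reconstructed Z -> mu+mu- system,
    assuming the initial four-momentum is (sqrt_s, 0, 0, 0)."""
    m2 = (sqrt_s - e_mumu) ** 2 - px**2 - py**2 - pz**2
    return math.sqrt(m2) if m2 > 0 else 0.0
```

For a perfectly measured ZH event this closes on the Higgs mass, which is why the recoil peak sits at mH independently of how the Higgs decays.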

The black and the red lines on the plots correspond to Ecm = 230 GeV and Ecm = 250 GeV, respectively. The samples used are described here.





Efficiency Plots with more statistics

After applying the Muon cuts, the efficiency plots for Momentum and Costheta are shown below.
Please follow the link here:
https://www.wiki.ed.ac.uk/display/ILC/Hajrah+Tabassam

Edinburgh z-higgs samples

Edinburgh Z-higgs samples, e+e- -> Z h
Z -> mu+ mu-
h -> anything

The samples are located on the GRID at
/grid/ilc/users/edinburgh/data/z-higgs

Up to now there are 2 samples,
- Ecm = 230 GeV,
- Ecm = 250 GeV,
with 20000 events each. Both reconstructed (still being uploaded) and DST files are available.

More details can be found here.

Friday, July 11, 2008

Sample grid locations

I've copied most of my samples to the grid, and started a script running to copy the rest. Everything is under the directory:
/grid/ilc/users/phmag/LCIOOutput/ZHiggsAnalysis

With currently three directories:
WW-anything
ZH-eeXX
ZZ-eeJetJet

The files should be self-descriptive from the filenames. I was going to write how many events etcetera there are here, but I've rather stupidly left my log book at home, and I'm away from tonight. There are only 10000 WW events (in 2 files); other than that everything should be whatever the numbers are for 500 fb^-1. Higgs to bbbar is split over 4 files, and the ZZ stuff is split over 100 files.

The detector is LDC01_05Sc at 250GeV, but with SEcal03 instead of SEcal02 (there was a problem with SEcal02, I can't remember the details). Beam effects were simulated by forcing the centre of mass energy in Pythia to match the energy spectrum from PandoraPythia (with the 500GeV beam parameters). That's about all I can remember for now, I'll check my log book when I get back.

Monday, July 7, 2008

Installing OpenScientist

OpenScientist is a full C++ implementation of AIDA. I was having big problems using Java on 64 bit machines reliably so switched to this instead of AIDAJNI.
To install, first download from openscientist.lal.in2p3.fr. I went for the "batch" version, which is basically the simple version without any GUI stuff for batch processing.
The installation instructions weren't particularly to the point so I'll recap what I did briefly (obviously change the version number if appropriate):
unzip osc_batch_source_v16r4.zip
cd OpenScientist/v16r4/osc_batch/v16r4/obuild
source setup.sh
./sh/build
./sh/build_release

I think that builds everything (there is more to OpenScientist than just an AIDA implementation), but it's easier than figuring out which bits you actually need.

To use it with the lcio software you need a CMake module. I wrote one with all the library and include directories hard coded into it; it would be nice if I wrote a proper self-detecting one but I haven't had the time. You can find out the compiler flags for AIDA with the utility "<location>/OpenScientist/v16r4/osc_batch/v16r4/bin_obuild/osc_batch/v16r4/bin/aida-config".

My module is here; obviously you'll need to change the directories to match your system. I called the module FindBatchLab.cmake, so all you have to do is put that module in with the other CMake modules and change the dependencies in your project's CMakeLists.txt file from "AIDAJNI" to "BatchLab".

IMPORTANT NOTE FOR JAS USERS - BatchLab puts empty histograms/clouds etcetera into the AIDA file, which JAS doesn't like (e.g. if you create a histogram and didn't fill it with anything). I modified the file "OpenScientist/v16r4/BatchLab/v1r2p1/source/XML/XML_DataWriter.cxx" to this. Copy my file over the original before building if you want to use JAS to analyse your AIDA files. The xml schema says there should be at least one entry for clouds/histograms/etc, so I think the JAS way is the correct way.

Tuesday, July 1, 2008

Minutes of Meeting 30/6/08

Present: Hajrah, Roberval, Victoria, Mark, Joel (Clare is still in a muddy field)
  • Mark's samples for 500 fb-1 of ZH at 250 GeV (mh = 120 GeV, Z->ee) have finished running through simulation and he is checking them. He used LDC01_05sc. The samples will be made accessible on the Grid and a post made to this blog. Z->µµ samples can also be made.
  • Clare has been working on getting a working suite of software on the Bristol 64-bit machine (Roberval has successfully compiled the new version of MARLIN on the Edinburgh machine in 32-bit mode)
  • Roberval has been writing a processor to use the muon ID cuts to tag RPOs
  • Hajrah is looking at electron and muon cut ID efficiency as a function of θ and momentum, but needs bigger samples
  • Victoria only has a few more weeks before her leave commences
  • Mark has been encountering problems reading in large (~10 GByte) LCIO files. No-one knows of any physical limit, but he will split the files up in future.


Thursday, May 15, 2008

Muon and Electron id

https://www.wiki.ed.ac.uk/display/ILC/Hajrah+Tabassam

Monday, April 21, 2008

Muons from Z (erratum)

In my sample there were 4000 events, so there should be 8000 muons. But in the previous plot there were 8270 muons. Checking my code and the sample, I found out that the "extra muons" were daughters of Zs coming from the Higgs descendant line, e.g. H -> ZZ*. The correct plot is the one below.

Muons from Z

I generated a sample of Z-Higgs events, at 250 GeV. The plot shows the distribution of the momentum of the muons coming from the Z.


I'll write a description about this after the meeting.

Plot

Tuesday, April 8, 2008

Z Higgs talk at ILD meeting

There was an interesting talk on the Z Higgs channel at last week's ILD detector optimisation meeting:
z higgs talk

Wednesday, March 12, 2008

JAIDA & AIDAJNI on 64-bit

I wrote some documentation on how I installed JAIDA (here) and AIDAJNI (here) in our cluster of 64-bit machines.
Another thing: do not forget to include AIDAJNI in the list in the CMakeLists.txt:
SET( ${PROJECT_NAME}_DEPENDS "Marlin LCIO GEAR AIDAJNI ROOT" )
If you did not do that, you must go to the build dir, run make clean, delete the file CMakeCache.txt, rerun cmake, and do make install.

Note: assuming you are using ilcinstall, changing only the ilcinstall python script to link to AIDAJNI does not work.

Monday, March 10, 2008

Plots for the meeting

I've been having a look at event selection; the Thorsten-Kuhl paper starts off with a cut based selection and then uses a likelihood selection. For the cut based selection they used:
  1. Two leptons found (electron identification is performed before this). At the moment I'm cheating the electron identification from Monte Carlo because the electron ID was getting messy due to the extra reconstructed neutrals.
  2. The invariant mass of the two leptons has to be between 70 GeV and 110 GeV.
  3. Cos(theta) of the event's primary thrust axis less than 0.9.
  4. The invariant mass of the two hadronic jets has to be greater than 100 GeV.
This plot shows these variables for the ZZ sample and the Higgs to c cbar sample. There are about twice as many events in the ZZ sample. The peak at zero for the primary lepton mass is because I haven't cut out the cases where only one electron is found, and the low energy rise in the ZZ sample is because it's a Z/gamma* sample. This plot is the number of events that pass these cuts, with 0 being the number of initial events and 1 to 4 the amount after the respectively numbered cuts above.

For the likelihood selection I've been looking at the following inputs:

  1. Cos(theta) of the event's primary thrust axis.
  2. The event's thrust.
  3. The invariant mass of the two hadronic jets.
  4. The energy difference of the two hadronic jets.
The Thorsten-Kuhl paper also uses the invariant mass of the two jets after applying a kinematic fit, but I haven't coded that up yet. This plot and this plot show these four distributions normalised to 1. For the likelihood I used these plots as reference distributions, and used the height for a given measurement to get a "probability" of getting that measurement for signal or background (since they're normalised to one). I looked at using just the signal references, basically just multiplying these "probabilities" together, and also by using a bastardisation of Bayes' theorem to include the background. For that I did the same as before but divided by the sum of the background and signal "probabilities". To be mathematically sound I should probably sum over all possible backgrounds, but then I'm a physicist not a mathematician.
This plot shows the efficiency-purity for the two methods (efficiency on the x-axis). Note that because I haven't got particularly large samples, this plot was made with the same events as the reference distributions. A bit shoddy, but if I can't get it to work like that then I may as well give up now. There are 6375 ZZ events in the background, and 5864 signal (slightly more c cbar than b bbar).
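The likelihood itself is simple enough to sketch. This is an illustrative toy, not the analysis code: each reference distribution is a normalised histogram (bin edges plus heights), and the Bayes-flavoured variant divides the signal product by the sum of the signal and background products:

```python
import bisect

def hist_prob(edges, heights, x):
    """Height of the normalised reference histogram in the bin
    containing x (zero outside the histogram range)."""
    i = bisect.bisect_right(edges, x) - 1
    return heights[i] if 0 <= i < len(heights) else 0.0

def likelihood(refs_sig, refs_bkg, measurements, use_background=True):
    """refs_* map a variable name to (bin_edges, normalised_heights).

    Multiply the per-variable signal "probabilities"; if use_background
    is set, divide by the sum of the signal and background products."""
    p_sig = p_bkg = 1.0
    for name, x in measurements.items():
        edges, heights = refs_sig[name]
        p_sig *= hist_prob(edges, heights, x)
        edges, heights = refs_bkg[name]
        p_bkg *= hist_prob(edges, heights, x)
    if not use_background:
        return p_sig
    total = p_sig + p_bkg
    return p_sig / total if total > 0.0 else 0.0
```

Summing over several backgrounds would just mean accumulating more terms into the denominator.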

Tuesday, February 26, 2008

LCFIAIDAPlotProcessor crash

Running Marlin with the LCFIAIDAPlotProcessor from LCFIVertex software release v00-02-02 will crash. This happens because the variable ptmasscorrection should have been renamed to ptcorrectedmass. I corrected this in the HEAD version, but if you don't want to pick up any other modifications you can download LCFIAIDAPlotProcessor.cc here and recompile LCFIVertex.

Convert stdhep into HEPEvt

Here are instructions and code to convert an .stdhep file into a .HEPEvt file. Two files are needed:
The first file is the FORTRAN code used for the conversion; it outputs a file called output.HEPEvt. The second file must contain the name of your input stdhep file. To compile the code you need the stdhep include directory and the libraries libstdhep.a and libFmcfio.a. You can find them as part of the old CERNLIB versions, such as the 2002 release. You then compile with the commands (change the locations according to your setup):
g77 -I$CERN/2002/include/stdhep -fno-second-underscore -c std2evt.F
g77 -fno-second-underscore -o std2evt std2evt.o $CERN/2002/lib/libstdhep.a $CERN/2002/lib/libFmcfio.a -lnsl
I also made a script with those commands. After compiling, an executable called std2evt is created. Modify the file stdhep_file_name with the name of your stdhep file and run:
./std2evt
Some output may be printed to the screen, but don't worry. The converted output file is called output.HEPEvt.

Monday, February 25, 2008

Just for fun.

A friend sent me this link, and we geeks may appreciate it: http://stephenhicks.org/images/UniverseScale.gif

Minutes of meeting on 25th February

Present: Joel, Clare, Ryan, Hajrah, Roberval, Victoria.

Hajrah presented her plots (linked from the blog) showing the difference at the digitisation level between 55 GeV muons and 55 GeV pions. The energy of the muons probably doesn't make any sense yet; it uses the muon digitiser from the HEAD version of Marlin. Joel commented that it would be good to look at E_EM + E_HAD. Hajrah has not yet looked at tracking and clustering information; for this it would be best to use PandoraPFA, which is currently not working. Clare said that, after contacting Mark Thompson, she is using the HEAD of PandoraPFA, which seems to work. Roberval will try to install that in Edinburgh for Hajrah to use.

Clare had been installing the ILC software on a local 64-bit Bristol machine. (The Grid is being abandoned!) Roberval has already done this in Edinburgh, and Roberval and Clare will communicate as to how to do this. Roberval noted that Marlin/Mokka run about 3 times faster on the 64-bit machine in Edinburgh. We will encourage Roberval and Clare to co-author an ILC note on this subject.




Monday, February 18, 2008

New analysis timeline

With the funding cuts in the UK and USA (sob) there is a new timeline for the physics studies. Instead of August, things are delayed by ~6 months, so I think the idea is to have a study ready for publication by the end of 2008 or beginning of 2009.

http://www.linearcollider.org/cms/?pid=1000498

Saturday, February 16, 2008

ECal bugs in LDC01_05Sc and LDCPrime_01Sc

Just a note for those not on the ild-detector-optimisation mailing list.

There's some kind of problem with the electromagnetic calorimeter (ECal) endcap hits in the Mokka models LDC01_05Sc and LDCPrime_01Sc. I can't say I understand it; I think it's something to do with the cells not scaling properly. Anyway, there's a different model being worked on now and it was decided not to bring out an interim fixed model. Paulo sent around instructions for fixing it yourself, though:

Dear friends,

I guess that we decided to not touch the LDC01_05Sc and LDCPrime_01Sc models and to not create intermediate models. As the final model risks to take a while ;-) , here you are how to use LDC01_05Sc or LDCPrime_01Sc with the new Ecal (without the known bugs you have with the old one):

1) you have to checkout Mokka tag mokka-06-06-pre01 directly from our CVS HEAD (please follow the instructions here:
http://polzope.in2p3.fr/MOKKA/download)
2) , just add these two lines in your steering file:

/Mokka/init/EditGeometry/rmSubDetector SEcal02
/Mokka/init/EditGeometry/addSubDetector SEcal03 90

Cheers,
Paulo.

Thursday, February 14, 2008

Minutes of Meeting 11/2/08

Bristol: Clare, Ryan, Helen, Mark, Joel
Edinburgh: Hajrah, Roberval, Victoria

Mark showed some plots that will form part of his talk in the general physics meeting tomorrow (see previous post). They show the y-cut required to ensure that all of the electron EM clusters are grouped together into a single jet. This is shown both for "undecayed" electrons (i.e. no bremsstrahlung within the tracking volume) and for all electrons. The y-cut value decreases with energy as expected, and is orders of magnitude smaller than the values expected for hadron jets.

The final plot shows the distribution of electron track pT (green), total jet energy (orange) and single cluster energy (pink). 

Hajrah has made an impressive start with Mokka/Marlin on muon ID. She is looking at the energy deposits and number of hits as muons traverse the calorimeters and will proceed to investigate the muon chambers.

The next meeting will be in two weeks, and they will remain fortnightly until further notice.

Monday, February 11, 2008

Links to plots for today's meeting. Again no time to explain them here, I'll make another post later.

plot 1

plot 2

plot 3

plot 4

Muon id Presentation/ 11/02/08

My presentation is on this web page.
http://www.ph.ed.ac.uk/~vjm/ILC/Presentation.pdf

Minutes of meeting on 28th January

Sorry these are late!

In attendance: Victoria, Roberval, Hajrah, Joel, Clare, Ryan and Mark.

Mark is updating the framework for his analysis to make tagging variables and likelihood cuts (to separate signal and background) a la the Kuhl-Desch analysis. The likelihood cuts are based on thrust, visible energy and tagging probability.

Clare has generated ZZ --> l+ l- q qbar events with Pythia to make stdhep files. She's also been working on the Pandora calibration for the LDC_01-05 geometry.

Together, Clare and Mark have been generating electron and jet samples for the UG student. The UG student has been AWOL, so Helen will help out here.

Clare and Mark agreed that they would read all the LDC emails that are being circulated to check that we are conforming to the official analysis for the LCF CDR.

Roberval has been making a comparison of the results obtained with SimpleCaloDigi and MokkaCaloDigi. MokkaCaloDigi is better; Clare and Roberval will communicate on this.

Plan for upcoming work:


  • Hajrah is starting to learn about Marlin and Mokka, and will start generating single muon events this week.


  • Mark will re-generate the signal at ECM=250 GeV

Thursday, February 7, 2008

CoM Energy

I just received an email from Ron Settles on the ILD optimisation list that included this:

---------------

--The optimum c.m.energy for doing Higgs measurements is
\sqrt{s}=m_Z+m_Higgs+ca.20GeV as has been pointed out many times recently
(and even during Lep days); I think most people have now changed to this
(the Snowmass05 energy was not optimum for m_Higgs=120GeV); it would be
best if all use the optimum energy to make comparisons/combinations
easier.

-----------------

I'm not sure what "ca" means. Does this agree with our CoM energy of 250 GeV?

Tuesday, February 5, 2008

Python Script for Pandora Calibration

I have written a python script to run through the Pandora Calibration:

It will run over all of the calibration constants, produce the root files containing the histograms, fit the histograms and find the calibration constant that gives the mean of the fit closest to the correct value.

I've tried to find a sensible way for the code to decide which calibration constants to try next, and to decide when to finish the calibration, but I wouldn't want to guarantee that it gives the best answers.

The middle part of the code creates a Marlin steering file with the chosen calibration constants, and the appropriate slcio input files, runs this through Marlin to produce the histograms, gets the appropriate histogram, fits it, and prints the mean of the fit.

I think this part is useful in itself, even if you are not convinced by the way I have chosen the calibration constants each time.
You could always write a different loop around this bit, that just iterates through a load of calibration constants, maybe.
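For illustration, the search the script does can be sketched as a simple one-dimensional scan. This is a hedged toy version, not the actual DoPandoraCalib.py logic; `run_and_fit` is a hypothetical callable standing in for the steering-file-plus-Marlin-plus-fit middle part described above:

```python
def calibrate(run_and_fit, target, start, step, max_iter=20):
    """Scan a single calibration constant towards `target`.

    `run_and_fit(const)` returns the fitted histogram mean for that
    constant (in the real script this would write the steering file,
    run Marlin and fit the resulting histogram). Move in the direction
    that brings the mean towards the target, halving the step whenever
    a trial makes things worse."""
    const = start
    best = (abs(run_and_fit(const) - target), const)
    for _ in range(max_iter):
        direction = 1.0 if run_and_fit(const) < target else -1.0
        trial = const + direction * step
        diff = abs(run_and_fit(trial) - target)
        if diff < best[0]:
            best = (diff, trial)
            const = trial
        else:
            step /= 2.0  # overshot: shrink the step and retry
        if step < 1e-6:
            break
    return best[1]
```

Wrapping the Marlin-running middle part in a callable like this is exactly what makes it easy to swap in a different search loop.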

Anyway, to run the code, first you need to set up pyroot:

export LD_LIBRARY_PATH=$ROOTSYS/lib:$PYTHONDIR/lib:$LD_LIBRARY_PATH
export PYTHONPATH=$ROOTSYS/lib:$PYTHONPATH

This worked with our pretty standard ROOT installation, but if it doesn't, there is more info here.

The python code is here
It calls a bash script that makes the Marlin steering file.

To run this script use:

python DoPandoraCalib.py

You will need to change the "slciofile" string to your slcio files on lines 100, 110 etc.
You will also need to change the gear file in MakeSteeringFile.sh.
"debug" can be changed:
if debug = 0, only the best calibration constants that the code finds will be printed out.
if debug = 1, some info about the calib consts that are being tried will also be printed out.
if debug = 2, lots of info will be printed, and the histograms will be drawn.

The starting values and iteration step sizes (lines 95 and 96 etc.) for the calib consts will need to be set to something sensible; I've set them to what I think is sensible for LDC01_05Sc.

The script puts all the root files in a directory called 'CalibRootFiles'.

The calibration constants found by the code for LDC01_05Sc are as follows:
CalibrECAL = 63.4, 126.9
CalibrHCAL = 41.25
ECALMIPcalibration = 147.8
HCALMIPcalibration = 34.2
ECALEMMIPToGeV = 0.00675
ECALHadMIPToGeV = 0.00675
HCALEMMIPToGeV = 0.035
HCALHadMIPToGeV = 0.035

Mark Thompson's default consts are:
CalibrECAL = 62.5 123.0
CalibrHCAL = 31.2
ECALMIPcalibration = 171
HCALMIPcalibration = 37.1
ECALEMMIPToGeV = 0.00593
ECALHadMIPToGeV = 0.00593
HCALEMMIPToGeV = 0.026
HCALHadMIPToGeV = 0.026

Monday, January 28, 2008

Calorimeter calibration

Hajrah and Alex, from Edinburgh, found in their analyses that the mass distribution of the Higgs comes out quite high and wide when the mass is reconstructed from the b jets. That led us to wonder whether there is a problem with the calibration. As an exercise I ran the PandoraPFACalibrator processor, comparing the SimpleCaloDigi and MokkaCaloDigi processors with the detector model LDC01_05Sc. For the ECAL calibration the energy distribution of 10 GeV photons is fine for both processors, but for 10 GeV KL with SimpleCaloDigi the energy distribution is wide and peaks at an energy higher than 10 GeV.

Here are the plots for the energy distributions of the 10 GeV photon and KL samples. The solid black line comes from MokkaCaloDigi; the dashed red line comes from SimpleCaloDigi.

We are using SimpleCaloDigi in our reconstruction, which might be the reason for the wrong mass distribution of the Higgs candidates. The next step is to check the calibration constants for LDC01Sc, as this is the model used for our samples, change to MokkaCaloDigi, and see if the results improve.

Tuesday, January 15, 2008

Minutes of Meeting 15/1/08

Our first regular meeting was held at 4pm on Monday. The main points were:
  • Mark's plots (see previous entry): number of jets and mass of the Z as a function of the kT cut. Things look sensible in the limits, but he has discovered a bug that can double count if more than one MC particle is in the same jet (this gives the peaks at mZ=180 GeV etc.)
  • Energy: it was agreed to make future samples with a centre-of-mass energy of 250 GeV, as specified in the benchmarks
  • Effort: Mark is working full time, with support from Clare. Victoria and Roberval are working on other things at the moment but should start ramping up soon. Hajrah is coming up to speed, and will start looking at muon ID. There is also an undergraduate at Bristol who will look at electron ID, and an undergraduate at Edinburgh who will be asked to make a short presentation soon.
  • Next Steps: Mark's priority will be to assemble all of the machinery for a crude analysis, so we can see where the work needs to be done in tuning etc.
  • Future Meetings: We will switch to 1pm on Mondays in future. Next meeting is next week.

Monday, January 14, 2008

Plots for the meeting.

Here are some plots I'm going to talk about in today's meeting. I'll write something about them in the blog later, but the meeting starts in 5 minutes so I don't have time now.
Things as a function of kT up to 0.01 and up to 1.5. Note that the second set of plots has a double-counting error if the positron and the electron are in the same jet (so the Z mass plot is double what it should be); I'm running up a corrected plot but it's not finished yet.

Addendum 15/Jan/08
Here's the corrected plot for the kT values up to 1.5, and also one for kT values up to 5*10^-4.
The reconstructed Z mass peak is still very low; I guess it will peak when every particle is in its own jet, since the extra reconstructed neutrals are double counting energy (the charged particle gets its energy from the track momentum, so any counting of its clusters is double counting). You can sort of see this because the distribution seems to spread more to higher masses than lower, although you can only see it for the first kT bin; I'm having "issues" getting Paida to rotate the plots appropriately.
I'll reverse the plots, i.e. plot the kT as a function of the other stuff, which should give me a decent kT cut to work with. I'll try and get some rudimentary electron selection done by the end of the week and then move on to the rest of the analysis framework, leaving someone else to worry about the details.

Wednesday, January 2, 2008

PandoraPFA Calibration

I had a go at the Pandora calibration; it's well documented and seems straightforward enough. I've put a script for a Grid job in the subversion repository to create all the samples necessary so it can be repeated if the detector changes (https://svn.phy.bris.ac.uk/svn/ilc_z_higgs/MarksStuff/PandoraCalibrationEventGeneration).
The files I generated for LDC00SC are in "/grid/ilc/users/phmag/LCIOOutput/PandoraCalibration".


I've not done any fitting with ROOT before, so for a first pass I just centred the histograms by eye.
The constants I came up with are:

MokkaCaloDigi:
CalibrECAL=27, 81
CalibrHCAL=27.3

PandoraPFA:
ECALMIPcalibration=230
HCALMIPcalibration=31.5
ECALEMMIPToGeV=0.0045
ECALHadMIPToGeV=0.0045
HCALEMMIPToGeV=0.0353
HCALHadMIPToGeV=0.0353

I'll re-run the reconstruction of all the Z Higgs samples with these constants tomorrow.
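Incidentally, even without learning ROOT's fitting machinery, the by-eye centring could be roughed out numerically: for a roughly symmetric peak, an iterative truncated mean gets close to what a Gaussian fit would give for the mean. This is just a sketch of the idea (pure Python, not what I actually did; the edges/counts arrays stand in for whatever the digitiser histograms contain):

```python
def peak_position(edges, counts, n_iter=5, window=2.0):
    """Crude stand-in for a Gaussian fit: iteratively take the
    count-weighted mean and RMS of bin centres, keeping only bins
    within `window` RMS of the current mean each pass."""
    centres = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    total = sum(counts)
    mean = sum(c * n for c, n in zip(centres, counts)) / total
    rms = (sum(n * (c - mean) ** 2 for c, n in zip(centres, counts)) / total) ** 0.5
    for _ in range(n_iter):
        # Keep only the core of the peak around the current estimate.
        sel = [(c, n) for c, n in zip(centres, counts) if abs(c - mean) < window * rms]
        w = sum(n for _, n in sel)
        if w == 0:
            break
        mean = sum(c * n for c, n in sel) / w
        rms = (sum(n * (c - mean) ** 2 for c, n in sel) / w) ** 0.5 or rms
    return mean
```

For asymmetric or double-peaked distributions this would be misleading, so a proper fit is still the right long-term answer.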