Jomhari, Nur Zulaiha ; Geiser, Achim ; Bin Anuar, Afiq Aizuddin
Cite as: Jomhari, Nur Zulaiha; Geiser, Achim; Bin Anuar, Afiq Aizuddin; (2017). Higgs-to-four-lepton analysis example using 2011-2012 data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.JKB8.RR42
Software Analysis Workflow CMS CERN-LHC
This research-level example is a strongly simplified reimplementation of parts of the original CMS Higgs-to-four-lepton analysis published in Phys.Lett. B716 (2012) 30-61, arXiv:1207.7235.
The published reference plot which is being approximated in this example is https://inspirehep.net/record/1124338/files/H4l_mass_3.png. Other Higgs final states (e.g. Higgs to two photons), which were also part of the same CMS paper and strongly contributed to the Higgs boson discovery, are not covered by this example.
The example consists of different levels of complexity. The highest level addresses users who feel they have at least a minimal understanding of the content of this paper and of the meaning of this reference plot, which can be acquired via (separate) educational exercises. The lower levels might also be interesting for educational applications. The example requires a minimal acquaintance with the Linux operating system and the ROOT analysis tool.
The example uses legacy versions of the original CMS data sets in the CMS AOD format, which slightly differ from the ones used for the publication due to improved calibrations. It also uses legacy versions of the corresponding Monte Carlo simulations, which are again close to, but not identical to, the ones in the original publication. These legacy data and MC sets listed below were used in practice, exactly as they are, in many later CMS publications.
According to the CMS Open Data policy, only 50% of the available LHC Run I samples are public (and used here), so the statistical significance is reduced with respect to what can be achieved with the full dataset. However, the result of the original paper, Phys.Lett. B716 (2012) 30-61, arXiv:1207.7235, was also obtained with only part of the Run I statistics, roughly equivalent to the luminosity of the public sets, but with only partial statistical overlap.
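As a rough illustration of why a smaller data fraction lowers the significance, the sketch below uses the simple estimate s/sqrt(b), which is an assumption made here for illustration and not the statistical method of the original paper. It is only a crude guide, in particular for the small event counts typical of this channel. All numbers are invented toy values.

```python
import math


def approx_significance(signal, background):
    """Crude s/sqrt(b) estimate of the significance (illustration only)."""
    if background <= 0:
        raise ValueError("background must be positive")
    return signal / math.sqrt(background)


def scale_with_luminosity(signal, background, fraction):
    """Expected event counts scale linearly with the luminosity fraction."""
    return signal * fraction, background * fraction


# Toy counts, invented for this sketch (not taken from the analysis).
full = approx_significance(20.0, 16.0)

# Keep only half of the luminosity, as for the public 50% of the data.
s_half, b_half = scale_with_luminosity(20.0, 16.0, 0.5)
half = approx_significance(s_half, b_half)

# With this estimate the ratio is sqrt(0.5), independent of the toy counts.
ratio = half / full
print(f"full: {full:.2f}  half: {half:.2f}  ratio: {ratio:.3f}")
```

Under this approximation, halving the luminosity reduces the expected significance by a factor of about 0.71.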
The provided analysis code reproduces the spirit of the original analysis and reimplements many of the original cuts on the original data objects, but it is not the original analysis code itself. Also, for the sake of simplicity, it skips some of the more advanced analysis methods of the original paper. Nevertheless, it provides qualitative insight into how the original result was obtained. In addition to the documented core results, the resulting ROOT files also contain many undocumented plots which accumulated as a side product of setting up this example and earlier examples. The significance of the Higgs 'excess' is about 2 standard deviations in this example, while it was 3.2 standard deviations in this channel alone in the original publication. The difference is attributed to the less sophisticated background suppression. In more recent (not yet public) CMS data sets with higher statistics, a preliminary analysis (CMS-PAS-HIG-16-041) observes the signal with more than 5 standard deviations in this channel alone.
The analysis strategy is the following: Get the 4mu and 2mu2e final states from the DoubleMuParked datasets and the 4e final state from the DoubleElectron dataset. This avoids double counting due to trigger overlaps. All MC contributions except top use data-driven normalization: The DY (Z/gamma^*) contribution is scaled to the Z peak. The ZZ contribution is scaled to describe the data in the independent mass range 180-600 GeV. The Higgs contribution is scaled to describe the data in the signal region. The (very small) top contribution remains scaled to the MC generator cross section.
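The data-driven normalization described above amounts to choosing a scale factor so that the MC yield matches the data yield in a control window. The following is a minimal sketch of that idea, not the code of the provided macros: the function names and the toy mass values are invented for illustration, and only the 180-600 GeV window for the ZZ contribution is taken from the text.

```python
def count_in_window(masses, lo, hi, weights=None):
    """Sum the event weights for masses in the half-open window [lo, hi)."""
    if weights is None:
        weights = [1.0] * len(masses)
    return sum(w for m, w in zip(masses, weights) if lo <= m < hi)


def data_driven_scale(data_masses, mc_masses, lo, hi, mc_weights=None):
    """Scale factor that makes the MC yield equal the data yield in [lo, hi)."""
    n_data = count_in_window(data_masses, lo, hi)
    n_mc = count_in_window(mc_masses, lo, hi, mc_weights)
    if n_mc == 0:
        raise ValueError("no MC events in the control window")
    return n_data / n_mc


# Toy four-lepton masses in GeV, invented for this sketch.
toy_data = [91.0, 125.3, 190.0, 210.5, 250.0, 320.0, 450.0, 610.0]
toy_zz_mc = [185.0, 200.0, 220.0, 240.0, 300.0,
             350.0, 400.0, 500.0, 550.0, 590.0]

# ZZ control region from the text: 180-600 GeV.
zz_scale = data_driven_scale(toy_data, toy_zz_mc, 180.0, 600.0)
print("ZZ scale factor:", zz_scale)
```

The same function could be applied with a window around the Z peak for the DY contribution, or with the signal region for the Higgs contribution. The exact windows for those cases are not stated here and would have to be taken from the macros.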
The legacy data and MC sets used in this example are:
/DoubleElectron/Run2011A-12Oct2013-v1/AOD
/DoubleMu/Run2011A-12Oct2013-v1/AOD
/ZZTo4mu_mll4_7TeV-powheg-pythia6/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM
/ZZTo4e_mll4_7TeV-powheg-pythia6/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM
/ZZTo2e2mu_mll4_7TeV-powheg-pythia6/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM
/SMHiggsToZZTo4L_M-125_7TeV-powheg15-JHUgenV3-pythia6/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM
/DYJetsToLL_M-50_7TeV-madgraph-pythia6-tauola/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM
/DYJetsToLL_M-10To50_TuneZ2_7TeV-pythia6/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM
/TTTo2L2Nu2B_7TeV-powheg-pythia6/Summer11LegDR-PU_S13_START53_LV6-v1/AODSIM
/DoubleMuParked/Run2012B-22Jan2013-v1/AOD
/DoubleMuParked/Run2012C-22Jan2013-v1/AOD
/DoubleElectron/Run2012B-22Jan2013-v1/AOD
/DoubleElectron/Run2012C-22Jan2013-v1/AOD
/ZZTo4mu_8TeV-powheg-pythia6/Summer12_DR53X-PU_RD1_START53_V7N-v1/AODSIM
/ZZTo4e_8TeV-powheg-pythia6/Summer12_DR53X-PU_RD1_START53_V7N-v2/AODSIM
/ZZTo2e2mu_8TeV-powheg-pythia6/Summer12_DR53X-PU_RD1_START53_V7N-v2/AODSIM
/SMHiggsToZZTo4L_M-125_8TeV-powheg15-JHUgenV3-pythia6/Summer12_DR53X-PU_S10_START53_V19-v1/AODSIM
/TTbar_8TeV-Madspin_aMCatNLO-herwig/Summer12_DR53X-PU_S10_START53_V19-v2/AODSIM
In addition to the instructions below, which guide you through the example in detail, a GitHub repository based on this original example is also provided. The ROOT files needed for the Level 3 exercise can be found here.
There are four levels of increasing complexity for this example:
Type mkdir rootfiles and cd rootfiles, and download the preproduced *.root histogram files given in this record for all relevant samples to this directory.
In the rootfiles directory, type wget http://opendata.web.cern.ch/record/5501/files/rootfilelist.txt and then wget -i rootfilelist.txt
Type root and, on the ROOT prompt, type TBrowser t, then double-click on the relevant file.
root -l M4Lnormdatall.cc
To exit, select file->Quit ROOT or, on the root [] prompt, type .q
In the Demo/DemoAnalyzer/ directory, which is created following Step 2: How to test and validate, replace BuildFile.xml by the version downloaded from this record.
Demo/DemoAnalyzer/src subdirectory
In Demo/DemoAnalyzer/, recompile with scram b
Type mkdir datasets and change to this directory with cd datasets
rootfiles: download all the level 2 ROOT files to this directory (see level 2).
cmsRun demoanalyzer_cfg_level3data.py will produce the output file DoubleMuParked2012C_10000_Higgs.root containing 1 Higgs candidate from the data.
cmsRun demoanalyzer_cfg_level3MC.py will produce the output file Higgs4L1file.root containing the Higgs signal distributions with reduced statistics.
Move both output files to the rootfiles directory, together with the predefined files:
mv DoubleMuParked2012C_10000_Higgs.root rootfiles/.
mv Higgs4L1file.root rootfiles/.
Type cd rootfiles and download the macro M4Lnormdatall_lvl3.cc to this directory.
root -l M4Lnormdatall_lvl3.cc
To exit, select file->Quit ROOT or, on the root [] prompt, type .q
datasets directory (you can find the links to the datasets in this record)
datasets directory (in which you should already have the 2012 one)
List_indexfile.txt to the MCsets directory (after having created it)
Run the analysis (cmsRun demoanalyzer_cfg_level4...) sequentially on all the input samples listed in List_indexfile.txt, i.e. produce all ROOT output files yourself. If you have access to a computer farm with local support for the installation of the CMS software (the Open Data team can only provide support for the single virtual machine mode), you may also run the analysis in parallel on different CPUs, correspondingly speeding up the result.
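Running the analysis sequentially over a list of samples can be sketched as a simple loop. The sketch below is illustrative only and rests on several assumptions: that the sample list contains one entry per line, that blank lines and lines starting with # can be ignored, and that there is one configuration file per sample named cfg_<entry>.py. That naming scheme is hypothetical and is not the naming used by this record. By default the function only builds the commands and does not execute them.

```python
import subprocess


def read_sample_list(text):
    """Return the non-empty, non-comment lines of a sample list.

    The one-entry-per-line format is an assumption of this sketch.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def run_sequentially(entries, make_command, dry_run=True):
    """Build (and optionally run) one command per entry, one after another.

    With dry_run=False each command is executed in turn and the loop stops
    at the first failure, because check=True raises on a non-zero exit code.
    """
    commands = []
    for entry in entries:
        cmd = make_command(entry)
        if not dry_run:
            subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands


def make_command(entry):
    # Hypothetical naming scheme: one edited configuration file per sample.
    return ["cmsRun", "cfg_" + entry + ".py"]


# Toy list content, invented here (not the real List_indexfile.txt).
example_list = """
# toy sample list
sampleA

sampleB
"""

planned = run_sequentially(read_sample_list(example_list), make_command)
for cmd in planned:
    print(" ".join(cmd))
```

Keeping the dry-run mode as the default makes it possible to inspect the planned commands before starting jobs that may run for a long time.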
GNU General Public License (GPL) version 3