"NanoAOD" is a new data format created by CMS for Run 2 physics analyses. It is available for data collected in 2016 and later. In contrast to the AOD) and MiniAOD formats, NanoAOD is not based on CMS-specific C++ classes, but rather stores data in standard C++ types within a ROOT TTree
. You therefore do not need to use the CMS Virtual Machine or docker container to analyze NanoAOD data. NanoAOD can be analyzed using the ROOT program and/or python libraries capable of interpreting the ROOT's TTree
structure.
uproot
and awkward
). A python docker container with access to all the required packages is also available.NanoAOD data can be accessed by:
cernopendata-client
. Once you know the path to a NanoAOD file, you can access it over the network using ROOT. The following example is done inside the ROOT docker container:$ root -l root://eospublic.cern.ch//eos/opendata/cms/Run2016H/DoubleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/2510000/127C2975-1B1C-A046-AABF-62B77E757A86.root
root [0]
Attaching file root://eospublic.cern.ch//eos/opendata/cms/Run2016H/DoubleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/2510000/127C2975-1B1C-A046-AABF-62B77E757A86.root as _file0...
Warning in <TClass::Init>: no dictionary for class edm::Hash<1> is available
Warning in <TClass::Init>: no dictionary for class edm::ParameterSetBlob is available
Warning in <TClass::Init>: no dictionary for class edm::ProcessHistory is available
Warning in <TClass::Init>: no dictionary for class edm::ProcessConfiguration is available
Warning in <TClass::Init>: no dictionary for class pair<edm::Hash<1>,edm::ParameterSetBlob> is available
(TFile *) 0x561a48eaade0
Within a ROOT script, the TFile::Open
method can be used to access the file:
root [2] TFile *f = TFile::Open("root://eospublic.cern.ch//eos/opendata/cms/Run2016H/DoubleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/2510000/127C2975-1B1C-A046-AABF-62B77E757A86.root")
Each NanoAOD file contains several TTree
objects containing different information about the data. This file format is documented in the CMS SWGuide Workbook. The most significant for data analysis is the Events
tree, followed by the Runs
tree.
$ root -l root://eospublic.cern.ch//eos/opendata/cms/Run2016H/DoubleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/2510000/127C2975-1B1C-A046-AABF-62B77E757A86.root
root [0]
Attaching file root://eospublic.cern.ch//eos/opendata/cms/Run2016H/DoubleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/2510000/127C2975-1B1C-A046-AABF-62B77E757A86.root as _file0...
root [1] _file0->ls()
TNetXNGFile** root://eospublic.cern.ch//eos/opendata/cms/Run2016H/DoubleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/2510000/127C2975-1B1C-A046-AABF-62B77E757A86.root
TNetXNGFile* root://eospublic.cern.ch//eos/opendata/cms/Run2016H/DoubleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/2510000/127C2975-1B1C-A046-AABF-62B77E757A86.root
KEY: TObjString tag;1 Collectable string class
KEY: TTree Events;1 Events
KEY: TTree LuminosityBlocks;1 LuminosityBlocks
KEY: TTree Runs;1 Runs
KEY: TTree MetaData;1 Job metadata
KEY: TTree ParameterSets;1 Parameter sets
Within each tree, different "branches" represent quantities of interest whose information is stored for each collision event (real or simulated) in the dataset. When NanoAOD files are produced, the branches are all given short documentation strings to aid the user that are collected into a webpage:
Let's see what a NanoAOD file looks like from the ROOT command line, using the docker container. Follow the commands shows above to open a DoubleMuon NanoAOD file using XrootD.
root [6] Events->Show(2) # prints tree contents for event #2.
======> EVENT:2
run = 281616
luminosityBlock = 7
event = 386901
...
...
nMuon = 2
Muon_dxy = -0.00117493,
0.00658798
Muon_dxyErr = 0.0996094,
0.0996094
Muon_dxybs = -0.00116634,
0.00656891
Muon_dz = -0.225708,
0.487549
Muon_dzErr = 4.6875,
5.1875
Muon_eta = 0.902222,
0.339417
...etc...
This shows the basic structure of the NanoAOD file. For event #2, some branches hold a single value for the event (nMuon = 2
) while other branches hold an array of values for the event.
In this event the pseudorapidity (eta
) values for the 2 muons are stored in an array with 2 elements.
The first step to analyzing NanoAOD data is to build up an understanding of what the different branches mean for physics, and how typical CMS physics analyses use these quantities to select interesting events and reconstruct particle decays. Several resources are available to help:
The second step is to determine a software method for performing validated event filtering, event selection and subsequent analysis. Note: Collision data in NanoAOD format has not been filtered to contain only the validated run segments. A json
file containing validated run segments can be found in the links on each collision dataset record.
As we've mentioned, NanoAOD data is stored using ROOT's TTree
object. The data can be explored interactively by drawing histograms of various tree branches, even applying weights and cuts to the histograms. In the ROOT container, start the VNC server for X11 graphics support. The default password is cms.cern
for either a VNC viewer or the web browser. Then open a NanoAOD file using XRootD and follow these example commands to draw histograms.
$start_vnc
Starting VNC server...
[1] 46
[2] 50
[3] 51
[4] 52
VNC connection points:
VNC viewer address: 127.0.0.1::5901
HTTP access: http://127.0.0.1:6080/vnc.html
$ root -l root://eospublic.cern.ch//eos/opendata/cms/Run2016H/DoubleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/2510000/127C2975-1B1C-A046-AABF-62B77E757A86.root
root [0]
Attaching file root://eospublic.cern.ch//eos/opendata/cms/Run2016H/DoubleMuon/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/2510000/127C2975-1B1C-A046-AABF-62B77E757A86.root as _file0...
Warning in <TClass::Init>: no dictionary for class edm::Hash<1> is available
Warning in <TClass::Init>: no dictionary for class edm::ParameterSetBlob is available
Warning in <TClass::Init>: no dictionary for class edm::ProcessHistory is available
Warning in <TClass::Init>: no dictionary for class edm::ProcessConfiguration is available
Warning in <TClass::Init>: no dictionary for class pair<edm::Hash<1>,edm::ParameterSetBlob> is available
(TFile *) 0x55f88402e060
root [1] TTree *t = (TTree*)_file0->Get("Events")
(TTree *) 0x55f8837d3910
root [2] t->Draw("Muon_pt","Muon_pt < 2000") # draws pT of all muons below 2 TeV
Info in <TCanvas::MakeDefCanvas>: created default TCanvas with name c1
(long long) 4804238
root [3] c1->SetLogy(1) # log scale y-axis
root [4] t->Draw("Muon_pt","Muon_tightId > 0","same") # draws pT of high-purity muons
(long long) 3340305
root [5] t->Draw("Muon_pt","(0.85)*(Muon_isTight > 0)","same") # apply an arbitrary weight
(long long) 3340305
root [6] .q
Before exiting the ROOT session, your histogram canvas should look something like this:
NanoAOD-tools
The NanoAOD-tools software provides a toolkit to perform operations on NanoAOD files outside the CMSSW software environment. This program can be used to:
Skimming data events that fail the validation is a critical feature, so let's see how to do it by following instructions from the NanoAOD-tools repository. This example will be performed in the ROOT docker container using files from the Tau 2016H dataset. The instructions here download files via git clone
, but it is also possible to download a .zip
file of the NanoAOD-tools repository from the record page
$ git clone https://github.com/cms-opendata-analyses/nanoAOD-tools.git NanoAODTools
Cloning into 'NanoAODTools'...
remote: Enumerating objects: 4269, done.
remote: Counting objects: 100% (110/110), done.
remote: Compressing objects: 100% (38/38), done.
remote: Total 4269 (delta 57), reused 93 (delta 57), pack-reused 4159
Receiving objects: 100% (4269/4269), 113.08 MiB | 20.57 MiB/s, done.
Resolving deltas: 100% (2230/2230), done.
Updating files: 100% (370/370), done.
$ cd NanoAODTools/
/NanoAODTools$ bash standalone/env_standalone.sh build
Build directory created, please source again standalone/env_standalone.sh without the build argument.
/NanoAODTools$ source standalone/env_standalone.sh
Standalone environment set.
Now we will download the validated runs file that is linked from the Tau dataset record page:
$ wget https://opendata-qa.cern.ch/record/14220/files/Cert_271036-284044_13TeV_Legacy2016_Collisions16_JSON.txt
--2024-01-24 20:01:12-- https://opendata.cern.ch/record/14220/files/Cert_271036-284044_13TeV_Legacy2016_Collisions16_JSON.txt
Resolving opendata-qa.cern.ch (opendata-qa.cern.ch)... 188.185.10.243, 188.185.35.172, 188.185.22.9
Connecting to opendata-qa.cern.ch (opendata-qa.cern.ch)|188.185.10.243|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11686 (11K) [text/plain]
Saving to: ‘Cert_271036-284044_13TeV_Legacy2016_Collisions16_JSON.txt’
Cert_271036-284044_13TeV_Legacy2016_Collisions 100%[===============================================>] 11.41K --.-KB/s in 0s
2024-01-24 20:01:13 (126 MB/s) - ‘Cert_271036-284044_13TeV_Legacy2016_Collisions16_JSON.txt’ saved [11686/11686]
NanoAOD-tools is based on a "postprocessor" script that can perform different options based on the options provided. To skim the file according to the validated runs json file
we will use the -J
option. In the following example, the 2 GB NanoAOD file has been downloaded locally using xrdcp
to speed up processing time. To process the file without downloading it, use the full root://eospublic
file path shown in the xrdcp
command when running nano_postproc.py
.
$ xrdcp root://eospublic.cern.ch//eos/opendata/cms/Run2016H/Tau/NANOAOD/UL2016_MiniAODv2_NanoAODv9-v1/260000/03DD0FB3-219B-C54D-A476-EE8937CC214E.root .
[2.156GB/2.156GB][100%][==================================================][11.5MB/s]
$ python scripts/nano_postproc.py -J Cert_271036-284044_13TeV_Legacy2016_Collisions16_JSON.txt ./ 03DD0FB3-219B-C54D-A476-EE8937CC214E.root
Will write selected trees to ./
TClass::Init:0: RuntimeWarning: no dictionary for class edm::Hash<1> is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ParameterSetBlob is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessHistory is available
TClass::Init:0: RuntimeWarning: no dictionary for class edm::ProcessConfiguration is available
TClass::Init:0: RuntimeWarning: no dictionary for class pair<edm::Hash<1>,edm::ParameterSetBlob> is available
Pre-select 2158176 entries out of 2160343 (99.90%)
Selected 2158176 / 2160343 entries from 03DD0FB3-219B-C54D-A476-EE8937CC214E.root (99.90%)
Done ./03DD0FB3-219B-C54D-A476-EE8937CC214E_Skim.root
Total time 4721.5 sec. to process 2160343 events. Rate = 457.6 Hz.
As before, the TClass
warnings from ROOT can be safely ignored. The NanoAOD-tools README page shows more options for this command and discusses their "module" format for applying corrections, performing further event selection, and storing user analysis variables in the file.
The ROOT Trees within NanoAOD files have been a common HEP data structure for many years, and classic "event loop" methods (primarily written in C++) are an effective and understandable model for analyzing NanoAOD. However, as the overall CMS data size has grown, "columnar" analysis methods have become more and more popular within CMS, and can reduce the computing resources needed for the analysis. The core message is that NanoAOD is extremely flexible!
skim.cxx
scriptto select events and reconstruct a particle. Some example analyses in the NanoAODRun1Examples repository also use RDataFrame, in both C++ and Python.CMS expects that approximately 50% of physics analyses would be able to use only the physics and conditions information stored within NanoAOD files. Other analyses may prefer to analyze the MiniAOD format, which contains more information per event, albeit in the complex EDM class format. See the Getting Started with CMS MiniAOD Data page for more details on analyzing MiniAOD.
One specific collection of information that users may wish to add to NanoAOD is the collection of "Particle Flow Candidates". Some collision data is available in this enriched format, called "NanoAOD-PF". A PFNano producer tool is also available with instructions showing how a MiniAOD file can be processed to produce a NanoAOD-PF ROOT file.
That's it! Hope you enjoyed this introduction. Feel free to play around with the rest of the data and write your own analysis code. Learn more in the CMS Open data guide.