About CMS


The Compact Muon Solenoid (CMS) is one of the large particle detectors at CERN's Large Hadron Collider. The CMS Collaboration consists of more than 4000 scientists, engineers, technicians and students from around 240 institutes and universities from more than 50 countries. You can find more information about the CMS detector on the official CMS website.

Usage instructions and suggestions for CMS Open Data, depending on your purpose, are available in:

  • Guide page to education use of CMS Open Data
  • Guide page to research use of CMS Open Data and a separate CMS Open Data guide.

This page gives a brief overview of CMS Open Data contents:

  • CMS Data and analysis tools
  • Primary and simulated datasets
  • Disclaimer
  • Other CMS open data
  • Policies

CMS Data and analysis tools

The following are provided through this portal:

  • Downloadable datasets
    • Primary datasets: fully reconstructed collision data with no further selection applied. These are referred to as "reconstructed data": fragmentary data from the various sub-detectors are processed, or "reconstructed", to provide coherent information about individual physics objects such as electrons or particle jets.
    • Simulation data
    • Examples of simplified datasets derived from the primary ones for use in different applications and analyses
  • Tools
    • Downloadable container images with the CMS software environment through which the datasets can be accessed
    • Alternatively, a downloadable Virtual Machine (VM) image with the CMS software environment
    • Getting started instructions for reading and processing primary data in the AOD format (Run 1), MiniAOD format (Run 2), or NanoAOD format (Run 2).
    • Ready-to-use online applications, such as an event display and simple histogramming software
    • Source code for the various examples and applications, available in the CMS software collection
  • Guides
    • Set of topical guide pages
    • A comprehensive set of instructions is being collected in a separate CMS Open Data guide with links to the latest tutorials.

Primary and simulated datasets

Collision data in the primary datasets are typically in a format known as AOD or Analysis Object Data, while simulated data are in a format called AODSIM. Beginning in Run 2, smaller data formats called MiniAOD and NanoAOD were developed in CMS to implement common physics object processing and remove information that is not often needed for analysis.

AOD(SIM) and MiniAOD(SIM) files

AOD/AODSIM files are provided for Run 1 primary datasets and contain the information that is needed for analysis:

  • all the high-level physics objects (such as muons, electrons, etc.);
  • tracks with associated hits, calorimetric clusters with associated hits, vertices;
  • candidate particles created by the Particle Flow algorithm;
  • information about event selection (triggers), data needed for further selection and identification criteria for the physics objects.

See the Getting Started page for AOD data to learn more about analyzing AOD files.

Starting from Run 2 (2015), MiniAOD/MiniAODSIM files are provided. These files contain similar information to AOD, but physics objects are processed to include more identification and selection information within a lighter C++ object, transverse momentum thresholds for storing objects are increased, and some lower-level information has been removed. MiniAOD datasets are approximately one tenth of the size of AOD datasets. More information about MiniAOD:

  • Mini-AOD: A New Analysis Data Format for CMS
  • MiniAOD analysis documentation
  • Getting Started with CMS MiniAOD data

AOD and MiniAOD files do not contain the final event interpretation with a simple list of particles. The files can be read in ROOT, but they cannot be opened (and understood) as simple data tables. A file typically contains several instances of the same physics object (e.g. a jet reconstructed with different algorithms), and some physics objects may be "double-counted" (e.g. a physics object may appear as a single object of its own type, but it may also be part of a jet).

Additional knowledge is needed to define a "good" physics object, and this definition can be different in each analysis. Only the runs that are validated by data quality monitoring should be used in any analysis. The list of the validated runs is provided.
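
As an illustration of restricting an analysis to validated runs, the following is a minimal Python sketch. It assumes the list of validated runs is a JSON object that maps each run number to a list of inclusive [first, last] luminosity-section ranges; the run numbers and ranges below are made up, and the function names are hypothetical rather than part of the CMS software.

```python
import json

# Hypothetical excerpt of a validated-runs list (assumed layout:
# run number -> list of inclusive [first, last] luminosity-section ranges).
VALIDATED_JSON = '{"1001": [[1, 50], [60, 75]], "1002": [[5, 20]]}'


def load_validated(text):
    """Parse the JSON text into {run: [(first_lumi, last_lumi), ...]}."""
    raw = json.loads(text)
    return {int(run): [tuple(r) for r in ranges] for run, ranges in raw.items()}


def is_validated(validated, run, lumi):
    """Return True if the luminosity section lies in a validated range of the run."""
    return any(first <= lumi <= last for first, last in validated.get(run, []))


validated = load_validated(VALIDATED_JSON)

# Made-up (run, luminosity section) pairs standing in for events read from a file.
events = [(1001, 10), (1001, 55), (1002, 20), (1003, 1)]
selected = [event for event in events if is_validated(validated, *event)]
print(selected)
```

Only events whose run and luminosity section fall inside a listed range are kept; runs that do not appear in the list at all are rejected.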

NanoAOD(SIM) files

Starting from data collected in 2016, datasets in NanoAOD format are provided alongside MiniAOD. Only a limited set of observables for each physics object is kept, with limited numerical precision. For example, detector information is typically dropped in favor of pre-computed identification algorithm results. The Particle Flow candidates are also dropped, since they are primarily used as inputs to higher-level physics object reconstruction. The NanoAOD format is about 20 times smaller than MiniAOD, or about 200 times smaller than AOD. NanoAOD files can be read in ROOT as a basic TTree containing standard data types. More information about NanoAOD:

  • The NanoAOD event data format in CMS
  • NanoAOD file documentation
  • Getting Started with CMS NanoAOD data
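
The approximate size ratios quoted above can be combined in a short calculation. The sketch below uses only the two factors given in the text (MiniAOD about one tenth of AOD, NanoAOD about 20 times smaller than MiniAOD); the 100 TB starting size is a hypothetical figure chosen for illustration, not the size of any released dataset.

```python
# Approximate reduction factors quoted in the text.
AOD_TO_MINIAOD = 10      # MiniAOD is about one tenth of the AOD size
MINIAOD_TO_NANOAOD = 20  # NanoAOD is about 20 times smaller than MiniAOD


def estimated_sizes(aod_size):
    """Estimate MiniAOD and NanoAOD sizes from an AOD size (same units)."""
    mini = aod_size / AOD_TO_MINIAOD
    nano = mini / MINIAOD_TO_NANOAOD
    return {"AOD": aod_size, "MiniAOD": mini, "NanoAOD": nano}


# Hypothetical 100 TB AOD dataset.
sizes = estimated_sizes(100.0)
for fmt, size in sizes.items():
    print(f"{fmt}: {size} TB")
```

Under these assumptions the combined reduction from AOD to NanoAOD is 10 × 20 = 200, which matches the factor stated in the text.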

NanoAOD files may still contain several instances of the same physics object (e.g. a jet reconstructed with different algorithms), and some physics objects may be "double-counted" (e.g. a physics object may appear as a single object of its own type, but it may also be part of a jet).
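
One common way to handle such double-counting is to drop jets that lie close to a selected lepton in the (eta, phi) plane. The following Python sketch illustrates the idea under stated assumptions: objects are plain dictionaries with "eta" and "phi" keys, the function names are hypothetical, and the 0.4 distance threshold is an illustrative choice. Each analysis defines its own criteria, so this is not a prescribed CMS procedure.

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane, with phi wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def remove_overlaps(jets, leptons, max_dr=0.4):
    """Keep only jets that are at least max_dr away from every lepton."""
    return [
        jet
        for jet in jets
        if all(
            delta_r(jet["eta"], jet["phi"], lep["eta"], lep["phi"]) >= max_dr
            for lep in leptons
        )
    ]


# Made-up objects: the first jet sits almost on top of the muon.
jets = [
    {"pt": 50.0, "eta": 0.5, "phi": 1.0},
    {"pt": 40.0, "eta": -1.2, "phi": -2.0},
]
muons = [{"pt": 45.0, "eta": 0.52, "phi": 1.01}]

cleaned = remove_overlaps(jets, muons)
print([jet["pt"] for jet in cleaned])
```

The first jet is removed because it overlaps with the muon, while the second jet, far away in both eta and phi, is kept.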

Additional knowledge is needed to define a "good" physics object, and this definition can be different in each analysis. Only the runs that are validated by data quality monitoring should be used in any analysis. The list of the validated runs is provided.

RECO files

Some datasets, such as those containing heavy-ion data, are provided in a format called RECO, which contains more information than the AOD format. This is done when the original analyses by the CMS collaboration were performed using this particular format.

Raw data

Small samples of raw data are also provided.

Disclaimer

  • The open data are released under the Creative Commons CC0 waiver. Neither CMS nor CERN endorses any works, scientific or otherwise, produced using these data, even if available on, or linked from, this portal.
  • All datasets will have a unique DOI that you are requested to cite in any applications or publications.
  • Despite being processed, the high-level primary datasets remain complex and selection criteria need to be applied in order to analyse them, requiring some understanding of particle physics and detector functioning. The data cannot be viewed in simple data tables for spreadsheet-based analyses.
  • No further development is foreseen for either the data released or the software version needed to analyse them.
    • The methods have evolved since the released data were recorded.
    • More advanced techniques are used with recent data but the software is not compatible out-of-the-box with older data samples.
  • The simulated data are not a full set of simulations, but only those datasets that have been reprocessed with a software release compatible with the respective collision data.
    • The release of 2010 data is accompanied by a small set of simulated data.
    • The release of 2011 data includes some simulated data, limited to those datasets that were reprocessed with a software release compatible with the 2011 collision data.
    • The release of 2012 data includes a larger sample of simulated data. A part of 2012 simulated data is released with the bibliographic information content only, and these datasets will be made available online on demand.
    • The release of 2013 heavy-ion-related data includes simulated data corresponding to different collision types and centre-of-mass energies.
    • The release of 2015 data includes a large collection of simulated data, reprocessed with a software release compatible with the 2015 collision data. Some simulated data may not have been included in this reprocessing and are therefore not available in this collection.
    • The release of 2016 data includes a large collection of simulated data, reprocessed with a software release compatible with the 2016 collision data.
  • If you are interested in joining the CMS Collaboration, please read How to join CMS.

Other CMS open data

  • All CMS publications are open access.
  • Some of the papers also include open data in the form of additional tables, plots, graphs and Rivet packages.

Policies

  • Data preservation and open access policy
  • Papers by CMS members using public data [internal]