The CERN Open Data Policy reflects values that have been enshrined in the CERN Convention for more
than sixty years and were reaffirmed in the European Strategy for Particle Physics (2020), and aims to
empower the LHC experiments to adopt a consistent approach towards the openness and preservation
of experimental data. Making data available responsibly (applying FAIR standards), at different levels
of abstraction and at different points in time, allows the maximum realisation of their scientific potential
and the fulfilment of the collective moral and fiduciary responsibility to member states and the broader
global scientific community. CERN understands that in order to optimise reuse opportunities, immediate
and continued resources are needed. The level of support that CERN and the experiments will be able
to provide to external users will depend on available resources.
This policy relates to the data collected by the LHC experiments for the main physics programme of the
LHC: high-energy proton–proton and heavy-ion collision data. The foreseen use cases of the Open
Data include reinterpretation and reanalysis of physics results, education and outreach, data analysis
for technical and algorithmic developments and physics research. The Open Data will be released
through the CERN Open Data Portal, which will be supported by CERN for the lifetime of the data. The
data will be tailored to the different uses, and will be made available in formats defined by each
experiment that afford a range of opportunities for long-term use, reuse and preservation. In general,
four levels of complexity of HEP data have been identified by the Data Preservation and Long Term
Analysis in High Energy Physics (DPHEP) Study Group, which serve varying audiences and imply a
diversity of openness solutions and practices.
Published Results (Level 1) Policy: Peer-reviewed publications represent the primary scientific output
from the experiments. In compliance with the CERN Open Access Policy, all such publications are
available with Open Access, and so are available to the public. To maximise the scientific value of their
publications, the experiments will make public additional information and data at the time of
publication, stored in collaboration with portals such as HEPData, with selection routines stored in
specialised tools. The data made available may include simplified or full binned likelihoods, as well as
unbinned likelihoods based on datasets of event-level observables extracted by the analyses.
Reinterpretation of published results is also made possible through analysis preservation and direct
collaboration with external researchers.
Outreach and Education (Level 2) Policy: For the purposes of education and outreach, dedicated
subsets of data are used, selected and formatted to provide rich samples to maximise their educational
impact, and to facilitate the easy use of the data. These data are released with a schedule and scope
determined by each experiment. The data are provided in simplified, portable and self-contained
formats suitable for educational and public understanding purposes, but are neither intended nor adequate
for the publication of scientific results. Lightweight environments to allow the easy exploration of these
data may also be provided. CERN experiments will make data at such a high level of abstraction available
through the CERN Open Data Portal.
Reconstructed Data (Level 3) Policy: The LHC experiments will release calibrated reconstructed data
with the level of detail useful for algorithmic, performance and physics studies. The release of these
data will be accompanied by provenance metadata, and by a concurrent release of appropriate
simulated data samples, software, reproducible example analysis workflows, and documentation.
Virtual computing environments that are compatible with the data and software will be made available.
The information provided will be sufficient to allow high-quality analysis of the data including, where
practical, application of the main correction factors and corresponding systematic uncertainties related
to calibrations, detector reconstruction and identification. A limited level of support for users of the
Level 3 Open Data will be provided on a best-effort basis by the collaborations.
Public data releases will occur periodically, following an appropriate latency period to allow thorough
understanding of the data, the reconstruction and calibrations, as well as to allow time for the scientific
exploitation of the data by the collaboration. The size of the released datasets will be commensurate
with the total amount of data collected of similar type, with the aim of commencing data releases within
five years of the conclusion of the run period. Data may be withheld by an experiment if there are active
analyses ongoing. Full datasets will be made available at the close of the collaboration.
The data will be released from the CERN Open Data Portal under the Creative Commons CC0 waiver
and will be identified with persistent data identifiers; the data must be cited through these
identifiers. Similarly, appropriate acknowledgements of the experiment(s) should be included in
publications released using such data, and the publications made clearly distinguishable from those
released by the collaboration. Any scientific claims in such publications are the responsibility of their
authors and not of the experiments. It is expected that scientific results released using Open Data follow
best scientific practices. The experiments may impose rules related to the use of the data
by members of their respective collaborations.
External authors should be aware that they will not have access to the vast amount of tacit knowledge
built up within the LHC collaborations over the decades of design, construction and operation of the
experimental apparatus. To allow external scientists to fully benefit from all the data, knowledge and
tools, the collaborations may offer appropriate association programmes.
Raw Data (Level 4) Policy: It is not practically possible to make the full raw dataset from the LHC
experiments usable in a meaningful way outside the collaborations. This is due to the complexity of the
data, metadata and software, the required knowledge of the detector itself and the methods of
reconstruction, the extensive computing resources necessary and the access issues for the enormous
volume of data stored in archival media. It should be noted that, for these reasons, general direct access
to the raw data is not even available to individuals within the collaboration, and that instead the
production of reconstructed data (i.e. Level 3 data) is performed centrally. Representative
subsets of raw data, useful for example for studies in the machine learning domain and beyond, can
be released together with Level 3 formats, at the discretion of each experiment.
-- CERN Open Data Policy Working Group (with representatives from CERN Management, ATLAS, ALICE, CMS, LHCb and TOTEM Collaborations, CERN Information Technology and Scientific Information Services)
 European Strategy Group (2020), ‘2020 Update of the European Strategy for Particle Physics’.
 Data management plans are defined by the LHC experiments to address the long-term preservation of internal
data products. See: Akopov et al., Status report of the DPHEP Study Group: Towards a global effort for
sustainable data preservation in high energy physics. arXiv preprint arXiv:1205.4667 (2012).