
ATLAS Top Tagging Open Data Set

ATLAS collaboration

Cite as: ATLAS collaboration (2022). ATLAS Top Tagging Open Data Set. CERN Open Data Portal. DOI:10.7483/OPENDATA.ATLAS.FG5F.96GA

Dataset Derived Datascience ATLAS CERN-LHC


Description

Boosted top tagging is an essential binary classification task for experiments at the Large Hadron Collider (LHC) that measure the properties of the top quark. The ATLAS Top Tagging Open Data Set is a publicly available data set for the development of Machine Learning (ML) based boosted top tagging algorithms. The data are split into two orthogonal sets, named train and test, which are stored in the HDF5 file format and contain 42 million and 2.5 million jets respectively. Both sets are composed of equal parts signal (jets initiated by a boosted top quark) and background (jets initiated by light quarks or gluons). For each jet, the data set contains:

  • The four-vectors of the constituent particles
  • 15 high-level summary quantities evaluated on the jet
  • The four-vector of the whole jet
  • A training weight
  • A signal (1) vs background (0) label.
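
Because the files are in HDF5 format, their layout can be inspected before any training code is written. The following is a minimal sketch, assuming the h5py package is installed. The helper name describe_hdf5 and the file name train.h5 are hypothetical, and the code makes no assumption about what the data sets inside the file are called; it simply lists whatever it finds.

```python
import h5py


def describe_hdf5(path):
    """Return {dataset_name: (shape, dtype_string)} for every dataset in the file.

    Hypothetical helper for illustration; it walks the whole file, so it
    does not depend on any particular dataset names.
    """
    summary = {}

    def visit(name, obj):
        # visititems calls this for groups and datasets; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            summary[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return summary


# Example usage (the file name is an assumption, not taken from this page):
# for name, (shape, dtype) in describe_hdf5("train.h5").items():
#     print(name, shape, dtype)
```

Only metadata is read here, so the call should stay cheap even though the files are large.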

There is one rule for using this data set: the contribution to a loss function from any jet should always be weighted by that jet's training weight. Apart from this, a model may separate the signal jets from the background by whatever means necessary.
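
One way to apply the weighting rule is a per-jet weighted binary cross-entropy. The following is a minimal NumPy sketch, not the ATLAS reference implementation. The function name weighted_bce, the clipping constant eps, and the choice to normalise by the sum of weights are all assumptions made for illustration.

```python
import numpy as np


def weighted_bce(labels, probs, weights, eps=1e-7):
    """Binary cross-entropy in which each jet contributes in proportion to its weight.

    labels  : 1 for signal, 0 for background
    probs   : predicted signal probability for each jet
    weights : per-jet training weight
    Normalising by the sum of weights is an assumption of this sketch.
    """
    labels = np.asarray(labels, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    # Clip to avoid log(0) for saturated predictions.
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    per_jet = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    return float(np.sum(weights * per_jet) / np.sum(weights))


# Toy example with made-up numbers: the third jet has half the weight of the others.
loss = weighted_bce(labels=[1, 0, 1], probs=[0.9, 0.2, 0.6], weights=[1.0, 1.0, 0.5])
```

With all weights equal, this reduces to the ordinary mean cross-entropy; a jet with zero weight contributes nothing. Most ML frameworks accept per-sample weights in their loss functions, which serves the same purpose.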

Updated on July 26th 2024. This dataset has been superseded by a new dataset which also includes systematic uncertainties. Please use the new dataset instead of this one.

Dataset characteristics

2 files. 129.2 GiB in total.

External links

ATLAS publication note ATL-PHYS-PUB-2022-039

How can you use these data?

The detailed explanation of this dataset, together with usage code examples, is available in the following source code repository:

ATLAS top tagging open data documentation and examples


      

Files and indexes

Disclaimer

These open data are released under the Creative Commons Zero v1.0 Universal license.


Neither the experiment(s) (ATLAS) nor CERN endorse any works, scientific or otherwise, produced using these data.

This release has a unique DOI that you are requested to cite in any applications or publications.
