CNNPixelSeedsProducerTool - Example workflow using ML in track reconstruction with CMS 2018 simulated data

Di Florio, Adriano ; Pantaleo, Felice ; Pierini, Maurizio

Cite as: Di Florio, Adriano; Pantaleo, Felice; Pierini, Maurizio; (2019). CNNPixelSeedsProducerTool - Example workflow using ML in track reconstruction with CMS 2018 simulated data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.T709.ZN5Z

Data recorded in 2019. Published in 2019.

Software Tool Datascience CMS CERN-LHC

Description

An example workflow that produces datasets for developing machine learning algorithms to select and filter pixel doublet seeds in tracking applications, using CMS 2018 simulated data. The code can be run inside the CMS Open Data environment.

One of the first steps of the track finding workflow is the creation of track seeds, i.e. compatible pairs of hits from different detector layers, that are subsequently fed to higher-level pattern recognition steps. However, the set of compatible hit pairs is heavily contaminated by combinatorial background, so the subsequent steps of the tracking algorithm must process a significant fraction of fake doublets. For each event, $O(10^6)$ doublets are produced while only $O(10^3)$ are genuine, resulting in a fake ratio of $O(10^3)$. A possible way of reducing this effect is to use Machine Learning and Deep Learning techniques to check the compatibility between two hits. Indeed, fake rejection can be seen as a typical classification problem, for which neural networks and MVA methods have been widely shown to provide reliable results. The dataset provided is intended to be used to explore these techniques.
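The orders of magnitude quoted above can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the per-event counts are the round numbers from the text, while the classifier efficiency and fake rejection are hypothetical values chosen for the example, not results obtained with this tool.

```python
def fake_ratio(n_total, n_true):
    """Number of fake doublets per genuine doublet."""
    return (n_total - n_true) / n_true


def after_filter(n_total, n_true, efficiency, fake_rejection):
    """Expected genuine and fake doublets surviving a classifier.

    efficiency     : fraction of genuine doublets kept
    fake_rejection : fraction of fake doublets discarded
    """
    n_fake = n_total - n_true
    kept_true = n_true * efficiency
    kept_fake = n_fake * (1.0 - fake_rejection)
    return kept_true, kept_fake


# Round numbers taken from the description: O(10^6) doublets, O(10^3) genuine.
n_total = 1_000_000
n_true = 1_000

ratio = fake_ratio(n_total, n_true)

# Hypothetical working point (assumption, not a measured performance).
kept_true, kept_fake = after_filter(n_total, n_true,
                                    efficiency=0.99, fake_rejection=0.90)
ratio_after = kept_fake / kept_true

print(f"fake ratio before filtering: {ratio:.0f}")
print(f"fake ratio after filtering:  {ratio_after:.0f}")
```

Because genuine doublets are so rare, even a classifier that rejects 90% of fakes still leaves roughly a hundred fakes per genuine doublet in this toy example, which is why plain accuracy is a poor figure of merit for this problem.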

The workflow provided produces a dataset consisting of a collection of pixel doublet seeds, i.e. the hit pairs that could belong to the same particle. The compatibility between two hits is evaluated only on the basis of geometrical considerations, such as cuts in $\eta$, $\phi$ and $r$. These doublets are the building blocks from which tracks are subsequently built. Each doublet is characterised by a set of features, such as its coordinates, the charge released in the Pixel detector, and the pixel cluster shape projected onto a 2D histogram.
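As a rough illustration of what a purely geometrical compatibility check looks like, the sketch below derives $r$, $\phi$ and $\eta$ from Cartesian hit positions and applies simple window cuts. The `Hit` class, the `is_compatible` function and the threshold values are all hypothetical and chosen for the example; they are not the cuts applied by CMSSW or by this tool.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical hit with Cartesian coordinates (arbitrary length units)."""
    x: float
    y: float
    z: float

    @property
    def r(self):
        # Transverse distance from the beam line.
        return math.hypot(self.x, self.y)

    @property
    def phi(self):
        # Azimuthal angle in (-pi, pi].
        return math.atan2(self.y, self.x)

    @property
    def eta(self):
        # Pseudorapidity; equivalent to -ln(tan(theta/2)). Assumes r > 0.
        return math.asinh(self.z / self.r)


def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def is_compatible(inner, outer, max_dphi=0.2, max_deta=0.5):
    """Toy doublet selection; thresholds are illustrative assumptions."""
    if outer.r <= inner.r:
        return False  # the outer hit must lie on a layer further out
    if abs(delta_phi(outer.phi, inner.phi)) > max_dphi:
        return False
    return abs(outer.eta - inner.eta) <= max_deta


inner = Hit(3.0, 0.0, 1.0)
aligned = Hit(7.0, 0.1, 2.4)    # roughly along the same direction
opposite = Hit(-7.0, 0.1, 2.4)  # other side of the detector in phi

print(is_compatible(inner, aligned))
print(is_compatible(inner, opposite))
```

Loose cuts of this kind keep nearly all genuine pairs but also admit the large combinatorial background described above, which is what a learned classifier on the doublet features is meant to reduce.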

Use with

Use this with:

/TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIIAutumn18DR-PUAvg50IdealConditions_IdealConditions_102X_upgrade2018_design_v9_ext1-v2/FEVTDEBUGHLT

Related items

The output dataset of this software tool is available in

Sample with tracker hit information for tracking algorithm ML studies TTbar_13TeV_PU50_PixelSeeds

Characteristics

1 file. 11.1 MiB in total.

System details

Use this code with the CMS Open Data VM environment
Software release: CMSSW_10_2_5

CMS VM Image, for 2011, 2012 and 2015 CMS open data

How can you use this?

If you do not have the CERN Virtual Machine for CMS open data installed, follow the instructions in step 1 at How to install a CERN Virtual Machine.

To run the analysis, follow the instructions in CNNPixelSeedsProducerTool documentation on Github.

Source code repository

https://github.com/cms-opendata-analyses/CNNPixelSeedsProducerTool

Disclaimer

These open data are released under the GNU General Public License v3.0.

Neither the experiment(s) (CMS) nor CERN endorse any works, scientific or otherwise, produced using these data.

This release has a unique DOI that you are requested to cite in any applications or publications.