Di Florio, Adriano ; Pantaleo, Felice ; Pierini, Maurizio
Cite as: Di Florio, Adriano; Pantaleo, Felice; Pierini, Maurizio; (2019). CNNPixelSeedsProducerTool - Example workflow using ML in track reconstruction with CMS 2018 simulated data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.T709.ZN5Z
Software Tool Datascience CMS CERN-LHC
An example workflow for producing datasets that can be used to develop machine learning algorithms for selecting and filtering pixel doublet seeds in tracking applications, using CMS 2018 simulated data. The code can be run inside the CMS Open Data environment.
One of the first steps of the track-finding workflow is the creation of track seeds, i.e. compatible pairs of hits from different detector layers, which are subsequently fed to higher-level pattern recognition steps. However, the set of compatible hit pairs is dominated by combinatorial background, so the later steps of the tracking algorithm have to process a large fraction of fake doublets. For each event, $O(10^6)$ doublets are produced while only $O(10^3)$ are genuine, i.e. roughly $O(10^3)$ fake doublets for every genuine one. A possible way of reducing this effect is to use Machine Learning and Deep Learning techniques to check the compatibility between two hits. Indeed, the task of fake rejection can be seen as a typical classification problem, for which neural networks and MVA methods have been widely shown to provide reliable results. The dataset provided is intended to be used to explore these techniques.
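To make the orders of magnitude above concrete, the following minimal sketch computes the fake-to-genuine ratio for one event and shows how it would change after a doublet filter is applied. The function names and the two efficiency parameters (`true_eff`, `fake_rej`) are illustrative assumptions, not quantities defined by this record or measured with this tool.

```python
def fake_ratio(n_total, n_true):
    """Number of fake doublets per genuine doublet in one event."""
    return (n_total - n_true) / n_true


def after_filter(n_total, n_true, true_eff, fake_rej):
    """Doublets surviving a hypothetical classifier.

    true_eff: assumed fraction of genuine doublets that are kept.
    fake_rej: assumed fraction of fake doublets that are rejected.
    Returns (kept_true, kept_fake).
    """
    n_fake = n_total - n_true
    kept_true = n_true * true_eff
    kept_fake = n_fake * (1.0 - fake_rej)
    return kept_true, kept_fake


if __name__ == "__main__":
    # Orders of magnitude quoted in the text: O(10^6) doublets, O(10^3) genuine.
    n_total, n_true = 10**6, 10**3
    print("fake ratio before filtering:", fake_ratio(n_total, n_true))

    # Purely illustrative working point for a hypothetical filter.
    kept_true, kept_fake = after_filter(n_total, n_true, true_eff=0.99, fake_rej=0.9)
    print("fake ratio after filtering:", kept_fake / kept_true)
```

Even a filter that rejects most fake doublets leaves many fakes per genuine doublet at these multiplicities, which is why the rejection power at high genuine-doublet efficiency is the quantity of interest.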
The workflow provided produces a dataset consisting of a collection of pixel doublet seeds, i.e. the hit pairs that could belong to the same particle. The compatibility between two hits is evaluated only on the basis of geometrical considerations, such as cuts in $\eta$, $\phi$ and $r$. These doublets are the building blocks from which tracks are subsequently built. Each doublet is characterised by a set of features, such as the hit coordinates, the charge released in the Pixel detector, and the pixel cluster shape projected onto a 2D histogram.
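As an illustration of what a purely geometrical compatibility check looks like, the sketch below converts two hit positions to $(r, \phi, \eta)$ and applies simple window cuts. It is a toy, barrel-like example: the function names, the cut values (`max_dphi`, `max_deta`, `min_dr`) and the requirement that the outer hit lies at larger radius are assumptions made for the example, and do not reproduce the actual selection used in the CMS reconstruction software.

```python
import math


def hit_features(x, y, z):
    """Return (r, phi, eta) for a hit at Cartesian position (x, y, z).

    Assumes r > 0, i.e. the hit is not on the beam axis.
    """
    r = math.hypot(x, y)        # transverse radius
    phi = math.atan2(y, x)      # azimuthal angle in (-pi, pi]
    eta = math.asinh(z / r)     # pseudorapidity, -ln(tan(theta/2))
    return r, phi, eta


def delta_phi(phi1, phi2):
    """Azimuthal difference phi2 - phi1 wrapped into [-pi, pi]."""
    d = phi2 - phi1
    return math.atan2(math.sin(d), math.cos(d))


def is_compatible(inner, outer, max_dphi=0.1, max_deta=0.5, min_dr=0.0):
    """Toy geometrical doublet selection; all thresholds are hypothetical."""
    r1, phi1, eta1 = hit_features(*inner)
    r2, phi2, eta2 = hit_features(*outer)
    return (
        r2 - r1 > min_dr
        and abs(delta_phi(phi1, phi2)) < max_dphi
        and abs(eta2 - eta1) < max_deta
    )


if __name__ == "__main__":
    inner_hit = (3.0, 0.0, 1.0)    # (x, y, z), arbitrary units
    aligned_hit = (7.0, 0.1, 2.3)  # roughly along the same direction
    far_hit = (0.0, 7.0, 2.3)      # rotated by ~90 degrees in phi
    print(is_compatible(inner_hit, aligned_hit))
    print(is_compatible(inner_hit, far_hit))
```

Cuts of this kind are cheap to evaluate but loose, which is why most of the doublets that pass them are still fake and a learned classifier on the doublet features can help.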
Use this with:
Sample with tracker hit information for tracking algorithm ML studies TTbar_13TeV_PU50_PixelSeeds
If you do not have the CERN Virtual Machine for CMS open data installed, follow the instructions in step 1 at How to install a CERN Virtual Machine.
To run the analysis, follow the instructions in the CNNPixelSeedsProducerTool documentation on GitHub.
These open data are released under the GNU General Public License v3.0.
Neither the experiment(s) (CMS) nor CERN endorse any works, scientific or otherwise, produced using these data.
This release has a unique DOI that you are requested to cite in any applications or publications.