The signal sample contains events in which Higgs bosons (with a fixed mass of 125 GeV) were produced. The background sample was generated by other known processes that can produce events with at least one electron or muon and a hadronic tau, mimicking the signal. For the sake of simplicity, only three background processes were retained for the Challenge. The first comes from the decay of the Z boson (with a mass of 91.2 GeV) into two taus. This decay produces events with a topology very similar to that produced by the decay of a Higgs boson. The second set contains events with a pair of top quarks, which can have a lepton and a hadronic tau among their decay products. The third set involves the decay of the W boson, where one electron or muon and a hadronic tau can appear simultaneously only through imperfections of the particle identification procedure.
Due to the complexity of the simulation process, each simulated event has a weight that is proportional to the conditional density divided by the instrumental density used by the simulator (an importance-sampling flavour). The weights are normalised for integrated luminosity such that, in any region, the sum of the weights of events falling in the region is an unbiased estimate of the expected number of events falling in the same region during a given fixed time interval. In our case, the weights correspond to the quantity of real data taken during the year 2012. The weights are an artifact of the way the simulation works and so they are not part of the input to the classifier. For the Challenge, weights were provided in the training set so that the AMS could be properly evaluated. Weights were not provided in the qualifying set, since the weight distributions of the signal and background sets are very different and so they would give away the label immediately. However, in the opendata.cern.ch dataset, weights and labels have been provided for the complete dataset.
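As an illustration of how the weights are meant to be used, the sketch below sums the weights of the events falling in a region to estimate the expected number of signal and background events there. The toy events, the `expected_counts` helper and the mass window are invented for this example; only the `Label`, `Weight` and `DER_mass_MMC` column names come from the dataset.

```python
from collections import defaultdict


def expected_counts(events, in_region):
    """Sum the weights of the events selected by in_region, per label."""
    totals = defaultdict(float)
    for event in events:
        if in_region(event):
            totals[event["Label"]] += event["Weight"]
    return dict(totals)


# Toy events with made-up values, for illustration only.
toy_events = [
    {"Label": "s", "Weight": 0.002, "DER_mass_MMC": 124.0},
    {"Label": "b", "Weight": 2.5, "DER_mass_MMC": 92.0},
    {"Label": "b", "Weight": 1.5, "DER_mass_MMC": 118.0},
    {"Label": "s", "Weight": 0.003, "DER_mass_MMC": 131.0},
]


def in_mass_window(event):
    """Hypothetical region: candidate mass between 100 and 150 GeV."""
    return 100.0 <= event["DER_mass_MMC"] <= 150.0


counts = expected_counts(toy_events, in_mass_window)
```

Here `counts["s"]` and `counts["b"]` are the estimated numbers of signal and background events expected in the chosen window for the amount of data the weights were normalised to.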

 atlashiggschallenge2014v2.csv.gz
 Size: 65.6 MB

EventId: A unique integer identifier of the event.
DER_mass_MMC: The estimated mass $m_{H}$ of the Higgs boson candidate, obtained through a probabilistic phase space integration.
DER_mass_transverse_met_lep: The transverse mass between the missing transverse energy and the lepton.
DER_mass_vis: The invariant mass of the hadronic tau and the lepton.
DER_pt_h: The modulus of the vector sum of the transverse momentum of the hadronic tau, the lepton and the missing transverse energy vector.
DER_deltaeta_jet_jet: The absolute value of the pseudorapidity separation between the two jets (undefined if PRI_jet_num $\leq$ 1).
DER_mass_jet_jet: The invariant mass of the two jets (undefined if PRI_jet_num $\leq$ 1).
DER_prodeta_jet_jet: The product of the pseudorapidities of the two jets (undefined if PRI_jet_num $\leq$ 1).
DER_deltar_tau_lep: The separation $\Delta R = \sqrt{\Delta\eta^{2} + \Delta\phi^{2}}$ between the hadronic tau and the lepton.
DER_pt_tot: The modulus of the vector sum of the missing transverse momentum and the transverse momenta of the hadronic tau, the lepton, the leading jet (if PRI_jet_num $\geq$ 1) and the subleading jet (if PRI_jet_num = 2) (but not of any additional jets).
DER_sum_pt: The sum of the moduli of the transverse momenta of the hadronic tau, the lepton, the leading jet (if PRI_jet_num $\geq$ 1), the subleading jet (if PRI_jet_num = 2) and the other jets (if PRI_jet_num = 3).
DER_pt_ratio_lep_tau: The ratio of the transverse momenta of the lepton and the hadronic tau.
DER_met_phi_centrality: The centrality of the azimuthal angle of the missing transverse energy vector w.r.t. the hadronic tau and the lepton.
DER_lep_eta_centrality: The centrality of the pseudorapidity of the lepton w.r.t. the two jets (undefined if PRI_jet_num $\leq$ 1).
PRI_tau_pt: The transverse momentum $\sqrt{p^{2}_{x} + p^{2}_{y}}$ of the hadronic tau.
PRI_tau_eta: The pseudorapidity $\eta$ of the hadronic tau.
PRI_tau_phi: The azimuth angle $\phi$ of the hadronic tau.
PRI_lep_pt: The transverse momentum $\sqrt{p^{2}_{x} + p^{2}_{y}}$ of the lepton (electron or muon).
PRI_lep_eta: The pseudorapidity $\eta$ of the lepton.
PRI_lep_phi: The azimuth angle $\phi$ of the lepton.
PRI_met: The magnitude of the missing transverse energy vector $\overrightarrow{E}^{\text{miss}}_{T}$.
PRI_met_phi: The azimuth angle $\phi$ of the missing transverse energy.
PRI_met_sumet: The total transverse energy in the detector.
PRI_jet_num: The number of jets (integer with value of 0, 1, 2 or 3; possible larger values have been capped at 3).
PRI_jet_leading_pt: The transverse momentum $\sqrt{p^{2}_{x} + p^{2}_{y}}$ of the leading jet, that is the jet with largest transverse momentum (undefined if PRI_jet_num = 0).
PRI_jet_leading_eta: The pseudorapidity $\eta$ of the leading jet (undefined if PRI_jet_num = 0).
PRI_jet_leading_phi: The azimuth angle $\phi$ of the leading jet (undefined if PRI_jet_num = 0).
PRI_jet_subleading_pt: The transverse momentum $\sqrt{p^{2}_{x} + p^{2}_{y}}$ of the subleading jet, that is, the jet with the second largest transverse momentum (undefined if PRI_jet_num $\leq$ 1).
PRI_jet_subleading_eta: The pseudorapidity $\eta$ of the subleading jet (undefined if PRI_jet_num $\leq$ 1).
PRI_jet_subleading_phi: The azimuth angle $\phi$ of the subleading jet (undefined if PRI_jet_num $\leq$ 1).
PRI_jet_all_pt: The scalar sum of the transverse momentum of all the jets of the events.
Weight: The event weight $w_{i}$.
Label: The event label (string) $y_{i}$ $\in$ $\{s,b\}$ (s for signal, b for background).
KaggleSet: String specifying to which Kaggle set the event belongs: "t": training, "b": public leaderboard, "v": private leaderboard, "u": unused.
KaggleWeight: Weight normalised within each Kaggle dataset.
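The following is a minimal sketch of how the file could be read and split into the Kaggle subsets using only the Python standard library. It assumes that the CSV is comma-separated and that its header row uses exactly the column names listed above; the helper names `group_by_kaggle_set` and `load_by_kaggle_set` are invented for this example.

```python
import csv
import gzip
from collections import defaultdict

# Columns converted to float here; the remaining columns stay as strings.
FLOAT_COLUMNS = ("Weight", "KaggleWeight")


def group_by_kaggle_set(lines):
    """Group CSV rows (an iterable of text lines) by their KaggleSet code."""
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        for name in FLOAT_COLUMNS:
            row[name] = float(row[name])
        groups[row["KaggleSet"]].append(row)
    return dict(groups)


def load_by_kaggle_set(path):
    """Open the gzipped CSV in text mode and group its rows."""
    with gzip.open(path, mode="rt", newline="") as handle:
        return group_by_kaggle_set(handle)


# Example call (requires the downloaded file):
# sets = load_by_kaggle_set("atlashiggschallenge2014v2.csv.gz")
# training = sets["t"]
```

Keeping the grouping separate from the file handling makes it possible to try the logic on a few hand-written lines before reading the full file.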
All releases will have a unique DOI that you are requested to cite in any applications or publications.
The evaluation metric is the approximate median significance (AMS):
\[ \text{AMS} = \sqrt{2\left((s+b+b_r) \log \left(1 + \frac{s}{b + b_r}\right) - s\right)}\]
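where $s$ and $b$ are the unnormalised true positive and false positive rates, respectively, $b_r = 10$ is a constant regularisation term, and $\log$ is the natural logarithm.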
More precisely, let $(y_1, \ldots, y_n) \in \{\text{b},\text{s}\}^n$ be the vector of true test labels, let $(\hat{y}_1, \ldots, \hat{y}_n) \in \{\text{b},\text{s}\}^n$ be the vector of predicted (submitted) test labels, and let $(w_1, \ldots, w_n) \in {\mathbb{R}^+}^n$ be the vector of weights. Then
\[ s = \sum_{i=1}^n w_i\mathbb{1}\{y_i = \text{s}\} \mathbb{1}\{\hat{y}_i = \text{s}\} \]
and
\[ b = \sum_{i=1}^n w_i\mathbb{1}\{y_i = \text{b}\} \mathbb{1}\{\hat{y}_i = \text{s}\}, \]
where the indicator function $\mathbb{1}\{A\}$ is 1 if its argument $A$ is true and 0 otherwise.
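The definitions above translate directly into a short function. The sketch below is one possible implementation, not the official scoring code; the function name `ams` and its argument names are invented here, and the default `b_reg=10.0` is assumed to be the regularisation term $b_r$ used in the Challenge.

```python
import math


def ams(y_true, y_pred, weights, b_reg=10.0):
    """Approximate median significance for predicted labels 's' / 'b'.

    b_reg is the regularisation term b_r (assumed to be 10 here).
    """
    s = 0.0  # weighted count of true signal events predicted as signal
    b = 0.0  # weighted count of true background events predicted as signal
    for truth, prediction, weight in zip(y_true, y_pred, weights):
        if prediction != "s":
            continue  # only events selected as signal contribute
        if truth == "s":
            s += weight
        else:
            b += weight
    radicand = 2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s)
    return math.sqrt(radicand)


# Toy example with made-up labels and weights.
score = ams(["s", "b", "s"], ["s", "s", "b"], [10.0, 90.0, 5.0])
```

In the toy example only the first two events are selected, so $s = 10$ and $b = 90$; the third event is true signal but is predicted as background and therefore does not enter either sum.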
For more information on the statistical model and the derivation of the metric, see the documentation.