Learning to discover: signal/background separation and the Higgs boson

Balázs Kégl

Lab. de l'Accélérateur Linéaire, Univ. d'Orsay

Tri-séminaire de la physique statistique (ex PHYSTAT-SUD)

Mon, May 4th 2015, 14:30

Salle Claude Itzykson, Bât. 774, Orme des Merisiers

Classification algorithms have been routinely used in high-energy physics since the 1990s to separate signal from background in particle detectors. The goal of the classifier is to maximize the sensitivity of a counting test in a selection region. This objective is similar in spirit to, but formally different from, the classical objectives of minimizing misclassification error or maximizing AUC. We start the talk by motivating the problem using the ongoing example of detecting the Higgs boson in the tau-tau decay channel in the ATLAS detector of the LHC. We formalize the problem, then describe the usual analysis chain and explain some of the choices physicists make when designing a classifier to optimize the discovery significance. We derive different surrogates that capture this goal and show some simple techniques to optimize them, raising some questions on both the statistical and the algorithmic side. We end the talk by presenting a data challenge we organized to draw the attention of the machine learning and statistics communities to this important application and to improve the techniques used to optimize the discovery significance.

With a PhD in computer science, Balázs Kégl has been a researcher in the Linear Accelerator Laboratory of the CNRS and the chair of the Center for Data Science of the Université Paris-Saclay since 2014. He has published more than a hundred papers on unsupervised and supervised learning, large-scale Bayesian inference and optimization, and various applications. In his current position he has been the head of the AppStat team, which works on machine learning and statistical inference problems motivated by applications in high-energy particle and astroparticle physics.

Contact : lbervas