PhD subjects

3 IPhT subjects

Last updated : 16-01-2019


• Theoretical Physics

 

Statistical physics modelling of artificial neural networks

SL-DRF-19-0513

Research field : Theoretical Physics
Location : Service de Physique Théorique, Saclay
Contact : Lenka ZDEBOROVA, CNRS - DSM - Institut de Physique Théorique, 01 6908 8114
Thesis supervisor : Pierfrancesco Urbani, CNRS, 33 1 69 08 79 28
Starting date : 01-10-2019

Personal web page : https://www.ipht.fr/Phocea/Membres/Annuaire/index.php

Laboratory link : https://www.ipht.fr

There is a long history of statistical physics bringing ideas to machine learning; commonly used terms such as Boltzmann machine or Gibbs sampling bear witness to that. Notably, the 80s-90s were a very fruitful period in which research in statistical physics produced a range of theoretical results about models of neural networks, see e.g. [AGS85, GD88, EVB01]. Those results concentrate on probabilistic models of data (both the data distribution and the map to labels are modelled) in a way complementary to mainstream learning theory. Nowadays, the wide use of deep learning raises a range of open theoretical questions and challenges that will likely require a synergy of theoretical ideas from several areas, including theoretical physics. In terms of modelling artificial neural networks, the existing physics literature mostly considers fully connected feedforward neural networks for supervised learning and restricted Boltzmann machines for unsupervised learning, and models data as random i.i.d. vectors. LZ currently holds an ERC Starting Grant focusing on the statistical physics study of fully connected feedforward neural networks and auto-encoders, and related theoretical and algorithmic questions.



This PhD project will apply the statistical physics analysis to two classes of neural networks that, as far as we know, have not yet been studied within this framework (and are not part of the above ERC project): convolutional neural networks (for supervised learning) and generative adversarial networks (for unsupervised learning). We will evaluate analytically the optimal performance of such networks in the modelled situations and compare it to the performance of efficient algorithms (message passing and gradient-based algorithms), studied analytically and numerically. The main goal is to understand the behaviour, advantages and limitations of such networks and to improve the algorithmic procedures used for training.

Convolutional neural networks (ConvNets) form the basis of the majority of modern state-of-the-art image processing systems. Compared to fully connected networks, the hidden neurons in ConvNets are connected to only a subset of variables in the previous layer (a so-called receptive field), and usually many of the hidden neurons connected to different receptive fields share the corresponding vector of weights, leading to a computational speed-up. ConvNet architectures are also chosen for their ability to impose symmetries (e.g. translational symmetry). The model for ConvNets we have in mind is related to the committee machine. The two supervisors recently worked on committee machine neural networks: LZ studied its fully connected version [AMB18], while PU studied a tree version where the weights of each hidden neuron are independent [FHU18]. The tree committee machine with the weights of each hidden neuron being the same is a simple (possibly the simplest) model of a convolutional neural network, previously studied in the theoretical statistics and computer science literature, see e.g. [DLT17]. This model has not yet been analyzed with the statistical physics methods that can access a broad range of open questions. Its solution should not be more complicated than what has already been done in [AMB18, FHU18], making it an ideal subject for a PhD student. With the solution at hand, many questions about ConvNets, their performance and learning with efficient algorithms will become analytically accessible and will be investigated. Extensions to models where the receptive fields overlap will be the next case to study.
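As a minimal illustration of the tree committee machine with shared weights, the forward pass can be sketched in a few lines of numpy; the patch size, number of hidden units, and sign activations below are our own illustrative choices, not specifications from the project.

```python
import numpy as np

def tree_committee_machine(x, w):
    """Tree committee machine with weight sharing: arguably the
    simplest model of a convolutional network.

    The input x (of dimension N = K * d) is split into K
    non-overlapping receptive fields of size d.  Each hidden unit
    applies the SAME weight vector w to its own patch (weight
    sharing), and the output is the majority vote (sign of the sum)
    of the hidden units.
    """
    d = w.shape[0]
    patches = x.reshape(-1, d)       # K disjoint receptive fields
    hidden = np.sign(patches @ w)    # one shared weight vector for all units
    return np.sign(hidden.sum())     # majority vote of the hidden units

rng = np.random.default_rng(0)
w = rng.standard_normal(5)           # shared weights, receptive field d = 5
x = rng.standard_normal(15)          # input with K = 3 receptive fields
print(tree_committee_machine(x, w))  # label in {-1, +1}
```

In the fully connected committee machine of [AMB18], each hidden unit instead has its own weight vector and sees the whole input; the overlapping-receptive-field extension mentioned above would replace the disjoint `reshape` by a sliding window.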



Generative adversarial networks (GANs) [GPM14] are often cited as the most influential idea in deep learning of the past five years. Their purpose is to generate data samples that look statistically indistinguishable from samples in the training set. The principle of GANs is very simple: a generator neural network is trained to minimise the accuracy of a discriminator neural network, which is in turn trying to maximise the number of samples classified correctly as coming from the true data versus from the generator. The min-max nature of the training problem, however, causes serious problems for the training algorithms, and it is not understood mathematically when such learning is reliable and leads to convergence and when it does not. A very recent work [WHL18] proposed a very elegant, simple model of GANs and analyzed the behavior of online learning in this model. The goal of this thesis project is to analyze batch learning in this model of GANs, the underlying algorithms, and their convergence and properties. From the point of view of existing research, the above model of GANs can be seen as a combination of a low-rank matrix factorisation model and a perceptron problem with structured disorder. Both of these models were studied extensively by the two supervisors, e.g. [LKZ17, FPSUZ17], and we are confident that their combination corresponding to the above model of GANs can also be analyzed in closed form.
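The min-max principle described above can be sketched with a one-dimensional toy GAN in numpy. The linear generator, logistic discriminator, learning rate and alternating plain-gradient updates are illustrative assumptions, not the model or the algorithms of [WHL18]; running it can exhibit exactly the pathology mentioned in the text, with the coupled updates oscillating rather than settling.

```python
import numpy as np

# Toy one-dimensional GAN: the true data are samples from N(mu, 1),
# the generator g(z) = a*z + b maps Gaussian noise z to fake samples,
# and the discriminator D(x) = sigmoid(c*x + d) tries to tell real
# from fake.  All parameter names and values here are illustrative.

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(1)
mu = 2.0                              # mean of the true data
a, b = 1.0, -1.0                      # generator parameters
c, d = 0.1, 0.0                       # discriminator parameters
lr, n = 0.05, 256                     # learning rate, batch size

for step in range(2000):
    x = mu + rng.standard_normal(n)   # real batch
    z = rng.standard_normal(n)        # latent noise
    g = a * z + b                     # fake batch
    # discriminator ascends  E[log D(x)] + E[log(1 - D(g))]
    pr, pf = sigmoid(c * x + d), sigmoid(c * g + d)
    c += lr * (np.mean((1 - pr) * x) - np.mean(pf * g))
    d += lr * (np.mean(1 - pr) - np.mean(pf))
    # generator descends  E[log(1 - D(g))], i.e. it tries to fool D
    pf = sigmoid(c * (a * z + b) + d)
    a += lr * c * np.mean(pf * z)
    b += lr * c * np.mean(pf)

print(a, b)  # the generator offset b should drift from -1 toward mu
```

This is online-flavoured in that each step draws a fresh batch; the batch-learning setting targeted by the thesis fixes a single training set and repeatedly descends/ascends on it, which changes the analysis considerably.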



The longer-term goal of this project is to understand the underlying principles of why current deep learning methods work well, what their limitations are, and how they can be further improved. The two supervisors combine expertise in the powerful methodology coming from the statistical physics of disordered systems, which is applicable to the study of high-dimensional non-convex problems, with experience in multidisciplinary applications of this methodology. The student will be trained in this vibrant field, with applications in modern data analysis and machine learning, which should be a great asset for his/her future career prospects.



References

[AGS85] Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Spin-glass models of neural networks. Phys. Rev. A, 32(2), 1007.
[AMB18] Aubin, B., Maillard, A., Barbier, J., Krzakala, F., Macris, N., & Zdeborová, L. (2018). The committee machine: Computational to statistical gaps in learning a two-layers neural network. arXiv:1806.05451.
[DLT17] Du, S. S., Lee, J. D., Tian, Y., Poczos, B., & Singh, A. (2017). Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima. arXiv:1712.00779.
[EVB01] Engel, A., & Van den Broeck, C. (2001). Statistical Mechanics of Learning. Cambridge University Press.
[FPSUZ17] Franz, S., Parisi, G., Sevelev, M., Urbani, P., & Zamponi, F. (2017). Universality of the SAT-UNSAT (jamming) threshold in non-convex continuous constraint satisfaction problems. SciPost Phys. 2, 019.
[FHU18] Franz, S., Hwang, S., & Urbani, P. (2018). Jamming in multilayer supervised learning models. arXiv:1809.09945.
[GD88] Gardner, E., & Derrida, B. (1988). Optimal storage properties of neural network models. J. Phys. A: Math. Gen., 21, 271.
[GPM14] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672-2680).
[LKZ17] Lesieur, T., Krzakala, F., & Zdeborová, L. (2017). Constrained low-rank matrix estimation: Phase transitions, approximate message passing and applications. J. Stat. Mech.: Th. and Exp., 073403.
[WHL18] Wang, C., Hu, H., & Lu, Y. M. (2018). A Solvable High-Dimensional Model of GAN. arXiv:1805.08349.

Testing gravity with gravitational waves and the large-scale structure

SL-DRF-19-0509

Research field : Theoretical Physics
Location : Service de Physique Théorique, Saclay
Contact : Filippo VERNIZZI, CEA - DSM - Institut de Physique Théorique, 01 6908 7212
Thesis supervisor : Filippo VERNIZZI, CEA - DSM - Institut de Physique Théorique, 01 6908 7212
Starting date : 01-10-2019

Personal web page : https://www.ipht.fr/Phocea/Membres/Annuaire/index.php?uid=fvernizz

Laboratory link : https://www.ipht.fr

Our current modelling of the Universe is based on applying the theory of General Relativity to a very large range of scale and curvature regimes where it has not been tested yet. This extrapolation comes at the price of introducing two unknown components: dark energy and dark matter. While theoretical physicists try to understand these phenomena in terms of alternative theories of gravity, the astronomical community devotes a large effort to mapping the distribution of matter and galaxies with unprecedented precision. The aim of these cosmic surveys is to investigate the behaviour of gravity on very large scales and to establish important constraints on alternative models.



Meanwhile, the last two years have witnessed the flourishing of gravitational wave physics as a powerful probe to constrain theories of gravity. For instance, modified gravity acts as a refractive medium for gravitational waves, and the simultaneous observation of gravitational waves and a gamma-ray burst from a neutron star merger has placed a severe cut on the parameter space of available models. More generally, gravitational wave science is destined to become of great relevance for astrophysics and fundamental physics.
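To make the "refractive medium" analogy concrete: in a standard effective-field-theory parametrization (our choice of notation, not one prescribed in this description), the cosmological propagation of the tensor modes $h_{ij}$ is modified to

$$\ddot h_{ij} + (3 + \alpha_M)\, H\, \dot h_{ij} + (1 + \alpha_T)\, \frac{k^2}{a^2}\, h_{ij} = 0,$$

where $\alpha_M$ parametrizes the running of the effective Planck mass (an extra friction affecting the wave's amplitude) and $\alpha_T$ the deviation of the tensor speed from that of light; the joint gravitational-wave/gamma-ray detection mentioned above constrains $|\alpha_T|$ to be below roughly $10^{-15}$.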



In this thesis we will develop theoretical tools to constrain different gravitational scenarios with the use of both gravitational wave physics and cosmological structure formation. We will study the effect of modified gravity on gravitational waves, both at their emission by compact binary systems and during their propagation. Moreover, we will study structure formation in the linear and mildly nonlinear regime to establish the impact of the dark components. In both cases, we will employ effective field theory methods inspired by non-relativistic quantum field theory. The combination of these probes will allow us to place the best constraints on gravity over a wide range of scales.



This work is essentially analytical; however, it may resort to numerical calculations to obtain quantitative results.

de Sitter vacua in String Theory

SL-DRF-19-0510

Research field : Theoretical Physics
Location : Service de Physique Théorique, Saclay
Contact : Iosif BENA, CEA - DSM - Institut de Physique Théorique, 01 6908 7468
Thesis supervisor : Iosif BENA, CEA - DSM - Institut de Physique Théorique, 01 6908 7468
Starting date : 01-10-2019

Personal web page : https://www.ipht.fr/Phocea/Membres/Annuaire/index.php?uid=ibena

Laboratory link : https://www.ipht.fr

String Theory is the most promising candidate for a theory that unifies all the forces that exist in nature, and could therefore provide a framework from which one may hope to derive all the observed physical laws. However, String Theory lives in ten dimensions, and to obtain real-world physics one needs to compactify it on certain six-dimensional compact spaces whose size is much smaller than any scale accessible to observations. Since a large number of such spaces exists, it has been argued that there are of order 10^{500} four-dimensional String Theory vacua. These vacua realize all possible physical laws with all possible constants. This has led to a radically new view of physics in which one argues that the constants in the physical laws that we measure in our Universe do not come from an underlying unified theory, but are environmental, anthropically-constrained variables determined by where we are in this Multiverse.



Despite the fact that it goes against the reductionist paradigm that has driven scientific progress over the past century, the anthropic explanation is rapidly becoming the favored answer to the extremely difficult task of explaining the enormous amount of fine tuning present in the physical laws. First, the observed accelerated expansion of the Universe is driven by a mysterious form of energy density with negative pressure, whose value is 120 orders of magnitude smaller than expected from particle physics (this has been called "the worst theoretical prediction in the history of physics"). Next comes the hierarchy problem: the 17 orders of magnitude between the electroweak energy scale and the gravity scale. Supersymmetry was thought for many years to provide a beautiful solution to this problem, but the absence of any LHC signal supporting supersymmetry, after scanning most of the available phase space, is driving many towards anthropic/multiverse explanations. Finally, models of cosmological inflation require considerable fine-tuning to achieve the near flatness of the inflationary potential and to meet the upper bound from the 2015 Planck results on the tensor-to-scalar ratio in the spectrum of perturbations.



The multiverse paradigm provides a framework in which none of these fine-tunings requires an explanation, and string theory, with its believed "landscape" of de Sitter vacua, appears to support it. However, solutions in the landscape are not constructed directly in ten-dimensional string theory, but are found using effective low-energy descriptions in four space-time dimensions. In order to satisfy all experimental constraints, the effective theories require a number of intricate ingredients, such as anti-branes, T-branes or non-geometric fluxes, whose string theory origin and consistency are unclear. The purpose of this thesis is to examine whether a very large number of these vacua are in fact unstable or inconsistent with the experimental data coming out of the Large Hadron Collider. This will be done by analyzing one of the key ingredients of the Multiverse construction, the uplifting of the cosmological constant, and by taking into account the embeddings of Standard Model physics in String Theory.



In parallel to this top-down endeavor to understand de Sitter vacua in string theory, over the past year there has been an explosion of interest in this question from the bottom-up perspective: the recent de Sitter Swampland conjecture (by Vafa and collaborators, arXiv:1806.08362) proposes that, in the regime of parameters where calculations can be trusted, all solutions with a positive cosmological constant are either unstable or have a runaway behavior. We will attempt to link the top-down results we will obtain with this line of work, hoping to establish that String Theory does not support the multiverse paradigm. Applicants are expected to have a solid background in general relativity and quantum field theory.





Understanding the origin of the laws governing our universe is an endeavor in which the DRF is positioned very well worldwide, both theoretically and experimentally: from the LHC group at IRFU/DPP, which puts stronger and stronger bounds on beyond-the-Standard-Model physics, to the Planck group at the IRFU/Département d'Astrophysique, which tries to measure the cosmological parameters of our Universe using the cosmic microwave background, to the particle theory group at the IPhT, which explores extensions of the Standard Model and the fine tuning thereof. Moreover, there is an ongoing line of research to count and investigate the vacua of string theory using Machine Learning; one of the talks at the "Séminaire Intelligence Artificielle et Physique Théorique" organized by DRT/LIST last June was on this topic.

 
