Generalization and overfitting in overparametrized two-layer neural networks

Pierfrancesco Urbani, IPhT

Understanding the generalization properties of large, overparametrized neural networks is a central problem in theoretical machine learning. Several insightful ideas have been proposed in this regard, among them the implicit regularization hypothesis, the possibility of benign overfitting, and the existence of feature-learning regimes in which neural networks learn the latent structure of the data. However, a precise understanding of when these behaviors emerge, and of their range of validity, cannot be disentangled from the study of the non-linear training dynamics. We use a technique from statistical physics, dynamical mean-field theory, to study the training dynamics and obtain a rich picture of how generalization and overfitting arise in large overparametrized models. In particular we point out: (i) the emergence of a separation of timescales controlling feature learning and overfitting, (ii) a non-monotone behavior of the test error and, correspondingly, a ‘feature unlearning’ phase at large times, and (iii) the emergence of an algorithmic inductive bias towards small complexity. Joint work with Andrea Montanari.
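
The phenomenology above can be probed with a small numerical experiment. The following NumPy sketch is not the dynamical mean-field analysis of the talk: it trains an overparametrized two-layer network by full-batch gradient descent on a noisy single-index target and prints the train and test errors along the trajectory. All sizes, the teacher model, the noise level, and the learning rate are illustrative assumptions. Run long enough, the test error can pass through a minimum and then increase as the network begins to fit the label noise, the kind of non-monotone behavior referred to in point (ii).

import numpy as np

# A minimal sketch, not the DMFT analysis from the talk; the data model,
# network sizes, noise level, and learning rate are illustrative assumptions.
rng = np.random.default_rng(0)
d, n_train, n_test, width = 30, 100, 1000, 500

# Teacher: a noisy single-index target, i.e. labels depend on one latent
# direction w_star that the network would have to learn.
w_star = rng.standard_normal(d) / np.sqrt(d)
X_tr = rng.standard_normal((n_train, d))
y_tr = np.tanh(X_tr @ w_star) + 0.3 * rng.standard_normal(n_train)
X_te = rng.standard_normal((n_test, d))
y_te = np.tanh(X_te @ w_star)  # clean labels to measure the test error

# Overparametrized two-layer student: width well above n_train.
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

def forward(X, W, a):
    H = np.tanh(X @ W.T)   # hidden-layer activations
    return H, H @ a        # network output

lr = 0.1
for step in range(10001):
    H, pred = forward(X_tr, W, a)
    err = pred - y_tr
    # Full-batch gradients of the mean squared error (factor 2 absorbed in lr).
    grad_a = H.T @ err / n_train
    grad_W = ((err[:, None] * (1.0 - H**2)) * a).T @ X_tr / n_train
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 1000 == 0:
        _, pred_te = forward(X_te, W, a)
        print(f"step {step:5d}  train MSE {np.mean(err**2):.4f}"
              f"  test MSE {np.mean((pred_te - y_te)**2):.4f}")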

This event has ended.

Date

24 March 2025

Time

11:00 – 12:00
