7–11 Apr 2025
Lecture and Conference Centre
Europe/Warsaw timezone

Convergence and Implicit Bias: Analyzing Diagonal Linear Networks with Gradient Descent

Speaker

Wiebke Bartolomaeus

Description

In deep learning, one often operates in a (highly) overparametrized regime, meaning there are significantly more trainable parameters than available training data. Nevertheless, experiments show that the generalization error after training with (stochastic) gradient descent remains small, whereas one would expect overfitting, i.e. a small training error but a relatively large test error.

This suggests the existence of an implicit bias towards learning networks that generalize well, in settings where infinitely many networks can achieve zero training loss.

To investigate this phenomenon, we analyze the training dynamics of deep diagonal linear networks. Equivalently, this setting can be viewed as recovering sparse signals from linear measurements.
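To make the setting concrete, the following is a minimal illustrative sketch (not the speaker's exact construction) of gradient descent on a depth-2 diagonal linear network, where the regression vector is parametrized as beta = u * v and fitted to noiseless linear measurements of a sparse signal; all dimensions, the initialization scale, and the step size are arbitrary choices for illustration.

```python
# Illustrative sketch only: depth-2 diagonal linear network beta = u * v,
# trained by plain gradient descent on a sparse linear-regression problem.
# Dimensions, initialization scale alpha, and step size lr are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 40, 100, 5                 # measurements, ambient dimension, sparsity
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:s] = 1.0                  # sparse ground-truth signal
y = X @ beta_star                    # noiseless linear measurements

alpha = 1e-3                         # small initialization scale
u = alpha * np.ones(d)
v = alpha * np.ones(d)
lr = 1e-3

for _ in range(50_000):
    beta = u * v                     # effective linear predictor
    grad = X.T @ (X @ beta - y) / n  # gradient of the squared loss w.r.t. beta
    # chain rule through the factorization beta = u * v
    u, v = u - lr * grad * v, v - lr * grad * u

beta = u * v
print("training loss :", np.mean((X @ beta - y) ** 2))
print("recovery error:", np.linalg.norm(beta - beta_star))
```

Although n < d, so that infinitely many vectors interpolate the data, the small initialization scale tends to steer plain gradient descent towards a sparse interpolator in this kind of toy example, which is the implicit-bias phenomenon described above.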

We propose a method to show convergence of gradient descent and to fully characterize its limit, using techniques inspired by mirror descent and a Łojasiewicz-type inequality.
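As background for the kind of limit characterization meant here, the LaTeX snippet below states the well-known depth-2 result from the literature (given only as an illustration, not as the speaker's theorem): gradient flow on a diagonal linear network initialized at scale alpha converges to the interpolator minimizing a hyperbolic-entropy potential, which plays the role of a mirror-descent Bregman potential and interpolates between the l1 and l2 norms.

```latex
% Illustrative background (standard depth-2 result, stated up to constants):
% the gradient-flow limit solves a constrained minimization over interpolators.
\[
  \beta_\infty \;=\; \operatorname*{arg\,min}_{\beta :\, X\beta = y} \psi_\alpha(\beta),
  \qquad
  \psi_\alpha(\beta) \;=\; \sum_{i=1}^{d}
    \Bigl( \beta_i \operatorname{arcsinh}\!\Bigl(\tfrac{\beta_i}{2\alpha^2}\Bigr)
           - \sqrt{\beta_i^2 + 4\alpha^4} + 2\alpha^2 \Bigr).
\]
% As \alpha \to 0 the potential behaves like the \ell_1 norm (up to scaling),
% explaining the bias towards sparse solutions; as \alpha \to \infty it
% behaves like the squared \ell_2 norm.
```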

Co-authors

Presentation materials

There are no materials yet.