Speaker
Description
In recent years, Transformer-based models have achieved state-of-the-art results in areas such as natural language processing, computer vision, multimodal learning, and robotics, owing to the parallelization of their attention mechanism and its direct access to distant tokens in the sequence. However, this parallelization applies only along the sequence length, not across the number of layers (i.e., the depth). Despite the impressive performance it yields, the continued scaling of Transformers' depth and dimension entails a high computational cost. By formulating the forward and backward propagations of the Transformer as ODEs, we explore parallel-in-time and multilevel methods to mitigate the computational cost incurred by a large depth. We present numerical experiments from large language modeling that demonstrate the effectiveness of the proposed training strategies.
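The abstract does not specify which parallel-in-time method is used. As one hedged illustration of the general idea, the sketch below reads a stack of residual layers, x_{l+1} = x_l + h f(x_l), as forward Euler steps on the ODE dx/dt = f(x), and then applies a parareal iteration: cheap sequential coarse sweeps are corrected by expensive fine solves that are independent across time intervals and could run in parallel. The residual map `f`, the helper `propagate`, and all sizes and step counts are illustrative assumptions, not the speakers' implementation.

```python
# A minimal sketch (assumed, not the talk's code) of parareal applied to
# the ODE view of residual layers: x_{l+1} = x_l + h*f(x_l) is forward
# Euler on dx/dt = f(x). The map `f` stands in for a transformer block.

import numpy as np

rng = np.random.default_rng(0)
d = 16                                        # toy hidden dimension
W = rng.standard_normal((d, d)) / np.sqrt(d)

def f(x):
    """Stand-in residual map for one transformer block."""
    return np.tanh(x @ W)

def propagate(x, t0, t1, n_steps):
    """Forward Euler over [t0, t1] using n_steps layers (steps)."""
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + h * f(x)
    return x

# Time grid: N coarse intervals, each covering several "layers".
T, N = 1.0, 8
t = np.linspace(0.0, T, N + 1)
fine = lambda x, n: propagate(x, t[n], t[n + 1], n_steps=16)   # expensive
coarse = lambda x, n: propagate(x, t[n], t[n + 1], n_steps=1)  # cheap

x0 = rng.standard_normal(d)

# Initial coarse sweep (sequential, but cheap).
U = [x0]
for n in range(N):
    U.append(coarse(U[n], n))

# Parareal iterations: the fine solves are independent across intervals
# and could be distributed; only the coarse correction is sequential.
for k in range(4):
    F_vals = [fine(U[n], n) for n in range(N)]   # parallelizable step
    U_new = [x0]
    for n in range(N):
        # Parareal update: G(new) + F(old) - G(old).
        U_new.append(coarse(U_new[n], n) + F_vals[n] - coarse(U[n], n))
    U = U_new

# Compare against the fully sequential fine solution.
x_seq = propagate(x0, 0.0, T, n_steps=16 * N)
print("parareal vs sequential error:", np.linalg.norm(U[-1] - x_seq))
```

In this reading, depth plays the role of time, so the per-iteration cost is dominated by one coarse sweep plus one batch of concurrent fine solves rather than a full sequential pass through all layers; the multilevel methods mentioned in the abstract extend the same idea across a hierarchy of coarse grids.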