Mapping Transformer Models on Systolic Array Architectures
Master’s Thesis / Student Research Project
Abstract
Transformer models, which underpin large language models and other generative AI systems, rely heavily on the attention layer. Because AI hardware accelerators expose many different mapping parameters, finding the parameter set that yields an efficient execution remains a significant challenge, especially when only a prototype of the accelerator is available. Therefore, mapping-parameter-agnostic performance estimators for AI hardware accelerators are needed.
The Abstract Computer Architecture Description Language (ACADL) allows for modeling arbitrary AI hardware accelerator architectures and for subsequently evaluating the performance of DNNs mapped onto the modeled computer architectures. The goal of this thesis project is to map the attention layer onto a systolic array architecture for the purpose of performance evaluation.
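For orientation, the following is a minimal PyTorch sketch of the scaled dot-product attention computation that such a mapping would target; the tensor names and shapes are illustrative only and not part of the project specification.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    d_k = q.size(-1)
    # The two matrix multiplications below dominate the workload and are
    # the natural candidates for execution on a systolic array.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # (batch, heads, seq, seq)
    weights = F.softmax(scores, dim=-1)
    return weights @ v                             # (batch, heads, seq, head_dim)

# Illustrative shapes only
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```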
References
- Mika Markus Müller, Alexander Richard Manfred Borst, Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Felix Wedlich, Oliver Bringmann - Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators
- Konstantin Lübeck et al. - Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators
Requirements
- Python
- PyTorch
- C/C++
- Successfully attended the lecture “Grundlagen der Rechnerarchitektur” and/or “Parallele Rechnerarchitekturen” (optional)
- Linux (optional)