4. Neural Methods

Transformers — Quiz

Test your understanding of transformers with 5 practice questions.

Practice Questions

Question 1

Which of the following best describes the primary innovation of the Transformer architecture compared to recurrent neural networks (RNNs) for sequence processing?

Question 2

In a Transformer's self-attention mechanism, if the input sequence has length $L$ and the key dimension ($d_k$) is 64, what is the shape of the attention-score matrix before the softmax is applied?
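If you want to check your reasoning empirically, here is a minimal NumPy sketch of the scaled dot-product score computation. The concrete values of $L$ and the random inputs are assumptions for illustration, not part of the question:

```python
import numpy as np

# Toy scaled dot-product attention scores (assumed: L = 10, d_k = 64).
L, d_k = 10, 64
rng = np.random.default_rng(0)
Q = rng.normal(size=(L, d_k))  # queries, one row per token
K = rng.normal(size=(L, d_k))  # keys, one row per token

# Score matrix before softmax: every query dotted with every key.
scores = Q @ K.T / np.sqrt(d_k)
print(scores.shape)
```

Running it and inspecting the printed shape answers the question for you, so try to predict the output before executing.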

Question 3

Consider a scenario where a Transformer model is translating a sentence from English to French. Which part of the Transformer architecture is primarily responsible for generating the French output word by word, taking into account both the English input and the previously generated French words?

Question 4

Which of the following is the primary reason for using 'Layer Normalization' instead of 'Batch Normalization' in Transformer models?
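To make the contrast between the two normalization schemes concrete, here is a minimal NumPy sketch showing which axis each one computes its statistics over. The batch and feature sizes are arbitrary assumed values:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))  # toy activations: (batch, features)

# Batch norm: statistics per feature, taken across the batch (axis 0).
bn = (x - x.mean(axis=0)) / x.std(axis=0)

# Layer norm: statistics per example, taken across features (axis 1),
# so it does not depend on the other examples in the batch at all.
ln = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

print(np.round(bn.mean(axis=0), 6))  # ~0 for every feature column
print(np.round(ln.mean(axis=1), 6))  # ~0 for every example row
```

Noting which axis the statistics depend on is the key to answering this question.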

Question 5

How does the 'Add & Normalize' step contribute to the training stability and performance of a Transformer model?
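As a concrete reference for what the 'Add & Normalize' step computes, here is a minimal NumPy sketch of a residual connection followed by layer normalization. The shapes, random inputs, and the stand-in sublayer output are all illustrative assumptions:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's feature vector to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Assumed toy shapes: L tokens with d_model features each.
L, d_model = 4, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(L, d_model))             # sublayer input
sublayer_out = rng.normal(size=(L, d_model))  # stand-in for attention/FFN output

# "Add & Normalize": residual addition, then layer normalization.
y = layer_norm(x + sublayer_out)
print(y.shape)
```

The residual addition gives gradients a direct path around the sublayer, and the normalization keeps activations on a consistent scale; connecting those two effects to training stability is the point of the question.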