4. Deep Learning
Sequence Models — Quiz
Test your understanding of sequence models with 5 practice questions.
Practice Questions
Question 1
In a Transformer model, the self-attention mechanism computes three main vectors for each token: Query ($Q$), Key ($K$), and Value ($V$). If the input embedding for a token is $x$ and the weight matrices for these transformations are $W_Q$, $W_K$, and $W_V$ respectively, which of the following correctly represents how the Query vector is derived?
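For reference when checking an answer, the projection can be sketched in NumPy. This is a minimal sketch, not a full attention layer: the sizes `d_model` and `d_k` and the random weights are assumptions made for illustration, and it uses the row-vector convention $q = x W_Q$ (some texts write the equivalent column-vector form $q = W_Q^\top x$).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 8, 4  # assumed embedding and projection sizes

x = rng.normal(size=d_model)            # embedding of a single token
W_Q = rng.normal(size=(d_model, d_k))   # query projection (random stand-in)
W_K = rng.normal(size=(d_model, d_k))   # key projection (random stand-in)
W_V = rng.normal(size=(d_model, d_k))   # value projection (random stand-in)

# Each vector is a linear projection of the same input embedding.
q = x @ W_Q
k = x @ W_K
v = x @ W_V
```

The three projections differ only in which weight matrix multiplies the embedding; no nonlinearity is applied at this step.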
Question 2
Consider a scenario where a sequence model is used for real-time speech recognition. The model needs to process audio input continuously and generate text output with minimal latency. Which of the following architectural choices would be most suitable for this application, prioritizing efficiency and the ability to handle long, continuous sequences without excessive computational cost?
Question 3
The 'vanishing gradient problem' is a significant challenge when training Recurrent Neural Networks (RNNs). Which of the following mathematical expressions best illustrates the core issue in a simplified RNN backpropagation scenario, where the contribution of earlier time steps to the gradient of the loss ($L$) with respect to the recurrent weight ($W$) shrinks rapidly as the number of time steps ($t$) grows?
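To make the effect concrete, here is a small numerical sketch under simplifying assumptions: a scalar RNN $h_t = \tanh(w\,h_{t-1} + x_t)$ with a hypothetical weight $w = 0.9$ and random inputs. By the chain rule, $\partial h_t / \partial h_0 = \prod_{k=1}^{t} w\,(1 - h_k^2)$, so each extra time step multiplies the gradient by a factor whose magnitude is at most $|w|$.

```python
import numpy as np

def backprop_magnitudes(w, inputs, h0=0.0):
    """Return |d h_t / d h_0| for t = 1..T in the scalar RNN h_t = tanh(w*h_{t-1} + x_t)."""
    h = h0
    product = 1.0
    magnitudes = []
    for x_t in inputs:
        h = np.tanh(w * h + x_t)
        # Local derivative d h_t / d h_{t-1} = w * (1 - tanh^2(.)) = w * (1 - h_t^2)
        product *= w * (1.0 - h ** 2)
        magnitudes.append(abs(product))
    return magnitudes

rng = np.random.default_rng(0)
inputs = rng.normal(size=50)   # illustrative random input sequence
mags = backprop_magnitudes(w=0.9, inputs=inputs)
```

Because every factor has magnitude below 1 in this setup, `mags` decays at least geometrically with the number of steps, which is the behavior the question asks about.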
Question 4
A financial analyst is building a model to predict stock prices based on historical data, including daily opening price, closing price, trading volume, and relevant news headlines. The model needs to capture complex, non-linear dependencies over long periods and integrate both numerical and textual information. Which of the following sequence model architectures would be most effective for this task?
Question 5
In a Long Short-Term Memory (LSTM) network, the cell state ($C_t$) is crucial for retaining long-term dependencies. The update to the cell state involves a forget gate ($f_t$), an input gate ($i_t$), and a candidate cell state ($\tilde{C}_t$). Which of the following equations correctly describes how the new cell state ($C_t$) is computed?
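For reference when checking an answer, the standard LSTM cell-state update is $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, where $\odot$ is element-wise multiplication. The sketch below is a simplification: the helper `lstm_cell_state` and its arguments are hypothetical, and the gate pre-activations are passed in directly instead of being computed from the previous hidden state and the current input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_state(C_prev, f_pre, i_pre, c_pre):
    """Compute C_t from the previous cell state and gate pre-activations (hypothetical helper)."""
    f_t = sigmoid(f_pre)       # forget gate, values in (0, 1)
    i_t = sigmoid(i_pre)       # input gate, values in (0, 1)
    C_tilde = np.tanh(c_pre)   # candidate cell state, values in (-1, 1)
    # Keep a fraction of the old state and add a gated fraction of the candidate.
    return f_t * C_prev + i_t * C_tilde

C_prev = np.array([1.0, -2.0, 0.5])
# Forget gate saturated near 1 and input gate near 0: the old state is carried forward.
C_kept = lstm_cell_state(C_prev, np.full(3, 20.0), np.full(3, -20.0), np.zeros(3))
```

The additive form of the update is what lets information, and gradients, pass through many time steps when the forget gate stays close to 1.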
