CV II Lecture 06
Behavior Continuum
Representing Video
3D convolutional Networks
- 2D convolutions vs. 3D convolutions
- 3D filters at the first layer.
2-Stream Network
- Spatial Stream ConvNet
- Tenporal Stream ConvNet
Recurrent Neural Network
forward function
\begin{equation} h_i = f(w_{hx}^T x_i + w_{hh}^Th_{i-1}) \end{equation}
\begin{equation} y_i = f(w_{yh}^T h_i) \end{equation}
Vanishing/Exploding Gradient
Forward pass: \begin{equation} z_i = f(w_x^T x_i + w^T z_{i-1}) \end{equation}
Gradients: \begin{equation} \frac{dL}{dw} = \frac{dL}{dz_{i+1}} \frac{dz_{i+1}}{dz_i} \frac{z_i}{dw} \end{equation}
\begin{equation} \frac{dL}{dw} = \frac{dL}{dz_T} (\prod_{j=i}^{T-1}\frac{dz_{j+1}}{dz_j}) \frac{dz_i}{dw} \end{equation}
Long Short-Term Memory (LSTM)
Future Generation
\begin{equation} x_t \rightarrow f(x_t; w) \rightarrow x_{t+1} \end{equation}
Minimize Euclidean distance:
\begin{equation} \min_w \sum_i || f(x_t^i;w) - x_{t+1}^i||_2^2 \end{equation}
Train
\begin{equation} \min_f \sum_i ||f(x_t^i) - g(x_{t+1}^i)||_2^2 \end{equation}
\begin{equation} \min_{f,\delta} \sum_i \sum_k^K \delta_k^i \cdot ||f_k(x_t^i) - g(x_{t+1}^i)||_2^2 \end{equation} s.t. \(\delta^i \in{0,1}^K\) and \(||\delta^i||_1 = 1 \ \forall_i\)
Expectation-Maximization
- E Step: Fill in missing variables and estimate latent variables
- M Step: Fit the model (with back-propagation in this case)
- Repeat