CV II Lecture 06

Behavior Continuum

Representing Video

3D Convolutional Networks

  • 2D convolutions vs. 3D convolutions
  • 3D filters at the first layer (see the sketch below).
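
To make the contrast concrete, here is a minimal PyTorch sketch (my own illustration, not from the lecture): a 2D convolution processes each frame independently, while a 3D convolution also slides along the time axis, so even first-layer filters span several consecutive frames.

```python
import torch
import torch.nn as nn

# 2D convolution: input is (N, C, H, W), one frame at a time.
conv2d = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
frame = torch.randn(1, 3, 112, 112)        # a single RGB frame
print(conv2d(frame).shape)                 # torch.Size([1, 64, 112, 112])

# 3D convolution: input is (N, C, T, H, W); a 3x3x3 filter mixes
# information across 3 neighboring frames as well as across space.
conv3d = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
clip = torch.randn(1, 3, 16, 112, 112)     # a 16-frame RGB clip
print(conv3d(clip).shape)                  # torch.Size([1, 64, 16, 112, 112])
```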

2-Stream Network

  • Spatial Stream ConvNet: operates on single RGB frames (appearance)
  • Temporal Stream ConvNet: operates on stacked optical-flow fields (motion); a sketch of both streams follows
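
As a rough sketch of the idea (placeholder backbone and layer sizes, not the lecture's architecture), each stream is an ordinary 2D ConvNet and the class scores are fused late:

```python
import torch
import torch.nn as nn

class StreamConvNet(nn.Module):
    """Tiny stand-in ConvNet; the real streams use a deep backbone."""
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

spatial = StreamConvNet(in_channels=3, num_classes=101)    # one RGB frame (appearance)
temporal = StreamConvNet(in_channels=20, num_classes=101)  # 10 stacked (dx, dy) flow fields (motion)

rgb = torch.randn(1, 3, 224, 224)
flow = torch.randn(1, 20, 224, 224)

# Late fusion: average the two streams' class probabilities.
probs = (spatial(rgb).softmax(-1) + temporal(flow).softmax(-1)) / 2
```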

Recurrent Neural Network

Forward function

\begin{equation} h_i = f(w_{hx}^T x_i + w_{hh}^T h_{i-1}) \end{equation}

\begin{equation} y_i = f(w_{yh}^T h_i) \end{equation}
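
A direct transcription of the two equations into code (a sketch: \(f\) is taken to be tanh, and the weight shapes are chosen so that the transposed-matrix notation above works out):

```python
import numpy as np

def rnn_forward(xs, w_hx, w_hh, w_yh, f=np.tanh):
    """Unroll h_i = f(w_hx^T x_i + w_hh^T h_{i-1}), y_i = f(w_yh^T h_i)."""
    h = np.zeros(w_hh.shape[0])        # h_0 initialized to zeros
    ys = []
    for x in xs:                       # one recurrence step per input
        h = f(w_hx.T @ x + w_hh.T @ h)
        ys.append(f(w_yh.T @ h))
    return ys

# Example with made-up sizes: 4-d inputs, 8-d hidden state, 3-d outputs.
rng = np.random.default_rng(0)
d_x, d_h, d_y = 4, 8, 3
ys = rnn_forward(
    xs=[rng.standard_normal(d_x) for _ in range(5)],
    w_hx=rng.standard_normal((d_x, d_h)),
    w_hh=rng.standard_normal((d_h, d_h)),
    w_yh=rng.standard_normal((d_h, d_y)),
)
```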

Vanishing/Exploding Gradient

  • Forward pass: \begin{equation} z_i = f(w_x^T x_i + w^T z_{i-1}) \end{equation}

  • Gradients: \begin{equation} \frac{dL}{dw} = \frac{dL}{dz_{i+1}} \frac{dz_{i+1}}{dz_i} \frac{dz_i}{dw} \end{equation}

    \begin{equation} \frac{dL}{dw} = \frac{dL}{dz_T} \left(\prod_{j=i}^{T-1}\frac{dz_{j+1}}{dz_j}\right) \frac{dz_i}{dw} \end{equation}

    The trouble is the product of \(T-i\) Jacobian factors \(\frac{dz_{j+1}}{dz_j}\): when their norms stay below 1 the product shrinks exponentially (vanishing gradient), and when they stay above 1 it grows exponentially (exploding gradient), as the small numeric sketch below illustrates.
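
A numeric sketch of that effect (my own illustration, using scalar states so each Jacobian \(\frac{dz_{j+1}}{dz_j}\) is a single number):

```python
import numpy as np

# The chain rule above multiplies T - i Jacobian factors together;
# with scalar states each factor is just a number.
for factor in (0.9, 1.1):
    product = np.cumprod(np.full(100, factor))
    print(f"factor {factor}: 10 steps -> {product[9]:.2e}, 100 steps -> {product[99]:.2e}")
# factor 0.9: 10 steps -> 3.49e-01, 100 steps -> 2.66e-05   (vanishes)
# factor 1.1: 10 steps -> 2.59e+00, 100 steps -> 1.38e+04   (explodes)
```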

Long Short-Term Memory (LSTM)

colah’s blog: “Understanding LSTM Networks” (https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
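
For reference, the standard LSTM cell as presented in that post (standard formulation, not transcribed from the slides). Gates \(f_t, i_t, o_t\) control an additive cell-state path:

\begin{equation} f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f), \quad i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \quad o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \end{equation}

\begin{equation} \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), \quad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \quad h_t = o_t \odot \tanh(C_t) \end{equation}

Because \(C_t\) is updated additively (gated by \(f_t\)) rather than through a repeated matrix product, gradients can flow across many time steps without vanishing as quickly.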

Future Generation

\begin{equation} x_t \rightarrow f(x_t; w) \rightarrow x_{t+1} \end{equation}

Minimize Euclidean distance:

\begin{equation} \min_w \sum_i || f(x_t^i;w) - x_{t+1}^i||_2^2 \end{equation}
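
A one-step training sketch for this objective (illustrative only; `predictor` stands in for the network \(f\), and the mean squared error used here is a constant rescaling of the summed objective):

```python
import torch
import torch.nn.functional as F

def train_step(predictor, optimizer, x_t, x_t1):
    """One gradient step on min_w sum_i ||f(x_t^i; w) - x_{t+1}^i||_2^2."""
    loss = F.mse_loss(predictor(x_t), x_t1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```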

Train

\begin{equation} \min_f \sum_i ||f(x_t^i) - g(x_{t+1}^i)||_2^2 \end{equation}

\begin{equation} \min_{f,\delta} \sum_i \sum_{k=1}^{K} \delta_k^i \cdot ||f_k(x_t^i) - g(x_{t+1}^i)||_2^2 \end{equation} s.t. \(\delta^i \in \{0,1\}^K\) and \(||\delta^i||_1 = 1 \ \forall i\)

Here \(\delta^i\) is a one-hot indicator selecting which of the \(K\) hypotheses \(f_k\) is responsible for example \(i\), so only the best-matching hypothesis is penalized on that example; this is what the EM procedure below optimizes.

Expectation-Maximization

  1. E Step: Fill in the missing latent variables (here, the assignments \(\delta\))
  2. M Step: Fit the model (with back-propagation in this case)
  3. Repeat until convergence (a sketch of one iteration follows)
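
A sketch of one such iteration for the K-hypothesis objective (my own illustration: `heads` plays the role of the hypotheses \(f_k\), and `g` is the target transform from the Train section, left as an arbitrary callable with the identity as a placeholder default):

```python
import torch

def em_step(heads, optimizer, x_t, x_t1, g=lambda x: x):
    """One EM iteration on the K-hypothesis loss above."""
    target = g(x_t1)
    preds = torch.stack([h(x_t) for h in heads], dim=1)            # (N, K, ...)
    errs = ((preds - target.unsqueeze(1)) ** 2).flatten(2).sum(2)  # (N, K) squared errors
    # E step: fill in the latent delta^i by picking the best hypothesis per example.
    best = errs.argmin(dim=1)
    # M step: fit the model, back-propagating only through each example's selected hypothesis.
    loss = errs[torch.arange(errs.shape[0]), best].mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Alternating these two steps assigns each example to its best hypothesis and then refits the heads with back-propagation, mirroring steps 1-3 above.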

© 2022 Jiawei Lu. All rights reserved.