Inverted Dropout

This post is a lecture summary of the deep learning course by Andrew N. G, available at https://www.coursera.org/learn/deep-neural-network/home/welcome.

During Training:

  • Neurons are dropped out by setting them to zero.
  • The activation is adjusted by dividing it with the keep probability.
  • The expected value of z[4] (as shown in the screenshot below) should not be altered.
inverted_dropout

During Scoring:

  • If “inverted dropout” is used, no additional steps are necessary.
  • Other dropout techniques may require some computations.

Intuition:

  • Dropping out neurons causes inputs to the unit to be randomly dropped.
  • This prevents the unit from relying too heavily on a single feature and encourages it to distribute weights across multiple features.
  • Different layers can have different keep probabilities.

Side Effect:

  • The cost function is not well defined.
  • It’s not possible to check if the cost is consistently decreasing every iteration.
  • A debugging tool is used to address this issue.

Solution:

  • First, verify that everything is functioning correctly without dropout.
  • Then, gradually introduce dropout.

———————————————————————–

Other Regularization Techniques

  • Data augmentation, such as horizontal flipping, random cropping, and transformations.
  • Early stopping: Stop training at a certain iteration (e.g., 7k instead of 10k) based on the error observed on the development set.

Downside

  • Balancing optimization and avoiding overfitting can be challenging.
  • Mixing both objectives requires careful consideration.

Advantage

  • Unlike L2 regularization, dropout does not necessitate trying different lambda values repeatedly.