Inverted Dropout

January 11, 2019 | May 17, 2023 | Archit Vora

This post summarizes a lecture from the deep learning course by Andrew Ng, available at https://www.coursera.org/learn/deep-neural-network/home/welcome.

During Training:

Neurons are dropped out by setting them to zero.
The remaining activations are scaled up by dividing them by the keep probability.
This scaling keeps the expected value of z[4] (as shown in the screenshot below) unchanged.

[Figure: inverted_dropout]
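The training-time steps can be sketched in plain Python as follows. This is a minimal illustration, not the course's implementation: the function name `inverted_dropout`, the list-based activations, and the seed are assumptions made for the example, and a real network would apply the same idea to whole matrices.

```python
import random

def inverted_dropout(activations, keep_prob, rng):
    """Apply inverted dropout to a list of activations (training time only)."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    # Each mask entry is 1 with probability keep_prob, otherwise 0.
    mask = [1 if rng.random() < keep_prob else 0 for _ in activations]
    # Zero the dropped units, then scale the survivors up by 1 / keep_prob.
    dropped = [a * m / keep_prob for a, m in zip(activations, mask)]
    return dropped, mask

# Hypothetical layer-3 activations, all equal to 1.0 so the mean is easy to read.
rng = random.Random(0)
a3 = [1.0] * 100_000
a3_dropped, d3 = inverted_dropout(a3, keep_prob=0.8, rng=rng)

# The mean stays close to the original 1.0 even though about 20% of units are zero.
mean_after = sum(a3_dropped) / len(a3_dropped)
```

Because the surviving activations are divided by the keep probability, the average activation fed to the next layer is roughly what it would have been without dropout.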

During Scoring:

If “inverted dropout” is used, no additional steps are necessary.
Other (non-inverted) dropout variants need an extra step at scoring time, typically scaling the activations by the keep probability.
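The difference at scoring time can be sketched as below. The function names `score_inverted` and `score_standard` are hypothetical, and "standard" here is assumed to mean a variant that did not rescale activations during training.

```python
def score_inverted(activations):
    """Scoring after inverted dropout: activations pass through unchanged."""
    return list(activations)

def score_standard(activations, keep_prob):
    """Scoring after non-inverted dropout: scale by keep_prob so the
    activations match their expected value during training."""
    return [a * keep_prob for a in activations]

# Hypothetical activations, chosen only for illustration.
a3 = [2.0, 4.0, 6.0]
inverted_out = score_inverted(a3)
standard_out = score_standard(a3, keep_prob=0.5)
```

With inverted dropout the rescaling was already done during training, so the scoring code does not need to know the keep probability at all.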

Intuition:

Because any neuron may be dropped, the inputs to a given unit disappear at random from one iteration to the next.
This prevents the unit from relying too heavily on a single feature and encourages it to distribute weights across multiple features.
Different layers can have different keep probabilities.
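As a rough sketch of per-layer keep probabilities, the snippet below builds one mask per layer. The `keep_probs` values, the layer numbering, and the helper `make_mask` are all assumptions for illustration; a common choice is a lower keep probability for layers with many parameters and 1.0 (no dropout) where overfitting is less of a concern.

```python
import random

# Hypothetical settings: layer index -> keep probability (1.0 means no dropout).
keep_probs = {1: 1.0, 2: 0.7, 3: 0.9}

def make_mask(n_units, keep_prob, rng):
    """Return a 0/1 mask in which each unit is kept with probability keep_prob."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(n_units)]

rng = random.Random(1)
kept_fraction = {}
for layer, keep_prob in keep_probs.items():
    mask = make_mask(10_000, keep_prob, rng)
    # Fraction of units that survive in this layer; should track keep_prob.
    kept_fraction[layer] = sum(mask) / len(mask)
```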

Side Effect:

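The cost function is no longer well defined, because a different random set of neurons is dropped on every iteration.
It’s not possible to check if the cost is consistently decreasing every iteration.
The usual debugging tool, a plot of cost against iteration number, therefore becomes unreliable.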

Solution:

First, verify that everything is functioning correctly without dropout (keep probability set to 1), when the cost should still decrease on every iteration.
Then, gradually introduce dropout.
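One way to automate the first step is a small check on the recorded costs, sketched below. The helper `is_non_increasing` and both cost histories are made up for illustration; they are not results from any real training run.

```python
def is_non_increasing(costs, tolerance=1e-12):
    """True if each cost is no larger than the one before it (within tolerance)."""
    return all(later <= earlier + tolerance
               for earlier, later in zip(costs, costs[1:]))

# Hypothetical cost histories, invented for this example.
costs_without_dropout = [0.90, 0.71, 0.58, 0.49, 0.43]  # keep probability 1.0
costs_with_dropout = [0.92, 0.75, 0.78, 0.60, 0.63]     # keep probability below 1.0

# With dropout off, the check should pass before dropout is switched on.
check_passed = is_non_increasing(costs_without_dropout)
# With dropout on, the same check can fail even when the code is correct.
noisy = not is_non_increasing(costs_with_dropout)
```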

———————————————————————–

Other Regularization Techniques

Data augmentation: create extra training examples from existing ones, for example by horizontal flipping, random cropping, or small rotations and distortions.
Early stopping: Stop training at a certain iteration (e.g., 7k instead of 10k) based on the error observed on the development set.
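Early stopping can be sketched as picking the checkpoint with the lowest development-set error. The helper `best_stopping_point`, the 1k-iteration evaluation interval, and the error values are assumptions made up for this example.

```python
def best_stopping_point(dev_errors):
    """Return the 0-based index of the lowest dev-set error (first one on ties)."""
    best_index = 0
    for i, err in enumerate(dev_errors):
        if err < dev_errors[best_index]:
            best_index = i
    return best_index

# Hypothetical dev-set errors recorded every 1k iterations, from 1k to 10k.
dev_errors = [0.40, 0.31, 0.25, 0.21, 0.18, 0.16, 0.15, 0.17, 0.19, 0.22]

# Convert the index back to an iteration count.
stop_at = (best_stopping_point(dev_errors) + 1) * 1000
```

In this made-up history the dev error starts rising after 7k iterations, so training would stop there rather than running the full 10k.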

Downside

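Early stopping couples two tasks that are easier to handle separately: optimizing the cost function and avoiding overfitting.
Stopping early means the cost is no longer fully optimized, so the two objectives cannot be tuned independently.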

Advantage

Unlike L2 regularization, early stopping does not require retraining with many different lambda values; a single training run is enough.