This post is a lecture summary of the deep learning course by Andrew N. G, available at https://www.coursera.org/learn/deep-neural-network/home/welcome.
During Training:
- Neurons are dropped out by setting them to zero.
- The activation is adjusted by dividing it with the keep probability.
- The expected value of z[4] (as shown in the screenshot below) should not be altered.

During Scoring:
- If “inverted dropout” is used, no additional steps are necessary.
- Other dropout techniques may require some computations.
Intuition:
- Dropping out neurons causes inputs to the unit to be randomly dropped.
- This prevents the unit from relying too heavily on a single feature and encourages it to distribute weights across multiple features.
- Different layers can have different keep probabilities.
Side Effect:
- The cost function is not well defined.
- It’s not possible to check if the cost is consistently decreasing every iteration.
- A debugging tool is used to address this issue.
Solution:
- First, verify that everything is functioning correctly without dropout.
- Then, gradually introduce dropout.
———————————————————————–
Other Regularization Techniques
- Data augmentation, such as horizontal flipping, random cropping, and transformations.
- Early stopping: Stop training at a certain iteration (e.g., 7k instead of 10k) based on the error observed on the development set.
Downside
- Balancing optimization and avoiding overfitting can be challenging.
- Mixing both objectives requires careful consideration.
Advantage
- Unlike L2 regularization, dropout does not necessitate trying different lambda values repeatedly.

