Photo credit: Andrej Karpathy's Twitter profile picture

Andrej Karpathy is the Director of AI at Tesla and is currently focused on perception for the Autopilot. Previously, he was a Research Scientist at OpenAI working on Deep Learning in Computer Vision, Generative Modeling and Reinforcement Learning. He received his PhD from Stanford, where he worked with Fei-Fei Li on Convolutional/Recurrent Neural Network architectures and their applications in Computer Vision, Natural Language Processing and their intersection. Over the course of his PhD he squeezed in two internships at Google, where he worked on large-scale feature learning over YouTube videos, and in 2015 he interned at DeepMind and worked on Deep Reinforcement Learning. Together with Fei-Fei, he designed and taught a new Stanford class on Convolutional Neural Networks for Visual Recognition (CS231n). The class was the first Deep Learning course offering at Stanford and has grown from 150 enrolled students in 2015 to 330 in 2016 and 750 in 2017.

The Six Mistakes

  • You didn't try to overfit a single batch first. (See sketch 1 below.)
  • You forgot to toggle train/eval mode for the net. (Sketch 2.)
  • You forgot to call .zero_grad() (in PyTorch) before .backward(). (Sketch 3.)
  • You passed softmaxed outputs to a loss that expects raw logits. (Sketch 4.)
  • You didn't use bias=False for your Linear/Conv2d layers when using BatchNorm, or conversely forgot to include it for the output layer. (This one won't make you silently fail, but the extra bias terms are spurious parameters. Sketch 5.)
  • You assumed view() and permute() are the same thing and used view() where a transpose was needed. (Sketch 6.)
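
The sketches below are not from Karpathy's list itself; they are minimal, illustrative PyTorch snippets with made-up models and data.

Sketch 1: overfitting a single batch. A quick sanity check before any real training run is to confirm the model can drive the loss on one fixed batch toward zero; if it can't, something in the data pipeline, loss, or optimizer wiring is broken. The toy model, sizes, and optimizer here are arbitrary placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny placeholder model and a single fixed batch of fake data.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 20)
y = torch.randint(0, 5, (32,))

# Train on the same batch over and over; the loss should approach zero.
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"loss after overfitting one batch: {loss.item():.4f}")
```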
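
Sketch 2: toggling train/eval mode. Layers like Dropout and BatchNorm behave differently at training and evaluation time, so the mode must be switched explicitly; the tiny network here is just an assumed example.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 10),
    nn.BatchNorm1d(10),
    nn.Dropout(p=0.5),
    nn.Linear(10, 2),
)
x = torch.randn(4, 10)

model.train()              # Dropout active, BatchNorm uses batch statistics
train_out = model(x)

model.eval()               # Dropout disabled, BatchNorm uses running statistics
with torch.no_grad():
    eval_out = model(x)    # forgetting eval() here makes predictions noisy and inconsistent
```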
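
Sketch 3: calling .zero_grad() before .backward(). PyTorch accumulates gradients into each parameter's .grad across calls to backward(), so skipping zero_grad() silently mixes in gradients from previous steps. The regression setup below is a made-up placeholder.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for _ in range(10):
    x, y = torch.randn(16, 8), torch.randn(16, 1)
    optimizer.zero_grad()            # without this, old gradients keep accumulating
    loss = loss_fn(model(x), y)
    loss.backward()                  # adds fresh gradients to .grad
    optimizer.step()                 # applies the update
```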
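
Sketch 4: passing softmaxed outputs to a loss that expects raw logits. nn.CrossEntropyLoss already applies log-softmax internally, so feeding it probabilities applies softmax twice and quietly distorts the loss and gradients. The tensors below are random placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 3)               # raw model outputs
targets = torch.randint(0, 3, (4,))

# Correct: CrossEntropyLoss expects raw logits.
good = nn.CrossEntropyLoss()(logits, targets)

# Wrong: softmax ends up applied twice, which silently skews the result.
bad = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), targets)

# If the model genuinely outputs log-probabilities, pair it with NLLLoss instead.
also_good = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(good.item(), bad.item(), also_good.item())  # good matches also_good; bad differs
```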
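
Sketch 5: bias and BatchNorm. BatchNorm's learnable shift (beta) makes a bias in the layer immediately before it redundant, while the output layer, which has no normalization after it, should keep its bias. The channel and class counts below are arbitrary.

```python
import torch.nn as nn

# BatchNorm2d has its own learnable shift, so the conv's bias would be redundant here.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# The output head has no BatchNorm after it, so it keeps its bias (the default).
head = nn.Linear(16, 10)
```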
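
Sketch 6: view() vs permute(). permute() reorders axes and actually moves elements relative to the shape, while view() only reinterprets the same memory with a new shape; using view() where a transpose is needed silently scrambles the data. A tiny tensor makes the difference obvious.

```python
import torch

x = torch.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

transposed = x.permute(1, 0)        # [[0, 3], [1, 4], [2, 5]] -- axes swapped
reshaped = x.view(3, 2)             # [[0, 1], [2, 3], [4, 5]] -- same memory, new shape

assert not torch.equal(transposed, reshaped)

# Related gotcha: view() on a non-contiguous (e.g. permuted) tensor raises an error;
# call .contiguous() first or use .reshape().
flat = x.permute(1, 0).contiguous().view(-1)
```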
