We’ll break our training up into multiple steps and use a different learning rate at each step. This lets the model train quickly at the beginning by taking larger steps, then we reduce the learning rate in later steps to fine-tune the model as it approaches an optimal solution. If we used a high learning rate for the entire training process, the network might never converge on a good solution; if we used a low learning rate throughout, training would take far too long. Varying the learning rate gives us the best of both worlds: high accuracy with a fast training time.
Instructor: [00:00] We're setting the learning rate for the Adam optimizer before we fit, but we may want to change that later and retrain with a lower learning rate.
[00:09] After we fit the first time, we can change the model optimizer by setting model.optimizer to a new Adam optimizer with a lower learning rate. Then we can call fit again with the same parameters as before.
[00:22] It's perfectly OK to call fit more than once on your model. It will remember the weights from before and continue to train and improve on them during the second fit step.
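The two-phase fit described above can be sketched as follows. The data, layer sizes, learning rates, and epoch counts here are hypothetical stand-ins for whatever the lesson's actual model uses; note that while the video assigns `model.optimizer` directly, recompiling the model is the reliable way to swap optimizers in recent tf.keras versions, and it keeps the trained weights so the second `fit` continues from where the first left off.

```python
import numpy as np
from tensorflow import keras

# Toy regression data (hypothetical stand-in for the lesson's dataset).
rng = np.random.default_rng(42)
X = rng.random((256, 3))
y = X.sum(axis=1, keepdims=True)

model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])

# Phase 1: a high learning rate lets the network take large steps and
# explore the loss landscape quickly.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss="mse")
model.fit(X, y, epochs=100, verbose=0)

# Phase 2: swap in a lower learning rate and fit again. The weights from
# phase 1 are kept, so training resumes from them and refines the result
# without risking a big step in the wrong direction.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.0001), loss="mse")
model.fit(X, y, epochs=100, verbose=0)
```

Calling `fit` a second time does not reset the weights; only recompiling replaces the optimizer (and its internal state), which is exactly what we want between the two phases.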
[00:30] What we're doing by first training with a high learning rate and then switching to a small learning rate is telling the network that it can start by taking large steps, which gives it more freedom to explore the training landscape.
[00:44] Then when we want to start refining the results, without risking taking a big step in the wrong direction, we lower the learning rate and continue training.
[00:54] Now when we run that, it starts with a hundred epochs at the first learning rate and then continues with another hundred epochs at the smaller learning rate.
[01:05] Now that we're happy with that model, let's save it, so that we can reload the fully-trained model later.
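Saving and reloading might look like the sketch below. The filename and the tiny model standing in for the fully-trained one are hypothetical; `model.save` writes the architecture, weights, and training configuration in one file, so the reloaded model can be used (or trained further) without rebuilding anything.

```python
import numpy as np
from tensorflow import keras

# Minimal model standing in for the fully-trained one (hypothetical shapes).
model = keras.Sequential([keras.Input(shape=(3,)), keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Save everything needed to resume or serve the model. The filename is a
# hypothetical example.
model.save("trained_model.keras")

# Later (even in a fresh process), reload the fully-trained model.
reloaded = keras.models.load_model("trained_model.keras")

# The reloaded model produces the same predictions as the original.
x = np.ones((1, 3))
original_pred = model.predict(x, verbose=0)
reloaded_pred = reloaded.predict(x, verbose=0)
```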