A good Convolutional Neural Network (CNN) model requires a large dataset and a substantial amount of training, which is often not feasible in practice. Transfer learning provides a way around this. It is a method of using pre-trained models to obtain better results. A pre-trained model has previously been trained on a dataset and contains the weights and biases that represent the features of that dataset. There are two ways to achieve this:
Extract features from the pre-trained model and use them in your model
Fine-tune the pre-trained ConvNet model
The following table summarizes which method to adopt according to the properties of your dataset:
| Size of dataset | Compared to original dataset | Method |
| --- | --- | --- |
| small | similar | train a linear classifier on CNN codes |
| small | different | train a classifier on activations from earlier in the network |
| large | similar | fine-tune the model |
| large | different | train the model from scratch, or initialize weights from the pre-trained model |
Case I: Small dataset, similar data
slice off the end of the neural network only
add a new fully connected layer that matches the number of classes in the new data set
randomize the weights of the new fully connected layer; freeze all the weights from the pre-trained network (so that the network behaves as a fixed feature extractor)
train the network to update the weights of the new fully connected layer
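The steps above can be sketched in Keras. The model choice (ResNet50), input size, and class count below are assumptions for illustration, not part of the original text:

```python
# Case I sketch: freeze the pre-trained network so it acts as a fixed
# feature extractor, and train only a new fully connected layer.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

num_classes = 10  # assumed number of classes in the new dataset

# Load the convolutional base without its original classifier head
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze all pre-trained weights

# New, randomly initialized fully connected layer for the new classes
x = GlobalAveragePooling2D()(base.output)
out = Dense(num_classes, activation="softmax")(x)
model = Model(inputs=base.input, outputs=out)

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(...) now updates only the new Dense layer's weights
```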
Case II: Small dataset, different data
slice off most of the pre-trained layers, keeping only those near the beginning of the network
add to the remaining pre-trained layers a new fully connected layer that matches the number of classes in the new data set
randomize the weights of the new fully connected layer; freeze all the weights from the pre-trained network
train the network to update the weights of the new fully connected layer
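A possible Keras sketch of this case follows. The cut-point layer name and class count are assumptions; in practice you would inspect the model summary to choose a cut point suited to your data:

```python
# Case II sketch: keep only the early layers of the pre-trained network
# and train a new classifier on those early activations.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

num_classes = 5  # assumed

full = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
full.trainable = False  # freeze all retained pre-trained weights

# Activations from an early block of ResNet50 (cut point is an assumption)
cut = full.get_layer("conv2_block3_out").output

x = GlobalAveragePooling2D()(cut)
out = Dense(num_classes, activation="softmax")(x)
model = Model(inputs=full.input, outputs=out)

model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# Only the new Dense layer is updated during training
```

Because early-layer features are more generic (edges, colours, textures), they transfer better to a visually different dataset than the later, more task-specific layers.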
Case III: Large dataset, similar data
remove the last fully connected layer and replace it with a layer matching the number of classes in the new data set
randomly initialize the weights in the new fully connected layer
initialize the rest of the weights using the pre-trained weights
re-train the entire neural network
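A minimal sketch of this case in Keras, assuming ResNet50 as the pre-trained model; the class count and learning rate are illustrative assumptions:

```python
# Case III sketch: replace the classifier head and fine-tune the whole
# network, initializing the rest of the weights from the pre-trained model.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

num_classes = 100  # assumed

base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
out = Dense(num_classes, activation="softmax")(x)  # randomly initialized head
model = Model(inputs=base.input, outputs=out)

# All layers stay trainable; a small learning rate keeps fine-tuning
# from destroying the pre-trained weights early in training
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(...) re-trains the entire network
```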
Case IV: Large dataset, different data
remove the last fully connected layer and replace it with a layer matching the number of classes in the new data set
retrain the network from scratch with randomly initialized weights
alternatively, you could just use the same strategy as the “large and similar” data case
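A sketch of the from-scratch option, assuming the ResNet50 architecture; the input size and class count are illustrative assumptions:

```python
# Case IV sketch: same architecture, but all weights are randomly
# initialized (weights=None) and the network is trained from scratch.
from tensorflow.keras.applications import ResNet50

num_classes = 20  # assumed

model = ResNet50(weights=None, include_top=True,
                 classes=num_classes, input_shape=(224, 224, 3))
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(...) trains the entire network from random initialization
```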
The following guide uses ResNet50 as the pre-trained model, employing it as a feature extractor to build a ConvNet for the CIFAR-10 dataset. The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
Now, extract features from ResNet50 and save them.
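A sketch of this feature-extraction step is shown below; the preprocessing details and the saved file names are assumptions, not the guide's exact code:

```python
# Extract ResNet50 features for CIFAR-10 and save them to disk.
import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# ResNet50 as a fixed feature extractor: no classifier head; global
# average pooling yields one 2048-dimensional vector per image
extractor = ResNet50(weights="imagenet", include_top=False,
                     pooling="avg", input_shape=(32, 32, 3))

train_features = extractor.predict(preprocess_input(x_train.astype("float32")),
                                   batch_size=128)
test_features = extractor.predict(preprocess_input(x_test.astype("float32")),
                                  batch_size=128)

np.save("train_features.npy", train_features)  # assumed file names
np.save("test_features.npy", test_features)
```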
Finally, build the model in Keras using the extracted features.
Transfer learning is possible because the features that ConvNets learn in their first layers are largely independent of the dataset, and so are often transferable to different datasets.