Transfer learning: How to build accurate models

Friday, August 10, 2018
6 mins read

A good Convolutional Neural Network (ConvNet) model requires a large dataset and a long training time, which is often not feasible in practice. Transfer learning provides a way around this: it is a method of reusing a pre-trained model to obtain better results. A pre-trained model has already been trained on some (usually large) dataset and contains the weights and biases that represent the features of that dataset. There are two ways to apply it:

  • Extract features from the pre-trained model and use them in your model
  • Fine-tune the pre-trained ConvNet model

The following table summarizes the method to be adopted according to your dataset properties:

Size of dataset   Similarity to original dataset   Method
small             similar                          train a linear classifier on the CNN codes (features from the top of the network)
small             different                        train a classifier on activations from somewhere earlier in the network
large             similar                          fine-tune the model
large             different                        build the model from scratch, optionally initializing weights from the pre-trained model

Case I: Small dataset, similar data

  • slice off the end (the final fully connected layer) of the pre-trained network
  • add a new fully connected layer that matches the number of classes in the new dataset
  • randomize the weights of the new fully connected layer; freeze all the weights of the pre-trained network (so that it behaves as a fixed feature extractor)
  • train the network to update the weights of the new fully connected layer only (see the sketch below)
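
A minimal Keras sketch of this recipe, assuming for illustration 10 target classes and 224x224 RGB inputs (neither comes from the text above):

from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D

# load the pre-trained network without its final FC layers
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze: network acts as a fixed feature extractor

# new FC layer matching the classes in the new dataset
# (Keras initializes its weights randomly by default)
x = GlobalAveragePooling2D()(base.output)
out = Dense(10, activation='softmax')(x)

model = Model(inputs=base.input, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(...) now only updates the new FC layer's weights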

Case II: Small dataset, different data

  • slice off all the pre-trained layers except some near the beginning of the network
  • add, on top of the remaining pre-trained layers, a new fully connected layer that matches the number of classes in the new dataset
  • randomize the weights of the new fully connected layer; freeze all the weights of the remaining pre-trained layers
  • train the network to update the weights of the new fully connected layer (see the sketch below)
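
A minimal sketch of this case, with the same hypothetical 10-class, 224x224 setup; the cut-point index below is purely illustrative and would in practice be chosen by inspecting base.summary():

from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False  # freeze all pre-trained weights

# take activations from an early layer (index 38 is a hypothetical cut point;
# earlier layers capture more generic, dataset-independent features)
early = base.layers[38].output
x = GlobalAveragePooling2D()(early)
out = Dense(10, activation='softmax')(x)

model = Model(inputs=base.input, outputs=out)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])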

Case III: Large dataset, similar data

  • remove the last fully connected layer and replace it with a layer matching the number of classes in the new dataset
  • randomly initialize the weights of the new fully connected layer
  • initialize the rest of the weights using the pre-trained weights
  • re-train the entire neural network (see the sketch below)
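
A sketch of fine-tuning, under the same hypothetical 10-class setup; the small learning rate is a common choice to avoid destroying the pre-trained weights early in re-training, not a value from the text:

from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.optimizers import Adam

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
out = Dense(10, activation='softmax')(x)  # new head, randomly initialized
model = Model(inputs=base.input, outputs=out)

# every layer stays trainable, so the whole network is re-trained;
# a small learning rate keeps the pre-trained weights from changing too fast
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),
              metrics=['accuracy'])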

Case IV: Large dataset, different data

  • remove the last fully connected layer and replace it with a layer matching the number of classes in the new dataset
  • retrain the network from scratch with randomly initialized weights (see the sketch below)
  • alternatively, you could use the same fine-tuning strategy as in the “large and similar” data case
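
Training from scratch then simply means loading the architecture without the ImageNet weights, e.g. (the 10 classes are again hypothetical):

from keras.applications.resnet50 import ResNet50

# weights=None gives the ResNet50 architecture with randomly initialized
# weights; only the architecture, not the learned knowledge, is reused
model = ResNet50(weights=None, classes=10, input_shape=(224, 224, 3))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])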

The following guide uses ResNet50¹ as the pre-trained model and employs it as a feature extractor to build a classifier for the CIFAR-10² dataset. The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

import numpy as np
import matplotlib.pyplot as plt
from scipy.misc import imresize  # for resizing images (available in SciPy < 1.3)
from keras.datasets import cifar10
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.callbacks import ModelCheckpoint
from keras.applications.resnet50 import ResNet50, preprocess_input

# load CIFAR-10 data
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print(X_train.shape, X_test.shape, np.unique(y_train).shape[0])  # (50000, 32, 32, 3) (10000, 32, 32, 3) 10
# one-hot encoding
n_classes = 10
y_train = np_utils.to_categorical(y_train, n_classes)
y_test = np_utils.to_categorical(y_test, n_classes)

Now, extract features from ResNet50 and save them.

# load pre-trained model
model_tl = ResNet50(weights='imagenet',
                    include_top=False,  # remove top FC layers
                    input_shape=(200, 200, 3))

# resize images, as the minimum input size ResNet50 accepts is (197, 197, 3)
X_train_new = np.array([imresize(X_train[i], (200, 200, 3)) for i in range(0, len(X_train))]).astype('float32')
# preprocess data the same way the original ImageNet training data was preprocessed
resnet_train_input = preprocess_input(X_train_new)
# create bottleneck features for the training data
train_features = model_tl.predict(resnet_train_input)
# save the bottleneck features
np.savez('resnet_features_train', features=train_features)

# resize testing data
X_test_new = np.array([imresize(X_test[i], (200, 200, 3)) for i in range(0, len(X_test))]).astype('float32')
# preprocess before feeding it into the pre-trained ResNet50
resnet_test_input = preprocess_input(X_test_new)
# extract bottleneck features
test_features = model_tl.predict(resnet_test_input)
# save features
np.savez('resnet_features_test', features=test_features)
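
Since the features are stored with np.savez, they can be loaded back later without re-running the slow ResNet50 forward pass:

# np.savez appends the .npz extension; 'features' is the key used above
train_features = np.load('resnet_features_train.npz')['features']
test_features = np.load('resnet_features_test.npz')['features']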

Finally, build the model in Keras using the extracted features.

# create model
model = Sequential()
model.add(GlobalAveragePooling2D(input_shape=train_features.shape[1:]))
model.add(Dropout(0.3))
model.add(Dense(10, activation='softmax'))
model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# save the best model weights seen during training (filename is arbitrary)
checkpointer = ModelCheckpoint(filepath='model.best.hdf5',
                               save_best_only=True)

model.fit(train_features, y_train,
          batch_size=32, epochs=10,
          validation_split=0.2, callbacks=[checkpointer],
          verbose=True, shuffle=True)

# model evaluation
score = model.evaluate(test_features, y_test)
print('Accuracy on test set: {}'.format(score[1]))

Transfer learning is possible because the features that ConvNets learn in their first layers (edges, colours, simple textures) are largely independent of the dataset, and so are often transferable to other datasets.

Update: Also, check out the PyTorch implementation on GitHub at kHarshit/transfer-learning.

Footnotes:

1: ResNet-50
2: CIFAR-10 dataset

