ANN - Computer Vision

We will use ANNs for a basic computer vision application of image classification on CIFAR10 Dataset

Dataset

CIFAR-10

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

Download the Dataset

Tensorflow has inbuilt dataset which makes it easy to get training and testing data.

from tensorflow.keras.datasets import cifar10
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
class_name = {
    0: 'airplane',
    1: 'automobile',
    2: 'bird',
    3: 'cat',
    4: 'deer',
    5: 'dog',
    6: 'frog',
    7: 'horse',
    8: 'ship',
    9: 'truck',
}

Visualize the Dataset

import matplotlib.pyplot as plt

num_imgs = 10
plt.figure(figsize=(num_imgs*2,3))

for i in range(1,num_imgs):
    plt.subplot(1,num_imgs,i).set_title('{}'.format(class_name[y_train[i][0]]))
    plt.imshow(x_train[i])
    plt.axis('off')
plt.show()

png

Scaling Features

import numpy as np

np.max(x_train), np.min(x_train)
(255, 0)

The range of values in the data is 0-255,

  • we can scale it to [0,1], dividing it by 255 will do.
  • or we can standardize it by subtracting the mean and dividing std.
mean = np.mean(x_train)
std = np.std(x_train)

x_train = (x_train-mean)/std
x_test = (x_test-mean)/std

np.max(x_train), np.min(x_train), np.max(x_test), np.min(x_test)
(2.09341038199596, -1.8816433721538972, 2.09341038199596, -1.8816433721538972)

Labels to One-Hot

print(y_train[:5])
[[6]
 [9]
 [9]
 [4]
 [1]]
num_classes = 10

y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

print(y_train[:5])
[[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]]

Colour Channels

Let’s check the shape of the arrays

x_train.shape, y_train.shape, x_test.shape, y_test.shape
((50000, 32, 32, 3), (50000, 10), (10000, 32, 32, 3), (10000, 10))

The shape of each image is (32 x 32 x 3), previously we saw MNIST dataset had a shape of (28 x 28), why is it so?

MNIST is a gray-scale image, it has a single channel of gray scale. But CIFAR-10 is a colour image, every colour pixel has 3 channels RGB, all these 3 channel contribute to what colour you see.

Any colour image is made of 3 channels RGB.

Visualize colour channels

import matplotlib.pyplot as plt

plt.figure(figsize=(9,3))
plt.subplot(1,3,1)
plt.imshow(x_train[1][:,:,0], cmap='Reds')
plt.subplot(1,3,2)
plt.imshow(x_train[1][:,:,1], cmap='Greens')
plt.subplot(1,3,3)
plt.imshow(x_train[1][:,:,2], cmap='Blues')
plt.show()

png

So when we flatten the CIFAR-10 image, it will give 32x32x3 = 3072.

Model

import tensorflow as tf
from tensorflow import keras

tf.keras.backend.clear_session()

input_shape = (32,32,3) # 3072
nclasses = 10
model = tf.keras.Sequential([
                             tf.keras.layers.Flatten(input_shape=input_shape),
                             tf.keras.layers.Dense(units=1024), 
                             tf.keras.layers.Activation('tanh'),
                             tf.keras.layers.Dropout(0.5),
                             tf.keras.layers.Dense(units=512),
                             tf.keras.layers.Activation('tanh'),
                             tf.keras.layers.Dropout(0.5),
                             tf.keras.layers.Dense(units=nclasses), 
                             tf.keras.layers.Activation('softmax')
                             ])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 3072)              0         
_________________________________________________________________
dense (Dense)                (None, 1024)              3146752   
_________________________________________________________________
activation (Activation)      (None, 1024)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               524800    
_________________________________________________________________
activation_1 (Activation)    (None, 512)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5130      
_________________________________________________________________
activation_2 (Activation)    (None, 10)                0         
=================================================================
Total params: 3,676,682
Trainable params: 3,676,682
Non-trainable params: 0
_________________________________________________________________

Training

optimizer = tf.keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
tf_history_dp = model.fit(x_train, y_train, batch_size=500, epochs=100, verbose=True, validation_data=(x_test, y_test))
Train on 50000 samples, validate on 10000 samples
Epoch 1/100
50000/50000 [==============================] - 3s 57us/sample - loss: 2.2985 - acc: 0.2712 - val_loss: 1.7830 - val_acc: 0.3898
Epoch 2/100
50000/50000 [==============================] - 3s 53us/sample - loss: 1.9610 - acc: 0.3329 - val_loss: 1.7066 - val_acc: 0.4134
.
.
Epoch 99/100
50000/50000 [==============================] - 3s 51us/sample - loss: 1.1897 - acc: 0.5877 - val_loss: 1.4037 - val_acc: 0.5220
Epoch 100/100
50000/50000 [==============================] - 3s 51us/sample - loss: 1.1839 - acc: 0.5884 - val_loss: 1.3991 - val_acc: 0.5227
import matplotlib.pyplot as plt

plt.figure(figsize=(20,7))

plt.subplot(1,2,1)
plt.plot(tf_history_dp.history['loss'], label='Training Loss')
plt.plot(tf_history_dp.history['val_loss'], label='Validation Loss')
plt.legend()

plt.subplot(1,2,2)
plt.plot(tf_history_dp.history['acc'], label='Training Accuracy')
plt.plot(tf_history_dp.history['val_acc'], label='Validation Accuracy')
plt.legend()
plt.show()

png

Model is clearly overfitting.

Image Augmentation

We have discussed that, more images/data improves the model performance and avoid overfitting. But it’s not always possible to get new data, so we can augment the old data to create new data.

Augmentation can be:

  • random crop
  • rotation
  • horizontal and vertical flips
  • x-y shift
  • colour jitter
  • etc.

Image Augmentation in Tensorflow

from tensorflow.keras.datasets import cifar10
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

x_train = (x_train-mean)/std
x_test = (x_test-mean)/std


num_classes = 10
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
import tensorflow as tf
from tensorflow import keras

tf.keras.backend.clear_session()

input_shape = (32,32,3) # 3072
nclasses = 10
model = tf.keras.Sequential([
                             tf.keras.layers.Flatten(input_shape=input_shape),
                             tf.keras.layers.Dense(units=1024), 
                             tf.keras.layers.Activation('tanh'),
                             tf.keras.layers.Dropout(0.2),
                             tf.keras.layers.Dense(units=512),
                             tf.keras.layers.Activation('tanh'),
                             tf.keras.layers.Dropout(0.2),
                             tf.keras.layers.Dense(units=nclasses), 
                             tf.keras.layers.Activation('softmax')
                             ])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 3072)              0         
_________________________________________________________________
dense (Dense)                (None, 1024)              3146752   
_________________________________________________________________
activation (Activation)      (None, 1024)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               524800    
_________________________________________________________________
activation_1 (Activation)    (None, 512)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5130      
_________________________________________________________________
activation_2 (Activation)    (None, 10)                0         
=================================================================
Total params: 3,676,682
Trainable params: 3,676,682
Non-trainable params: 0
_________________________________________________________________
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
        shear_range=0.1,
        zoom_range=0.1,
        horizontal_flip=True,
        rotation_range=20)

test_datagen = ImageDataGenerator()

train_generator = train_datagen.flow(
        x_train, y_train,
        batch_size=200)

validation_generator = test_datagen.flow(
        x_test, y_test,
        batch_size=200)


import matplotlib.pyplot as plt

i = 1
plt.figure(figsize=(20,2))
for x_batch, y_batch in train_datagen.flow(x_train, y_train, batch_size=1):
    plt.subplot(1,10,i)
    plt.imshow(x_batch[0])
    i += 1
    if i>10:break

png

You can see some of the images are zoomed, some are rotated…etc. SO these images are now different that the original image and for the model these are new images.

optimizer = tf.keras.optimizers.SGD(lr=0.001, momentum=0.9)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])

model.fit_generator(
        train_generator,
        steps_per_epoch=100,
        epochs=200,
        validation_data=validation_generator)
Epoch 1/200
100/100 [==============================] - 11s 113ms/step - loss: 2.1187 - acc: 0.2506 - val_loss: 1.8658 - val_acc: 0.3438
Epoch 2/200
100/100 [==============================] - 11s 105ms/step - loss: 1.9376 - acc: 0.3195 - val_loss: 1.8019 - val_acc: 0.3725
.
.
Epoch 199/200
100/100 [==============================] - 10s 102ms/step - loss: 1.4075 - acc: 0.5145 - val_loss: 1.3601 - val_acc: 0.5340
Epoch 200/200
100/100 [==============================] - 10s 104ms/step - loss: 1.4111 - acc: 0.5127 - val_loss: 1.3616 - val_acc: 0.5369





<tensorflow.python.keras.callbacks.History at 0x7f644a09bcc0>
import matplotlib.pyplot as plt

tf_history_aug = model.history

plt.figure(figsize=(20,7))

plt.subplot(1,2,1)
plt.plot(tf_history_aug.history['loss'], label='Training Loss')
plt.plot(tf_history_aug.history['val_loss'], label='Validation Loss')
plt.legend()

plt.subplot(1,2,2)
plt.plot(tf_history_aug.history['acc'], label='Training Accuracy')
plt.plot(tf_history_aug.history['val_acc'], label='Validation Accuracy')
plt.legend()
plt.show()

png

The model is not overfitting and the performance is still increasing, so training for more epoch can give a good performance, but it will take more time, so we will stop here. Try to improve the model.

  • Train the model longer.
  • Use different architecture with aug
  • use different activation
  • different optimizer

There are other model architectures which work good for images, we will discuss that in the intermediate track.

Saving a Trained Model

Saving a trained model is very important, hours of training should not be wasted and we need the trained model to be deployed in some other device. It’s very simple in tf.keras

model_path = 'cifar10_trained_model.h5'

model.save(model_path)
!ls
cifar10_trained_model.h5  sample_data

Loading a saved model

from tensorflow.keras.models import load_model

model = load_model(model_path)
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 3072)              0         
_________________________________________________________________
dense (Dense)                (None, 1024)              3146752   
_________________________________________________________________
activation (Activation)      (None, 1024)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1024)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               524800    
_________________________________________________________________
activation_1 (Activation)    (None, 512)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5130      
_________________________________________________________________
activation_2 (Activation)    (None, 10)                0         
=================================================================
Total params: 3,676,682
Trainable params: 3,676,682
Non-trainable params: 0
_________________________________________________________________
Previous
Next
comments powered by Disqus