Google ML Academy 2019
Instructor: Shangeth Rajaa
Solutions Here:
It's a very simple assignment; you can finish it in less than 10 minutes. If you get stuck somewhere, refer to this for solutions.
Task-1 : Linear Regression on Non-Linear Data
- Get X and y from dataset() function
- Train a Linear Regression model for this dataset.
- Visualize the model prediction
Dataset
Call the dataset() function to get X, y.
import numpy as np
import matplotlib.pyplot as plt
def dataset(show=True):
    X = np.arange(-25, 25, 0.1)
    # Try changing y to a different function
    y = X**3 + 20 + np.random.randn(500)*1000
    if show:
        plt.scatter(X, y)
        plt.show()
    return X, y
X, y = dataset()
Scaling Dataset
The maximum value of y in the dataset goes above 15,000 and the minimum value is below -15,000. This range of y is very large, which makes convergence/loss reduction slower. So we will scale the data; scaling helps the model converge faster. If all the features and the target are in the same range, the curve of loss vs. weights/bias is more symmetric, which speeds up convergence.
We will do a very simple kind of scaling: we divide all the values of the data by the maximum value of X and of y respectively.
X, y = dataset()
print(max(X), max(y), min(X), min(y))
X = X/max(X)
y = y/max(y)
print(max(X), max(y), min(X), min(y))
24.90000000000071 16694.307606867886 -25.0 -16126.103960535462
1.0 1.0 -1.0040160642569995 -0.9659642280642613
This is not a great scaling method, but it is good enough to start with. We will see many more scaling/normalizing methods later.
Try training the model with and without scaling and see the difference yourself.
Linear Regression in TensorFlow
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
X, y = dataset(show=False)
X_scaled = X/max(X)
y_scaled = y/max(y)
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
# You can also define the optimizer this way, so you can change parameters like the learning rate.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss='mean_squared_error')
tf_history = model.fit(X_scaled, y_scaled, epochs=500, verbose=True)
plt.plot(tf_history.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('MSE Loss')
plt.show()
mse = tf_history.history['loss'][-1]
y_hat = model.predict(X_scaled)
plt.figure(figsize=(12,7))
plt.title('TensorFlow Model')
plt.scatter(X_scaled, y_scaled, label='Data $(X, y)$')
plt.plot(X_scaled, y_hat, color='red', label='Predicted Line $y = f(X)$',linewidth=4.0)
plt.xlabel('$X$', fontsize=20)
plt.ylabel('$y$', fontsize=20)
plt.text(0,0.70,'MSE = {:.3f}'.format(mse), fontsize=20)
plt.grid(True)
plt.legend(fontsize=20)
plt.show()
Epoch 1/500
500/500 [==============================] - 0s 951us/sample - loss: 0.2951
Epoch 2/500
500/500 [==============================] - 0s 41us/sample - loss: 0.2856
.
.
Epoch 499/500
500/500 [==============================] - 0s 41us/sample - loss: 0.0264
Epoch 500/500
500/500 [==============================] - 0s 39us/sample - loss: 0.0263
Looks like the model's prediction for this dataset is not very good, but that is expected: the model is a straight line, so it cannot fit non-linear regression data. Is there a way to train a regression model for this task?
Polynomial Regression
So when the dataset is not linear, linear regression cannot learn it and make good predictions.
We need a polynomial model which considers the polynomial terms as well, i.e. terms like $x^2$, $x^3$, …, $x^n$, so the model can learn a polynomial of $n^{th}$ degree.
$\hat{y} = w_0 + w_1x + w_2x^2 + … + w_nx^n$
One downside of this model is that we have to decide the value of $n$ ourselves. But it is still better than a plain linear regression model. We can get an idea of the value of $n$ by visualizing the dataset, but for a multi-variable dataset we will have to try different values of $n$ and check which works better.
Polynomial Features
You can calculate the polynomial features for each feature by programming it yourself (a short sketch follows below), or you can use sklearn.preprocessing.PolynomialFeatures, which builds the polynomial terms of the data for us.
We will try degree 2, 3 and 4
X, y = dataset(show=False)
X_scaled = X/max(X)
y_scaled = y/max(y)
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2)
X_2 = poly.fit_transform(X_scaled.reshape(-1,1))
print(X_2.shape)
print(X_2[0])
(500, 3)
[ 1. -1.00401606 1.00804826]
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=3)
X_3 = poly.fit_transform(X_scaled.reshape(-1,1))
print(X_3.shape)
print(X_3[0])
(500, 4)
[ 1. -1.00401606 1.00804826 -1.01209664]
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=4)
X_4 = poly.fit_transform(X_scaled.reshape(-1,1))
print(X_4.shape)
print(X_4[0])
(500, 5)
[ 1. -1.00401606 1.00804826 -1.01209664 1.01616129]
PolynomialFeatures returns $[1, x, x^2, x^3, …]$.
Task - 2
- Train a model with polynomial terms in the dataset.
- Visualize the prediction of the model
The code remains the same, except that the number of input features will be 3, 4 and 5 respectively.
TensorFlow Model with 2nd-Degree Features
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[3])])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss='mean_squared_error')
tf_history = model.fit(X_2, y_scaled, epochs=500, verbose=True)
plt.plot(tf_history.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('MSE Loss')
plt.show()
mse = tf_history.history['loss'][-1]
y_hat = model.predict(X_2)
plt.figure(figsize=(12,7))
plt.title('TensorFlow Model')
plt.scatter(X_2[:, 1], y_scaled, label='Data $(X, y)$')
plt.plot(X_2[:, 1], y_hat, color='red', label='Predicted Line $y = f(X)$',linewidth=4.0)
plt.xlabel('$X$', fontsize=20)
plt.ylabel('$y$', fontsize=20)
plt.text(0,0.70,'MSE = {:.3f}'.format(mse), fontsize=20)
plt.grid(True)
plt.legend(fontsize=20)
plt.show()
Epoch 1/500
500/500 [==============================] - 0s 332us/sample - loss: 0.5278
Epoch 2/500
500/500 [==============================] - 0s 38us/sample - loss: 0.4773
.
.
Epoch 499/500
500/500 [==============================] - 0s 44us/sample - loss: 0.0242
Epoch 500/500
500/500 [==============================] - 0s 43us/sample - loss: 0.0242
Why does the polynomial regression with degree-2 features look like a straight line?
Because the model finds that a straight line (or at least something that looks like one; we can't be sure it isn't a very flat parabola) fits the dataset better than a pronounced parabola. If you train the model for fewer epochs, you can notice that the model first tries to fit the data with a parabola (degree 2), but it moves towards a line as training continues.
Train the same model for, say, 50 epochs.
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[3])])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss='mean_squared_error')
tf_history = model.fit(X_2, y_scaled, epochs=50, verbose=True)
plt.plot(tf_history.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('MSE Loss')
plt.show()
mse = tf_history.history['loss'][-1]
y_hat = model.predict(X_2)
plt.figure(figsize=(12,7))
plt.title('TensorFlow Model')
plt.scatter(X_2[:, 1], y_scaled, label='Data $(X, y)$')
plt.plot(X_2[:, 1], y_hat, color='red', label='Predicted Line $y = f(X)$',linewidth=4.0)
plt.xlabel('$X$', fontsize=20)
plt.ylabel('$y$', fontsize=20)
plt.text(0,0.70,'MSE = {:.3f}'.format(mse), fontsize=20)
plt.grid(True)
plt.legend(fontsize=20)
plt.show()
Epoch 1/50
500/500 [==============================] - 0s 370us/sample - loss: 0.8566
Epoch 2/50
500/500 [==============================] - 0s 38us/sample - loss: 0.7970
.
.
Epoch 49/50
500/500 [==============================] - 0s 38us/sample - loss: 0.1027
Epoch 50/50
500/500 [==============================] - 0s 35us/sample - loss: 0.1000
You can clearly see that the model tries to fit the data with a parabola, which doesn't seem to fit well, so it changes the parameters to fit a line.
TensorFlow Model with 3rd-Degree Features
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[4])])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss='mean_squared_error')
tf_history = model.fit(X_3, y_scaled, epochs=500, verbose=True)
plt.plot(tf_history.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('MSE Loss')
plt.show()
mse = tf_history.history['loss'][-1]
y_hat = model.predict(X_3)
plt.figure(figsize=(12,7))
plt.title('TensorFlow Model')
plt.scatter(X_3[:, 1], y_scaled, label='Data $(X, y)$')
plt.plot(X_3[:, 1], y_hat, color='red', label='Predicted Line $y = f(X)$',linewidth=4.0)
plt.xlabel('$X$', fontsize=20)
plt.ylabel('$y$', fontsize=20)
plt.text(0,0.70,'MSE = {:.3f}'.format(mse), fontsize=20)
plt.grid(True)
plt.legend(fontsize=20)
plt.show()
Epoch 1/500
500/500 [==============================] - 0s 456us/sample - loss: 0.4445
Epoch 2/500
500/500 [==============================] - 0s 40us/sample - loss: 0.3993
.
.
Epoch 499/500
500/500 [==============================] - 0s 38us/sample - loss: 0.0040
Epoch 500/500
500/500 [==============================] - 0s 37us/sample - loss: 0.0039
The degree-3 features fit the data with a 3rd-degree polynomial very well, as expected.
TensorFlow Model with 4th-Degree Features
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[5])])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss='mean_squared_error')
tf_history = model.fit(X_4, y_scaled, epochs=500, verbose=True)
plt.plot(tf_history.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('MSE Loss')
plt.show()
mse = tf_history.history['loss'][-1]
y_hat = model.predict(X_4)
plt.figure(figsize=(12,7))
plt.title('TensorFlow Model')
plt.scatter(X_4[:, 1], y_scaled, label='Data $(X, y)$')
plt.plot(X_4[:, 1], y_hat, color='red', label='Predicted Line $y = f(X)$',linewidth=4.0)
plt.xlabel('$X$', fontsize=20)
plt.ylabel('$y$', fontsize=20)
plt.text(0,0.70,'MSE = {:.3f}'.format(mse), fontsize=20)
plt.grid(True)
plt.legend(fontsize=20)
plt.show()
Epoch 1/500
500/500 [==============================] - 0s 240us/sample - loss: 0.5839
Epoch 2/500
500/500 [==============================] - 0s 37us/sample - loss: 0.5453
.
.
Epoch 499/500
500/500 [==============================] - 0s 37us/sample - loss: 0.0040
Epoch 500/500
500/500 [==============================] - 0s 39us/sample - loss: 0.0040
The 4th-degree poly-regression also did a good job of fitting the data, as it also has the 3rd-degree terms.
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[5])])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
model.compile(optimizer=optimizer, loss='mean_squared_error')
tf_history = model.fit(X_4, y_scaled, epochs=50, verbose=True)
plt.plot(tf_history.history['loss'])
plt.xlabel('Epochs')
plt.ylabel('MSE Loss')
plt.show()
mse = tf_history.history['loss'][-1]
y_hat = model.predict(X_4)
plt.figure(figsize=(12,7))
plt.title('TensorFlow Model')
plt.scatter(X_4[:, 1], y_scaled, label='Data $(X, y)$')
plt.plot(X_4[:, 1], y_hat, color='red', label='Predicted Line $y = f(X)$',linewidth=4.0)
plt.xlabel('$X$', fontsize=20)
plt.ylabel('$y$', fontsize=20)
plt.text(0,0.70,'MSE = {:.3f}'.format(mse), fontsize=20)
plt.grid(True)
plt.legend(fontsize=20)
plt.show()
Epoch 1/50
500/500 [==============================] - 0s 181us/sample - loss: 0.6724
Epoch 2/50
500/500 [==============================] - 0s 35us/sample - loss: 0.6104
.
.
Epoch 48/50
500/500 [==============================] - 0s 34us/sample - loss: 0.0661
Epoch 49/50
500/500 [==============================] - 0s 35us/sample - loss: 0.0655
Epoch 50/50
500/500 [==============================] - 0s 38us/sample - loss: 0.0648
If you run the 4th-degree poly-regression for fewer epochs, you can notice that the model tries to fit a 4th-degree (or at least higher than 3rd-degree) polynomial, but since the loss is high, the model changes its parameters to push the 4th-degree term towards 0, giving a 3rd-degree polynomial as you train for more epochs.
This is polynomial regression. Yes, it's easy. But there is one issue: because this was a toy dataset, we knew it was 3rd-degree data, so we tried degrees 2, 3 and 4. When the data is multi-dimensional we cannot visualize the dataset, so it is difficult to decide the degree. This is where you will see that Neural Networks are awesome: they are end-to-end, they do not need this kind of manual feature engineering from our side, and they can extract the necessary features on their own.
Make a higher-degree (4th/5th-degree) dataset and try polynomial regression on it. Also try different functions like exponentials, trigonometric functions, etc.