This notebook is taken from the Machine Learning Mastery tutorials (machinelearningmastery.com). It covers step by step how to model your first neural network using Keras!
We are going to use the Pima Indians diabetes dataset. This is a standard machine learning dataset from the UCI Machine Learning repository. It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years.
As such, it is a binary classification problem (onset of diabetes as 1 or not as 0). All of the input variables that describe each patient are numerical. This makes it easy to use directly with neural networks that expect numerical input and output values, and ideal for our first neural network in Keras.
Understanding the data
Input Variables (X):
-Number of times pregnant
-Plasma glucose concentration at 2 hours in an oral glucose tolerance test
-Diastolic blood pressure (mm Hg)
-Triceps skin fold thickness (mm)
-2-Hour serum insulin (mu U/ml)
-Body mass index (weight in kg/(height in m)^2)
-Diabetes pedigree function
-Age (years)
Output Variables (y):
Class variable (0 or 1)
Requirements:
Python 3 installed (recent TensorFlow releases do not support Python 2).
SciPy (including NumPy) installed.
TensorFlow installed. Keras is used through the tensorflow.keras module in all of the code below.
Step 1: Install libraries
pip install scipy tensorflow
To use Keras, you will need to have the TensorFlow package installed.
Once TensorFlow is installed, just import Keras. We will use the NumPy library to load our dataset and we will use two classes from the Keras library to define our model.
Step 2: Import libraries
# first neural network with keras
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
Step 3: Load the data
We can now load the file as a matrix of numbers using the NumPy function loadtxt().
# load the dataset
dataset = loadtxt('../assets/pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
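To check the slicing without the real file, the sketch below runs the same loadtxt() call on three made-up rows held in memory. The values are illustrative placeholders in the same 9-column layout (8 inputs followed by the class label), not rows taken from the dataset.

```python
from io import StringIO

from numpy import loadtxt

# Three made-up rows: 8 input values followed by a 0/1 class label.
# These are placeholders for illustration, not real patient records.
csv_text = StringIO(
    "2,120,70,30,80,28.5,0.45,33,0\n"
    "5,160,80,35,150,34.0,0.60,47,1\n"
    "1,95,65,20,60,22.0,0.20,25,0\n"
)

# loadtxt accepts a file-like object as well as a filename
dataset = loadtxt(csv_text, delimiter=',')

# columns 0..7 are the inputs, column 8 is the label
X = dataset[:, 0:8]
y = dataset[:, 8]

print(X.shape)  # one row per patient, 8 input columns
print(y.shape)  # one label per patient
```

The slice 0:8 stops before index 8, so the label column never leaks into X.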
Step 4: Define KERAS model
Models in Keras are defined as a sequence of layers.
We create a Sequential model and add layers one at a time until we are happy with our network architecture.
First, ensure the input layer has the right number of input features. This can be specified when creating the first layer with the input_shape argument and setting it to (8,) for presenting the 8 input variables as a vector.
How do we know the number of layers and their types? Often, the best network structure is found through a process of trial and error experimentation.
In this example, we will use a fully-connected network structure with three layers.
Fully connected layers are defined using the Dense class. We can specify the number of neurons or nodes in the layer as the first argument, and specify the activation function using the activation argument.
We will use the rectified linear unit activation function referred to as ReLU on the first two layers and the Sigmoid function in the output layer. We use a sigmoid on the output layer to ensure our network output is between 0 and 1 and easy to map to either a probability of class 1 or snap to a hard classification of either class with a default threshold of 0.5.
To sum up:
-The model expects rows of data with 8 variables (the input_shape=(8,) argument)
-The first hidden layer has 12 nodes and uses the relu activation function.
-The second hidden layer has 8 nodes and uses the relu activation function.
-The output layer has one node and uses the sigmoid activation function.
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
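To make the arithmetic behind these three layers concrete, here is a rough NumPy sketch of a forward pass through an 8-12-8-1 network. The weights are small random placeholders (an assumption for illustration, not what Keras would learn or initialize), and the helper names relu, sigmoid and forward are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # rectified linear unit: negative values become 0
    return np.maximum(z, 0.0)

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# 8 inputs -> 12 hidden -> 8 hidden -> 1 output
layer_sizes = [8, 12, 8, 1]

# small random placeholder weights and zero biases, one pair per Dense layer
weights = [rng.normal(scale=0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(X):
    a = X
    # hidden layers use ReLU
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    # output layer uses sigmoid
    return sigmoid(a @ weights[-1] + biases[-1])

X_demo = rng.normal(size=(5, 8))   # 5 made-up rows with 8 features each
probs = forward(X_demo)

# each Dense layer has (inputs * outputs) weights plus one bias per output
n_params = sum(W.size + b.size for W, b in zip(weights, biases))

print(probs.shape)
print(n_params)
```

The parameter count works out to (8*12 + 12) + (12*8 + 8) + (8*1 + 1) = 221, which should match the total reported by model.summary() for the model defined above.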
Step 5: Compile KERAS model
Now that the model is defined, we can compile it. The backend automatically chooses the best way to represent the network for training and making predictions to run on your hardware, such as CPU or GPU or even distributed.
When compiling, we must specify some additional properties required when training the network. Remember training a network means finding the best set of weights to map inputs to outputs in our dataset.
We must specify the loss function used to evaluate a set of weights, the optimizer used to search through different weights for the network, and any optional metrics we would like to collect and report during training. In this case, we will use cross entropy as the loss argument. This loss is for binary classification problems and is defined in Keras as "binary_crossentropy".
We will define the optimizer as the efficient stochastic gradient descent algorithm "adam". This is a popular version of gradient descent because it adapts the learning rate for each weight during training and gives good results in a wide range of problems. Finally, because this is a classification problem, we will collect and report the classification accuracy, defined via the metrics argument.
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
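For intuition about what the binary cross-entropy loss measures, the following plain-Python sketch computes it by hand for a couple of made-up predictions. The function name and the eps clipping value are assumptions chosen for this illustration (clipping simply avoids log(0)); this is not the Keras implementation.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a list of labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        # keep p away from exactly 0 or 1 so log() stays finite
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# made-up predictions for two patients with true labels 1 and 0
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])

print(confident_right)
print(confident_wrong)
```

Confident correct predictions give a loss close to 0, while confident wrong predictions are penalized heavily, which is why the optimizer is pushed toward well-calibrated probabilities.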
Step 6: Fit KERAS model
We have defined our model and compiled it ready for efficient computation. Now let's execute the model on some data.
Training occurs over epochs and each epoch is split into batches.
-Epoch: One pass through all of the rows in the training dataset.
-Batch: One or more samples considered by the model within an epoch before weights are updated
The training process will run for a fixed number of iterations through the dataset called epochs, that we must specify using the epochs argument. We must also set the number of dataset rows that are considered before the model weights are updated within each epoch, called the batch size and set using the batch_size argument.
For this problem, we will run for a small number of epochs (150) and use a relatively small batch size of 10.
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
This is where the work happens on your CPU or GPU.
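As a quick sanity check on what epochs and batch_size imply, the sketch below counts the weight updates for a run. It assumes the dataset has 768 rows (the usual size of this file; check dataset.shape[0] on your copy) and that the final partial batch is still used. The function name weight_updates is made up for this example.

```python
import math

def weight_updates(n_rows, batch_size, epochs):
    """Return (batches per epoch, total weight updates over all epochs)."""
    # the last batch may be smaller than batch_size, hence ceil
    batches_per_epoch = math.ceil(n_rows / batch_size)
    return batches_per_epoch, batches_per_epoch * epochs

# assumed row count of 768; batch_size and epochs as used in fit() above
per_epoch, total = weight_updates(n_rows=768, batch_size=10, epochs=150)

print(per_epoch)
print(total)
```

Under these assumptions each epoch has 77 batches (76 full batches of 10 rows and one of 8), for 11,550 updates in total.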
Step 7: Evaluate KERAS model
We have trained our neural network on the entire dataset, so here we evaluate its performance on that same dataset. This only tells us how well the model fits the training data, not how well it will perform on new data; ideally, you would separate your data into train and test sets. The evaluate() function returns a list with two values: the loss of the model on the dataset, followed by the accuracy. Here, we are only interested in the accuracy, so we ignore the loss value.
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
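One simple way to hold out a test set is to shuffle the row indices and slice them. The sketch below does this with NumPy on a small made-up array; the function name train_test_split_rows, the 20% test fraction and the fixed seed are all assumptions for illustration.

```python
import numpy as np

def train_test_split_rows(X, y, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # shuffled row indices
    n_test = int(round(len(X) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# made-up stand-in data: 10 rows, 8 features, alternating labels
X_demo = np.arange(80, dtype=float).reshape(10, 8)
y_demo = np.array([0, 1] * 5, dtype=float)

X_train, X_test, y_train, y_test = train_test_split_rows(X_demo, y_demo)

print(X_train.shape, X_test.shape)
```

With a split like this you would call model.fit(X_train, y_train, ...) and then model.evaluate(X_test, y_test) to get an accuracy estimate on rows the model has not seen.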
Put all the previous code together in a .py file, for example named 'my_first_neural_network.py'. If you try running this example in an IPython or Jupyter notebook you may get an error.
You can then run the Python file as a script from your command line as follows:
python my_first_neural_network.py
Running this example, you should see a message for each of the 150 epochs printing the loss and accuracy, followed by the final evaluation of the trained model on the training dataset.
We would love the loss to go to zero and accuracy to go to 1.0. This is not possible for any but the most trivial machine learning problems. Instead, we will always have some error in our model. The goal is to choose a model configuration and training configuration that achieve the lowest loss and highest accuracy possible for a given dataset.
Neural networks are a stochastic algorithm, meaning that the same algorithm on the same data can train a different model with different skill each time the code is run. This is a feature, not a bug.
The variance in the performance of the model means that to get a reasonable approximation of how well your model is performing, you may need to fit it many times and calculate the average of the accuracy scores.
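The repeat-and-average idea can be sketched as a small helper. Here fake_train_and_score is a hypothetical stand-in that returns a random number in place of a real accuracy; in practice you would replace it with a function that builds, compiles and fits a fresh model and returns the accuracy from evaluate(). The numbers it produces are placeholders, not results.

```python
import random
import statistics

def average_accuracy(train_and_score, n_runs=5):
    """Call train_and_score() n_runs times and summarize the scores."""
    scores = [train_and_score() for _ in range(n_runs)]
    return scores, statistics.mean(scores), statistics.stdev(scores)

_rng = random.Random(0)

def fake_train_and_score():
    # Hypothetical stand-in for: define model, compile, fit, evaluate.
    # Returns a placeholder "accuracy" so the sketch runs without training.
    return _rng.uniform(0.70, 0.80)

scores, mean_acc, std_acc = average_accuracy(fake_train_and_score)

print('mean accuracy: %.3f (std %.3f over %d runs)' % (mean_acc, std_acc, len(scores)))
```

Reporting the standard deviation alongside the mean gives a sense of how much a single run can differ from the average.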
Step 8: Make predictions
So after training my model a few times and getting an average of all the accuracies obtained, how do I make predictions?
Making predictions is as easy as calling the predict() function on the model. We are using a sigmoid activation function on the output layer, so the predictions will be a probability in the range between 0 and 1. We can easily convert them into a crisp binary prediction for this classification task by rounding them.
# make probability predictions with the model; here we reuse the training data as if it were new data
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
Alternatively, we can threshold the probabilities at 0.5 to predict crisp 0 or 1 classes directly:
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int)
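To see what the thresholding expression does, here it is applied to a small array of made-up probabilities with the same (n, 1) shape that predict() returns for a single-output model. The 0.7 threshold in the second half is an arbitrary value chosen only to show that the cut-off can be changed.

```python
import numpy as np

# made-up probabilities, one row per sample and one column (shape (4, 1))
probs = np.array([[0.10], [0.50], [0.73], [0.62]])

# default threshold: strictly greater than 0.5 becomes class 1
classes = (probs > 0.5).astype(int)

# flatten to a 1-D vector of labels, which is easier to compare with y
flat = classes.ravel()

# a stricter, arbitrary threshold trades more missed positives for fewer false alarms
strict = (probs > 0.7).astype(int).ravel()

print(flat.tolist())
print(strict.tolist())
```

Note that a probability of exactly 0.5 maps to class 0 here, because the comparison is strict.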
The complete example below makes predictions for each example in the dataset, then prints the input data, predicted class and expected class for the first 5 examples in the dataset.
# first neural network with keras make predictions
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
dataset = loadtxt('../assets/pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int)
# summarize the first 5 cases
for i in range(5):
    # predictions has shape (n, 1), so take the single value in each row
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i][0], y[i]))
The reason you might get errors in a Jupyter notebook is the progress bars printed during training. You can turn these off by setting verbose=0 in the calls to the fit() and evaluate() functions, as we did for fit() in the example above.
We can see that most rows are correctly predicted. In fact, we would expect about 76.9% of the rows to be correctly predicted based on our estimated performance of the model in the previous section.
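If you want to compute the fraction of correct rows yourself, watch the array shapes: the predictions have shape (n, 1) while y has shape (n,), and comparing them directly broadcasts to an (n, n) grid. The sketch below uses made-up predictions and labels to show the pitfall and one way around it.

```python
import numpy as np

# made-up class predictions (shape (4, 1)) and true labels (shape (4,))
predictions = np.array([[1], [0], [1], [0]])
y_true = np.array([1.0, 0.0, 0.0, 0.0])

# pitfall: (4, 1) compared with (4,) broadcasts to a (4, 4) grid
broadcast = (predictions == y_true)

# flatten the predictions first so rows are compared one to one
accuracy = np.mean(predictions.ravel() == y_true)

print(broadcast.shape)
print(accuracy)
```

Here three of the four made-up rows match, so the accuracy is 0.75.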
Step 9: Save your model
Saving models requires that you have the h5py library installed. It is usually installed as a dependency with TensorFlow. You can also install it easily as follows:
pip install h5py
Keras separates the concerns of saving your model architecture and saving your model weights.
The model structure can be described and saved in two different formats: JSON and YAML. In both cases the architecture is saved separately from the weights, and the weights are always saved to an HDF5 format file.
Keras also supports a simpler interface to save both the model weights and model architecture together into a single H5 file.
Saving the model in this way includes everything we need to know about the model, including:
-Model weights.
-Model architecture.
-Model compilation details (loss and metrics).
-Model optimizer state.
This means that we can load and use the model directly, without having to re-compile it.
Note: this is the preferred way for saving and loading your Keras model.
You can save your model by calling the save() function on the model and specifying the filename.
The example below demonstrates this by first fitting a model, evaluating it and saving it to the file model.h5.
# MLP for Pima Indians Dataset saved to single file
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load pima indians dataset
dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# define model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Fit the model
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
# evaluate the model
scores = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
# save model and architecture to single file
model.save("model.h5")
print("Saved model to disk")
Equivalently, you can save the model with the save_model() function:
# equivalent to: model.save("model.h5")
from tensorflow.keras.models import save_model
save_model(model, "model.h5")
If you want to know how to save your model using JSON or YAML, see https://machinelearningmastery.com/save-load-keras-deep-learning-models/
Step 10: Load your model
Your saved model can then be loaded later by calling the load_model() function and passing the filename. The function returns the model with the same architecture and weights.
In the following code, we load the model, summarize the architecture and evaluate it on the same dataset to confirm the weights and architecture are the same.
# load and evaluate a saved model
from numpy import loadtxt
from tensorflow.keras.models import load_model
# load model
model = load_model('model.h5')
# summarize model.
model.summary()
# load dataset
dataset = loadtxt("pima-indians-diabetes.csv", delimiter=",")
# split into input (X) and output (Y) variables
X = dataset[:,0:8]
Y = dataset[:,8]
# evaluate the model
score = model.evaluate(X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], score[1]*100))