# Good Articles to learn how to implement a neural network 1

# How to implement a neural network Intermezzo 1

http://peterroelants.github.io/posts/neural_network_implementation_intermezzo01/

This page is part of a 5-part (+2) tutorial on how to implement a simple neural network model. You can find the links to the rest of the tutorial here:

## Logistic classification function

This intermezzo will cover:

- The logistic function
- Cross-entropy cost function

If we want to do classification with neural networks, we want to output a probability distribution over the classes from the output targets t. For the classification of 2 classes, t=1 or t=0, we can use the logistic function used in logistic regression. For multiclass classification there exists an extension of this logistic function, called the softmax function, which is used in multinomial logistic regression. The following section will explain the logistic function and how to optimize it; the next intermezzo will explain the softmax function and how to derive it.

```python
# Python imports
import numpy as np  # Matrix and vector computation package
import matplotlib.pyplot as plt  # Plotting library
# Allow matplotlib to plot inside this notebook
%matplotlib inline
```

## Logistic function

The goal is to predict the target class t from an input z. The probability P(t=1|z) that input z is classified as class t=1 is represented by the output y of the logistic function, computed as y=σ(z). σ is the logistic function and is defined as:

σ(z) = 1 / (1 + e^{-z})

This logistic function, implemented below as `logistic(z)`, maps the input z to an output between 0 and 1, as is illustrated in the figure below.

We can write the probabilities that the class is t=1 or t=0 given input z as:

P(t=1|z) = σ(z) = 1 / (1 + e^{-z})

P(t=0|z) = 1 − σ(z) = e^{-z} / (1 + e^{-z})

Note that input z to the logistic function corresponds to the log odds ratio of P(t=1|z) over P(t=0|z):

log(P(t=1|z) / P(t=0|z)) = log(σ(z) / (1 − σ(z))) = log(e^{z}) = z

This means that the log odds ratio log(P(t=1|z)/P(t=0|z)) changes linearly with z. And if z = x·w, as in neural networks, this means that the log odds ratio changes linearly with the parameters w and input samples x.
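As a quick numerical check (a sketch, not part of the original notebook), we can confirm that the log odds ratio recovers the input z exactly; `logistic` is redefined here so the snippet is self-contained:

```python
import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

# log(P(t=1|z) / P(t=0|z)) = log(y / (1 - y)) should equal z itself
z = np.linspace(-5, 5, 11)
y = logistic(z)
log_odds = np.log(y / (1 - y))
print(np.allclose(log_odds, z))  # True: the log odds ratio is linear in z
```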

```python
# Define the logistic function
def logistic(z):
    return 1 / (1 + np.exp(-z))
```

```python
# Plot the logistic function
z = np.linspace(-6, 6, 100)
plt.plot(z, logistic(z), 'b-')
plt.xlabel('$z$', fontsize=15)
plt.ylabel('$\sigma(z)$', fontsize=15)
plt.title('logistic function')
plt.grid()
plt.show()
```

### Derivative of the logistic function

Since neural networks typically use gradient-based optimization techniques such as gradient descent, it is important to define the derivative of the output y of the logistic function with respect to its input z. ∂y/∂z can be calculated as:

∂y/∂z = ∂σ(z)/∂z = e^{-z} / (1 + e^{-z})^2 = (1 / (1 + e^{-z})) · (e^{-z} / (1 + e^{-z}))

And since 1 − σ(z) = 1 − 1/(1 + e^{-z}) = e^{-z}/(1 + e^{-z}), this can be rewritten as:

∂y/∂z = σ(z) · (1 − σ(z)) = y(1 − y)

This derivative is implemented as `logistic_derivative(z)` and is plotted below.

```python
# Define the derivative of the logistic function
def logistic_derivative(z):
    return logistic(z) * (1 - logistic(z))
```

```python
# Plot the derivative of the logistic function
z = np.linspace(-6, 6, 100)
plt.plot(z, logistic_derivative(z), 'r-')
plt.xlabel('$z$', fontsize=15)
plt.ylabel('$\\frac{\\partial \\sigma(z)}{\\partial z}$', fontsize=15)
plt.title('derivative of the logistic function')
plt.grid()
plt.show()
```
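As a sanity check on the derivation (not part of the original notebook), the analytical derivative can be compared against a central finite-difference approximation; `logistic` and `logistic_derivative` are redefined so the snippet is self-contained:

```python
import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

def logistic_derivative(z):
    return logistic(z) * (1 - logistic(z))

# Central finite differences: (f(z+h) - f(z-h)) / (2h) approximates f'(z)
z = np.linspace(-6, 6, 50)
h = 1e-5
numeric = (logistic(z + h) - logistic(z - h)) / (2 * h)
print(np.allclose(numeric, logistic_derivative(z)))  # True
```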

### Cross-entropy cost function for the logistic function

The output of the model y=σ(z) can be interpreted as a probability y that input z belongs to one class (t=1), or a probability 1−y that z belongs to the other class (t=0), in a two-class classification problem. We note this down as:

P(t=1|z) = σ(z) = y

P(t=0|z) = 1 − σ(z) = 1 − y

The neural network model will be optimized by maximizing the likelihood that a given set of parameters θ of the model can result in a prediction of the correct class of each input sample. The parameters θ transform each input sample i into an input to the logistic function z_i. The likelihood maximization can be written as:

argmax_θ ℒ(θ|t,z) = argmax_θ ∏_{i=1}^{n} ℒ(θ|t_i,z_i)

The likelihood ℒ(θ|t,z) can be rewritten as the joint probability of generating t and z given the parameters θ: P(t,z|θ). Since P(A,B) = P(A|B)·P(B), this can be written as:

P(t,z|θ) = P(t|z,θ) · P(z|θ)

Since we are not interested in the probability of z, we can reduce this to: ℒ(θ|t,z) = P(t|z,θ) = ∏_{i=1}^{n} P(t_i|z_i,θ). Since t_i is a Bernoulli variable, and the probability P(t|z) = y is fixed for a given θ, we can rewrite this as:

P(t|z) = ∏_{i=1}^{n} P(t_i=1|z_i)^{t_i} · (1 − P(t_i=1|z_i))^{1−t_i} = ∏_{i=1}^{n} y_i^{t_i} · (1 − y_i)^{1−t_i}

Since the logarithmic function is a monotonically increasing function, we can instead optimize the log-likelihood function argmax_θ log ℒ(θ|t,z). This maximum will be the same as the maximum of the regular likelihood function. The log-likelihood function can be written as:

log ℒ(θ|t,z) = log ∏_{i=1}^{n} y_i^{t_i} · (1 − y_i)^{1−t_i} = ∑_{i=1}^{n} [t_i·log(y_i) + (1 − t_i)·log(1 − y_i)]

Minimizing the negative of this function (minimizing the negative log-likelihood) corresponds to maximizing the likelihood. This error function ξ(t,y) is typically known as the cross-entropy error function (also known as log-loss):

ξ(t,y) = −log ℒ(θ|t,z) = −∑_{i=1}^{n} [t_i·log(y_i) + (1 − t_i)·log(1 − y_i)]

This function looks complicated, but besides the previous derivation there are a couple of intuitions why this function is used as a cost function for logistic regression. First of all, it can be rewritten for a single sample as:

ξ(t_i, y_i) = −t_i·log(y_i) − (1 − t_i)·log(1 − y_i)

which in the case of t_i=1 is 0 if y_i=1 (−log(1)=0) and goes to infinity as y_i→0 (lim_{y→0} −log(y) = +∞). The reverse effect happens if t_i=0. So what we end up with is a cost function that is 0 if the probability of predicting the correct class is 1, and goes to infinity as the probability of predicting the correct class goes to 0.

Notice that the cost function ξ(t,y) is equal to the negative log probability that z is classified as its correct class:

−log(P(t=1|z)) = −log(y)

−log(P(t=0|z)) = −log(1 − y)

By minimizing the negative log probability, we will maximize the log probability. And since t can only be 0 or 1, we can write ξ(t,y) for a single sample as:

ξ(t,y) = −t·log(y) − (1 − t)·log(1 − y)

which gives ξ(t,y) = −∑_{i=1}^{n} [t_i·log(y_i) + (1 − t_i)·log(1 − y_i)] if we sum over all n samples.

Another reason to use the cross-entropy function is that in simple logistic regression this results in a convex cost function, of which the global minimum will be easy to find. Note that this is not necessarily the case anymore in multilayer neural networks.
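The cross-entropy cost above can be sketched in a few lines of numpy (an illustrative helper, not part of the original notebook). Note how the cost is near 0 for confident correct predictions and large for confident wrong ones:

```python
import numpy as np

def cost(y, t):
    # Cross-entropy: -sum(t*log(y) + (1-t)*log(1-y)) over all samples
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

t = np.array([1., 1., 0.])                    # true classes
print(cost(np.array([0.99, 0.99, 0.01]), t))  # small: confident and correct
print(cost(np.array([0.01, 0.01, 0.99]), t))  # large: confident but wrong
```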

#### Derivative of the cross-entropy cost function for the logistic function

The derivative ∂ξ/∂y of the cost function with respect to its input can be calculated as:

∂ξ/∂y = ∂/∂y (−t·log(y) − (1 − t)·log(1 − y)) = −t/y + (1 − t)/(1 − y) = (y − t) / (y(1 − y))

This derivative gives a nice formula when it is used to calculate the derivative of the cost function with respect to the input of the classifier, ∂ξ/∂z, since the derivative of the logistic function is ∂y/∂z = y(1 − y):

∂ξ/∂z = (∂y/∂z)·(∂ξ/∂y) = y(1 − y) · (y − t)/(y(1 − y)) = y − t
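The result ∂ξ/∂z = y − t is easy to verify numerically (a sketch under the same definitions as in the text, not part of the original notebook):

```python
import numpy as np

def logistic(z):
    return 1 / (1 + np.exp(-z))

def cost(y, t):
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

# Finite-difference gradient of the cost with respect to each z_i
z = np.array([-2., 0.5, 3.])
t = np.array([1., 0., 1.])
h = 1e-6
grad_numeric = np.array([
    (cost(logistic(z + h * e), t) - cost(logistic(z - h * e), t)) / (2 * h)
    for e in np.eye(len(z))])
print(np.allclose(grad_numeric, logistic(z) - t))  # the gradient is simply y - t
```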

This post at peterroelants.github.io is generated from an IPython notebook file.

# How to implement a neural network Part 2

http://peterroelants.github.io/posts/neural_network_implementation_part02/

This page is part of a 5-part (+2) tutorial on how to implement a simple neural network model. You can find the links to the rest of the tutorial here:

## Logistic regression (classification)

This part will cover:

- The logistic classification model

While the previous tutorial described a very simple one-input-one-output linear regression model, this tutorial will describe a 2-class classification neural network with two input dimensions. This model is known in statistics as the logistic regression model. This network can be represented graphically as:

The notebook starts out with importing the libraries we need:

```python
# Python imports
import numpy as np  # Matrix and vector computation package
np.seterr(all='ignore')  # ignore numpy warnings like multiplication of inf
import matplotlib.pyplot as plt  # Plotting library
from matplotlib.colors import colorConverter, ListedColormap  # some plotting functions
from matplotlib import cm  # Colormaps
# Allow matplotlib to plot inside this notebook
%matplotlib inline
# Set the seed of the numpy random number generator so that the tutorial is reproducible
np.random.seed(seed=1)
```

## Define the class distributions

In this example the target classes t will be generated from 2 class distributions: blue (t=1) and red (t=0). Samples from both classes are sampled from their respective distributions. These samples are plotted in the figure below. Note that X is a N×2 matrix of individual input samples x_i, and that t is a corresponding N×1 vector of target values t_i.

```python
# Define and generate the samples
nb_of_samples_per_class = 20  # The number of samples in each class
red_mean = [-1, 0]  # The mean of the red class
blue_mean = [1, 0]  # The mean of the blue class
std_dev = 1.2  # standard deviation of both classes
# Generate samples from both classes
x_red = np.random.randn(nb_of_samples_per_class, 2) * std_dev + red_mean
x_blue = np.random.randn(nb_of_samples_per_class, 2) * std_dev + blue_mean
# Merge samples in set of input variables x, and corresponding set of output variables t
X = np.vstack((x_red, x_blue))
t = np.vstack((np.zeros((nb_of_samples_per_class, 1)),
               np.ones((nb_of_samples_per_class, 1))))
```

```python
# Plot both classes on the x1, x2 plane
plt.plot(x_red[:,0], x_red[:,1], 'ro', label='class red')
plt.plot(x_blue[:,0], x_blue[:,1], 'bo', label='class blue')
plt.grid()
plt.legend(loc=2)
plt.xlabel('$x_1$', fontsize=15)
plt.ylabel('$x_2$', fontsize=15)
plt.axis([-4, 4, -4, 4])
plt.title('red vs. blue classes in the input space')
plt.show()
```