# Neural Networks

### Neural Network Learning

A perceptron is a type of artificial neural network that is commonly used in artificial intelligence for a wide range of classification and prediction problems. The basic approach to learning is to start with an untrained network, present a training pattern to the input layer, pass the signals through the network, and determine the output at the output layer. These outputs are then compared to the target values; any difference corresponds to an error. This error, or criterion function, is a scalar function of the weights and is minimised when the network outputs match the desired outputs. The weights are therefore adjusted to reduce this measure of error.
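One common choice of criterion function, given here as an illustration rather than the only option, is the sum of squared errors over the output units. The symbols $z_k$ (the network output at unit $k$), $t_k$ (the matching target value) and $\eta$ (a small positive learning rate) are assumptions introduced for this sketch:

$J\left( {\mathbf{w}} \right) = \frac{1}{2}\sum\limits_k {\left( {t_k - z_k } \right)^2 }$

With this choice, adjusting the weights to reduce the error can be written as a small step against the gradient:

$\Delta {\mathbf{w}} = - \eta \frac{{\partial J}}{{\partial {\mathbf{w}}}}$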

##### Steps in training and running a Perceptron:
1. Get samples for the training and testing sets. These should include:
   1. What the inputs $x_i$ (observations) are.
      1. Inputs and outputs should generally be normalized so that the largest value is 1 and the smallest is either -1 or 0. This can be done with the basic formula
         $x_i ^\prime = \frac{{x_i - \min \left( {\mathbf{x}} \right)}} {{\max \left( {\mathbf{x}} \right) - \min \left( {\mathbf{x}} \right)}} \cdot \text{normalizeConst} - \text{offsetConst}$
         Here normalizeConst would be 2 and offsetConst would be 1 if we normalized from -1 to 1, so we would have:
         $x_i ^\prime = \frac{{x_i - \min \left( {\mathbf{x}} \right)}} {{\max \left( {\mathbf{x}} \right) - \min \left( {\mathbf{x}} \right)}} \cdot 2 - 1$
         To normalize from 0 to 1 instead, normalizeConst would be 1 and offsetConst would be 0.
   2. What outputs $t_k$ (decisions) you expect it to make.
2. Set up the network.
   1. Create the input and output nodes.
   2. Create weighted edges $w_{ki}$ between each input node and each output node. We usually set the initial weights randomly from 0 to 1 or from -1 to 1.
3. Run the training set over and over again, adjusting the weights a little bit each time.
4. When the error converges, run the testing set to check that the neural network generalizes well to patterns it has not seen.

These steps can also be applied to the multi layer Perceptron.
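The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `normalize`, `predict` and `train_perceptron`, the bias term, the learning rate, and the choice of the logical AND problem as training data are all assumptions made for the example. It uses a simple threshold output and the classic perceptron update rule. Step 4 is skipped because AND has only four patterns, so nothing is left to hold out as a testing set.

```python
import random

def normalize(xs, lo=-1.0, hi=1.0):
    """Rescale xs linearly so that min(xs) maps to lo and max(xs) to hi.

    In the notation of the formula above, normalizeConst = hi - lo
    and offsetConst = -lo.
    """
    x_min, x_max = min(xs), max(xs)
    span = x_max - x_min
    if span == 0:
        return [lo for _ in xs]  # constant column: nothing to rescale
    return [(x - x_min) / span * (hi - lo) + lo for x in xs]

def predict(weights, bias, x):
    """Threshold unit: +1 if the weighted sum is non-negative, else -1."""
    a = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if a >= 0 else -1

def train_perceptron(samples, targets, rate=0.1, max_epochs=100, seed=0):
    """Repeat passes over the training set until no pattern is misclassified."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in samples[0]]  # random initial weights
    bias = rng.uniform(-1, 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(samples, targets):
            y = predict(weights, bias, x)
            if y != t:
                mistakes += 1
                # Nudge each weight a little in the direction that reduces the error.
                weights = [w + rate * (t - y) * xi for w, xi in zip(weights, x)]
                bias += rate * (t - y)
        if mistakes == 0:  # the error has converged
            break
    return weights, bias

# Step 1: raw samples for logical AND, normalized column by column to [-1, 1].
raw_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
raw_targets = [0, 0, 0, 1]
columns = [normalize(col) for col in zip(*raw_inputs)]
samples = list(zip(*columns))
targets = normalize(raw_targets)

# Steps 2 and 3: set up the weights and train.
weights, bias = train_perceptron(samples, targets)
predictions = [predict(weights, bias, x) for x in samples]
print(predictions)  # [-1, -1, -1, 1]
```

AND is linearly separable, so the perceptron rule is guaranteed to stop making mistakes after a finite number of updates; on data that is not linearly separable the loop would simply run until `max_epochs`.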

### Multi Layer Perceptrons

With a single layer perceptron we can solve a problem so long as it is linearly separable.

One way to classify regions of space that are not linearly separable is, in a sense, to sub-classify the classification. To do this we add an extra layer of neurons, connected through the weights $w_{kj}^{\left( 2 \right)}$, on top of the ones we already have. When the input runs through the first layer of weights $w_{ji}^{\left( 1 \right)}$, the outputs of that layer can be numerically split or merged by the second layer, allowing regions that do not touch each other in space to still yield the same output.

To add layers we need to do one more thing besides connecting up some new weights: we need to introduce what is known as a non-linearity $g\left( a \right)$. Without it, a stack of layers can compute nothing more than a single layer could, because a composition of linear maps is itself linear. In general, the non-linearity works to make the outputs from each layer more crisp. This is accomplished by using a sigmoidal activation function, which tends to get rid of values in the middle by forcing low values even lower and high values even higher.
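As a minimal sketch of how a second layer plus a non-linearity handles a problem that is not linearly separable, the Python below wires up a two-layer network for XOR. The weights and biases are hand-picked for illustration (they are not learned and not taken from the text), and the names `layer`, `network`, `W1`, `B1`, `W2` and `B2` are assumptions. The last lines rerun the same network with the non-linearity removed to show that it then collapses to a linear function of the inputs.

```python
import math

def logsig(a):
    """Logistic sigmoid g(a) = 1 / (1 + e^-a)."""
    return 1.0 / (1.0 + math.exp(-a))

def layer(weights, biases, inputs, g):
    """One layer: a weighted sum per neuron, passed through the activation g."""
    return [g(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-picked, illustrative parameters.
W1 = [[10.0, 10.0],   # hidden unit 1 behaves like OR
      [10.0, 10.0]]   # hidden unit 2 behaves like AND
B1 = [-5.0, -15.0]
W2 = [[10.0, -10.0]]  # output fires for "OR but not AND", which is XOR
B2 = [-5.0]

def network(x, g=logsig):
    hidden = layer(W1, B1, x, g)        # first layer of weights
    return layer(W2, B2, hidden, g)[0]  # second layer of weights

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_outputs = [round(network(p)) for p in patterns]
print(xor_outputs)  # [0, 1, 1, 0]

# With the identity in place of g, the two layers reduce to one affine map,
# and any affine map f satisfies f(0,0) + f(1,1) == f(0,1) + f(1,0).
# XOR would need 0 + 0 == 1 + 1, so no purely linear network can represent it.
linear_outputs = [network(p, g=lambda a: a) for p in patterns]
```

The two regions that yield an output of 1, around (0, 1) and (1, 0), do not touch each other, yet the second layer merges them into a single class.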

Two sigmoidal activation functions are in common use.

• The Logistic Sigmoid - Also called the logsig, this is the cumulative distribution function of the logistic distribution. Its S-shape resembles the integral of the statistical Gaussian distribution, but the two are not the same function. Its outputs lie between 0 and 1.
$g\left( a \right) \equiv \frac{1} {{1 + e^{ - a} }}$
• The Tangent Sigmoid - Also called the tansig, this is the hyperbolic tangent. Its outputs lie between -1 and 1, so it has the advantage over the logsig of being able to produce negative values directly.
$g\left( a \right) \equiv \tanh \left( a \right) \equiv \frac{{e^a - e^{ - a} }} {{e^a + e^{ - a} }}$
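The two functions can be compared directly with a few lines of Python. This is a small sketch using only the standard `math` module; the function names `logsig` and `tansig` follow the nicknames above, and the branch on the sign of `a` is an added precaution against floating-point overflow rather than part of the definition.

```python
import math

def logsig(a):
    """Logistic sigmoid, 1 / (1 + e^-a), with outputs between 0 and 1."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)  # for negative a, avoid computing a huge e^-a
    return e / (1.0 + e)

def tansig(a):
    """Hyperbolic tangent sigmoid, with outputs between -1 and 1."""
    return math.tanh(a)

for a in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"a={a:+.1f}  logsig={logsig(a):.4f}  tansig={tansig(a):+.4f}")
```

The two are closely related: the tansig is a rescaled and shifted logsig, since $\tanh \left( a \right) = 2\,g\left( {2a} \right) - 1$ where $g$ is the logistic sigmoid.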

### Training and Back Propagation

The standard way to train a multi layer perceptron is a method called back propagation. It is used to solve a basic problem called credit assignment, which comes up when we try to figure out how to adjust the weights of the edges coming from the input layer. In a single layer perceptron we can easily tell which weights produced the error, because we can directly observe the weights and the output from those weighted edges. In a multi layer network, however, the signal from the first layer of weights passes through a second layer of weights before it reaches the output. As such, the contribution of the first-layer weights to the error is obscured. Back propagation resolves this by passing the output error backwards through the second layer of weights, using the chain rule to assign each first-layer weight its share of the error.
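A minimal sketch of this idea in plain Python follows. It assumes a sum-of-squares error, tanh activations in both layers, no bias terms, and a single training pattern; the symbols $y_j$ (hidden outputs), $z_k$ (network outputs) and the "delta" terms, as well as every function and variable name, are assumptions chosen for the illustration rather than notation taken from the text.

```python
import math
import random

def forward(W1, W2, x):
    """Pass x through both layers; W1[j][i] plays w_ji^(1), W2[k][j] plays w_kj^(2)."""
    y = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]  # hidden
    z = [math.tanh(sum(w * yj for w, yj in zip(row, y))) for row in W2]  # output
    return y, z

def error(W1, W2, x, t):
    """Sum-of-squares error between targets t and outputs z."""
    _, z = forward(W1, W2, x)
    return 0.5 * sum((tk - zk) ** 2 for tk, zk in zip(t, z))

def backprop(W1, W2, x, t):
    """Return the gradients of the error with respect to W1 and W2."""
    y, z = forward(W1, W2, x)
    # Output deltas: error times slope of tanh, which is 1 - tanh^2.
    delta_out = [(zk - tk) * (1.0 - zk * zk) for zk, tk in zip(z, t)]
    # Hidden deltas: each hidden unit receives the output deltas, weighted by
    # the second-layer weights its signal passed through (credit assignment).
    delta_hid = [(1.0 - yj * yj) * sum(W2[k][j] * delta_out[k] for k in range(len(W2)))
                 for j, yj in enumerate(y)]
    grad_W2 = [[dk * yj for yj in y] for dk in delta_out]
    grad_W1 = [[dj * xi for xi in x] for dj in delta_hid]
    return grad_W1, grad_W2

rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 2 inputs, 3 hidden
W2 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(1)]  # 3 hidden, 1 output
x, t = [0.5, -1.0], [0.8]
rate = 0.05

error_before = error(W1, W2, x, t)
for _ in range(200):
    g1, g2 = backprop(W1, W2, x, t)
    # Adjust every weight a little against its gradient.
    W1 = [[w - rate * g for w, g in zip(wr, gr)] for wr, gr in zip(W1, g1)]
    W2 = [[w - rate * g for w, g in zip(wr, gr)] for wr, gr in zip(W2, g2)]
error_after = error(W1, W2, x, t)
print(error_before, error_after)
```

A useful sanity check on any back propagation code is to compare one computed gradient entry against a finite-difference estimate of the same derivative; the two should agree to several decimal places.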

## R-CARET Implementation

The caret package for R supports building neural network models through the following methods:

• nnet
• pcaNNet