This is the first part of our Deep Learning tutorial with TensorFlow. The idea behind Artificial Neural Networks (ANNs) is that the workings of the human brain, with its neurons making the right connections, can be mimicked in silicon and wires standing in for living neurons and dendrites. The idea has been around since the 1940s and has had its ups and downs. Little of note happened until about 2011, when deep neural networks began to take hold and outperform Support Vector Machines, thanks to new training techniques, the availability of huge data sets, and much more powerful computers.
Biological motivation and connections
The model behind a neural network is actually a very simple concept: mimic a neuron. A basic biological neuron has dendrites, a nucleus, an axon, and axon terminals. In the brain, neurons are not physically fused together (an axon terminal does not actually touch the dendrites of the next neuron, as far as we know); instead, messages pass between them across a small gap called a synapse.
We basically have inputs, some raw data, say 0, 0, 1. Those inputs come in, the neuron does something with them and sends the result down the axon, which may fire across the synapse to the next neuron, which in turn passes it to the next, and so on until we get a result.
The human brain is composed of roughly 100 billion neurons, each connected (across synapses, not physically) to thousands of other cells via axons. Dendrites accept stimuli from the external environment or inputs from sensory organs (our data sets). These inputs create electrical impulses, which quickly travel through the neural network (the processing step). A neuron may then pass the message on to other neurons to handle, or it may not send it forward at all.
A Basic Neural Network
Now the computer scientists come in thinking, "we've got this." So they propose a model of an artificial neuron that looks like this:
Each input has an associated weight (w), which is assigned on the basis of its relative importance to other inputs. The node applies a function f (defined below) to the weighted sum of its inputs as shown in Figure below:
The above network takes numerical inputs X1 and X2 and has weights w1 and w2 associated with those inputs. Additionally, there is another input 1 with weight b (called the Bias) associated with it which we will worry about later.
The output of the neuron is computed as shown in the figure. The function f is non-linear and is called the Activation Function. The purpose of the activation function is to introduce non-linearity into the output of a neuron. This is important because most real-world data is non-linear and we want neurons to learn these non-linear representations.
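As a concrete sketch, the neuron above computes f(w1·X1 + w2·X2 + b). The weight and bias values below are made up purely for illustration, and sigmoid is chosen as one example of f:

```python
import math

def sigmoid(x):
    # Example activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(x1, x2, w1, w2, b):
    # Weighted sum of the inputs plus the bias, passed through f
    return sigmoid(w1 * x1 + w2 * x2 + b)

# Hypothetical weights and bias, just to see a number come out
print(neuron_output(1.0, 0.0, w1=0.5, w2=-0.3, b=0.1))
```

Whatever the inputs, the sigmoid guarantees the output lands strictly between 0 and 1.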
Every activation function (or non-linearity) takes a single number and performs a certain fixed mathematical operation on it. There are several activation functions you may encounter in practice:
- Sigmoid: takes a real-valued input and squashes it to range between 0 and 1
σ(x) = 1 / (1 + exp(−x))
- tanh: takes a real-valued input and squashes it to the range [-1, 1]
tanh(x) = 2σ(2x) − 1
- ReLU: ReLU stands for Rectified Linear Unit. It takes a real-valued input and thresholds it at zero (replaces negative values with zero)
f(x) = max(0, x)
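These three activations can be written directly from their definitions (a quick sketch using only the Python standard library; the tanh identity above is checked numerically against math.tanh):

```python
import math

def sigmoid(x):
    # squashes a real value into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # built from sigmoid via the identity tanh(x) = 2*sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    # thresholds at zero: negative inputs become zero
    return max(0.0, x)

# Sanity checks against the definitions above
assert abs(tanh(0.7) - math.tanh(0.7)) < 1e-12
assert relu(-3.2) == 0.0 and relu(2.5) == 2.5
```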
Now let's see what a neural network looks like at scale.
If you are learning about neural networks you must have seen this diagram many times. The circles are neurons or nodes, each applying its function to the data, and the lines connecting them are the weights, the information being passed along. Each column is a layer. The first layer, your data, is the input layer. Then, unless your output is your input, you have at least one hidden layer. If you have just one hidden layer, you have a regular artificial neural network. If you elect to have many hidden layers, you have yourself a deep neural network. Wasn't that easy? …Well, at least in concept.
This specific model is what we call a feed-forward neural network: the information flow is unidirectional. Feed-forward networks are used for pattern generation, recognition, and classification, which we will cover in upcoming tutorials.
A feed-forward neural network can consist of three types of nodes:
- Input Nodes – The Input nodes provide information from the outside world to the network and are together referred to as the "Input Layer". No computation is performed in any of the Input nodes – they just pass the information on to the hidden nodes.
- Hidden Nodes – The Hidden nodes have no direct connection with the outside world (hence the name “hidden”). They perform computations and transfer information from the input nodes to the output nodes. A collection of hidden nodes forms a “Hidden Layer”. While a feed-forward network will only have a single input layer and a single output layer, it can have zero or multiple Hidden Layers.
- Output Nodes – The Output nodes are collectively referred to as the “Output Layer” and are responsible for computations and transferring information from the network to the outside world.
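Putting the three node types together, a minimal forward pass can be sketched in plain Python. The layer sizes, weights, biases, and the choice of sigmoid here are all illustrative assumptions, not values from the text:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # One layer: each node takes a weighted sum of all inputs plus its bias,
    # then applies the activation function
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

# A tiny 3-input -> 2-hidden -> 1-output network with made-up weights.
# The input layer performs no computation; it just passes the data on.
inputs = [0.0, 0.0, 1.0]
hidden = layer(inputs, [[0.2, -0.4, 0.7],
                        [0.5, 0.1, -0.3]], [0.1, -0.2])
output = layer(hidden, [[0.6, -0.8]], [0.05])
print(output)
```

Information only ever moves left to right here, which is exactly what makes the network feed-forward.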
Machine learning algorithms have long been good at classification tasks, but until recently they were not very good at modeling logic. If you wanted an algorithm to answer a logical question, you had to model that logic yourself, which meant knowing linguistics and the like. With a neural network, you give it a few examples, repeat that a few million times, and the network figures out the model and the logic on its own, which is SO fascinating.
Rishabh Jindal – July 30, 2017