Neural Networks

The brain behind AI


Matteo Di Paolantonio

Definition

“machine learning algorithm inspired by the structure and function of the human brain.”

Structure

Neurons
Layers
Weights

Neurons and layers

[Image: a handwritten "6"]

The white pattern activates some neurons in your brain.

Defining the pattern

28 x 28 = 784 pixels

Defining the pattern

784 greyscale values (0.0 - 1.0)
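As a rough illustration (not part of the slides), this is how a 28 x 28 greyscale image could be flattened into the 784 input values; `image` is a hypothetical stand-in for a real MNIST digit.

```python
import numpy as np

image = np.random.randint(0, 256, size=(28, 28))  # stand-in for a real MNIST digit
inputs = image.reshape(784) / 255.0               # 784 greyscale values in 0.0 - 1.0
print(inputs.shape)                               # (784,)
```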

Defining the pattern

0 1 2 3 4 5 6 7 8 9

Feedforward

Multilayer Perceptron

Pattern as input

784 neurons

Pattern values as first layer

Activation

Activation value: 0.4

First and last layer

0.0 → inactive
0.4 → active
1.0 → really active

Sub-patterns

Loops and lines

Sub-patterns

Fragments of loops and lines

Middle layers

Recap

  • There are many types of neural networks and we are focusing on the simplest one: the Multilayer Perceptron (MLP).
  • A neuron is a basic unit of a neural network. For the sake of simplicity let's say that it holds a value between 0.0 and 1.0. The greater the value, the more active the neuron is.
  • Neurons are organized in layers. The first is the input layer, the last is the output layer, and the middle layers are the hidden layers.
  • Neurons in a given layer are connected to the neurons in the previous layer. Activation values are propagated forward through the network.

Doubts or thoughts?

Weights

Weights propagate activation

Weights

Activation: ranges from 0.0 to 1.0
Weight: can be any value, negative or positive

Weights

w₁ = 7.0   w₂ = -4.6   w₃ = 5.8   w₄ = 7.3   w₅ = -0.5   w₆ = -3.2   ...   w₇₈₄ = 2.2

Activation

w₁a₁ + w₂a₂ + ... + wₙaₙ

Activation

w₁a₁ + w₂a₂ + ... + wₙaₙ

The weighted sum is not limited to the 0.0 - 1.0 range: weights can be any value, negative or positive.

Sigmoid function

σ(x) = 1 / (1 + e⁻ˣ)
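A minimal sketch of the sigmoid in Python, just to show how it squashes any real number into the 0.0 - 1.0 range:

```python
import numpy as np

def sigmoid(x):
    # σ(x) = 1 / (1 + e^(-x)): maps any real number into the 0.0 - 1.0 range.
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(-4.0), sigmoid(0.0), sigmoid(4.0))  # ≈0.018, 0.5, ≈0.982
```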

Activation

σ( w₁a₁ + w₂a₂ + ... + wₙaₙ )

Activation

σ( w₁a₁ + w₂a₂ + ... + wₙaₙ + b )

Bias
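Putting the pieces together, a minimal sketch of a single neuron's activation: the weighted sum of the previous layer's activations plus a bias, squashed by the sigmoid. The weights, activations, and bias below are made-up example values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

prev_activations = np.array([0.0, 0.4, 0.0, 1.0])   # a₁ ... aₙ (made up)
weights          = np.array([7.0, -4.6, 5.8, 7.3])  # w₁ ... wₙ (made up)
bias             = -3.0                             # b (made up)

# σ( w₁a₁ + ... + wₙaₙ + b )
activation = sigmoid(weights @ prev_activations + bias)
print(activation)
```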

Some numbers

  • 4 layers
  • 784 + 16 + 16 + 10 = 826 neurons
  • (784 x 16) + (16 x 16) + (16 x 10) = 12,960 weights

Activation

a₀(1) = σ( w₀,₀a₀(0) + w₀,₁a₁(0) + ... + w₀,ₙaₙ(0) + b₀(1) )

Layer Activation

a(1) = σ ( [ w₀,₀  w₀,₁  ...  w₀,ₙ ]   [ a₀(0) ]   [ b₀(1) ]
           [ w₁,₀  w₁,₁  ...  w₁,ₙ ] · [ a₁(0) ] + [ b₁(1) ]
           [  ...   ...  ...   ... ]   [  ...  ]   [  ...  ]
           [ wₖ,₀  wₖ,₁  ...  wₖ,ₙ ]   [ aₙ(0) ]   [ bₖ(1) ] )

Activation

a(1) = σ( W(1,0)a(0) + b(1) )

a(2) = σ( W(2,1)a(1) + b(2) )

a(3) = σ( W(3,2)a(2) + b(3) )


a(3) = σ( W(3,2)σ( W(2,1)σ( W(1,0)a(0) + b(1) ) + b(2) ) + b(3) )
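A minimal sketch of the whole feedforward pass for the 784-16-16-10 network, using random stand-ins for the trained weights and biases:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 784)), rng.normal(size=16)  # W(1,0), b(1)
W2, b2 = rng.normal(size=(16, 16)),  rng.normal(size=16)  # W(2,1), b(2)
W3, b3 = rng.normal(size=(10, 16)),  rng.normal(size=10)  # W(3,2), b(3)

def feedforward(a0):
    a1 = sigmoid(W1 @ a0 + b1)  # a(1) = σ( W(1,0)a(0) + b(1) )
    a2 = sigmoid(W2 @ a1 + b2)  # a(2) = σ( W(2,1)a(1) + b(2) )
    a3 = sigmoid(W3 @ a2 + b3)  # a(3) = σ( W(3,2)a(2) + b(3) )
    return a3

a0 = rng.random(784)          # stand-in for the 784 pixel values
print(feedforward(a0).shape)  # (10,): one value per digit 0-9
```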

Just a function...

ƒ(a₀, ..., a₇₈₃) = [ y₀, ..., y₉ ]

...with many parameters


  • (784 x 16) + (16 x 16) + (16 x 10) = 12,960 weights
  • 16 + 16 + 10 = 42 biases

13,002 parameters in total
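The parameter count can be checked with a few lines of Python (a quick sketch, not from the slides):

```python
layers = [784, 16, 16, 10]
weights = sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))  # 12,960
biases  = sum(layers[1:])                                               # 42
print(weights + biases)                                                 # 13002
```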

Recap

  • Neurons are wired together by weights: each neuron is connected to all the neurons in the previous layer and to all the neurons in the next layer.
  • A neuron's activation value is determined by the previous layer's activations and the weights that connect them, plus a bias.
  • A neural network is just a function, an overly complex one with many parameters. Weights (and biases) are the parameters of the function, its dials and knobs.
  • The model implemented by a neural network consists of this huge set of parameters and the matrix operations performed to compute the function's output.

Doubts or thoughts?

Functions


MNIST dataset

Test and train


Cost

(0.22 - 0.00)² + (0.86 - 0.00)² + (0.38 - 0.00)² + (0.92 - 0.00)² + (0.75 - 0.00)² + (0.12 - 0.00)² + (0.66 - 1.00)² + (0.88 - 0.00)² + (0.43 - 0.00)² + (0.15 - 0.00)²

Single sample cost

Cost

(0.22 - 0.00)² = 0.0484
(0.86 - 0.00)² = 0.7396
(0.38 - 0.00)² = 0.1444
(0.92 - 0.00)² = 0.8464
(0.75 - 0.00)² = 0.5625
(0.12 - 0.00)² = 0.0144
(0.66 - 1.00)² = 0.1156
(0.88 - 0.00)² = 0.7744
(0.43 - 0.00)² = 0.1849
(0.15 - 0.00)² = 0.0225

0.0484 + 0.7396 + 0.1444 + 0.8464 + 0.5625 + 0.0144 + 0.1156 + 0.7744 + 0.1849 + 0.0225 = 3.4531

Single sample cost
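A minimal sketch of the same single-sample cost in Python: the squared differences between the network's 10 outputs and the desired outputs (1.0 for the correct digit, 0.0 for the others). The numbers are the ones from the example above.

```python
import numpy as np

output  = np.array([0.22, 0.86, 0.38, 0.92, 0.75, 0.12, 0.66, 0.88, 0.43, 0.15])
desired = np.array([0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00])

cost = np.sum((output - desired) ** 2)  # single sample cost
print(cost)                             # ≈3.4531
```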

Functions

Neural Network Function

Input: 784 numbers (pixel values)
Parameters: 13,002 numbers (weights and biases)
Output: 10 numbers (digits)

Cost Function

Input: 13,002 numbers (weights and biases)
Parameters: tens of thousands of training samples (pixel values)
Output: 1 number (cost)

Function minima

C(w)


Gradient descent

-∇ C(w)
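A minimal sketch of gradient descent, assuming some `gradient(w)` function that computes ∇C(w) (in the real network that job is done by backpropagation). The cost here is a toy stand-in, C(w) = Σ w², whose minimum is at w = 0:

```python
import numpy as np

def gradient(w):
    # Toy stand-in for ∇C(w), using C(w) = sum(w²) whose minimum is at w = 0.
    return 2 * w

w = np.random.default_rng(1).normal(size=13002)  # all weights and biases
learning_rate = 0.1

for _ in range(100):
    w = w - learning_rate * gradient(w)          # step along -∇C(w)

print(np.abs(w).max())  # close to 0: the toy cost has been minimized
```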

Recap

  • A neural network is a function whose parameters are the weights and biases, whereas in the cost function the weights and biases are the inputs and the training samples act as the parameters.
  • The cost function is a function that takes the network's parameters as inputs and returns a single number, the cost. The lower the cost, the better the network is performing.
  • The gradient descent algorithm is a way to find the parameters that minimize the cost function.
  • The algorithm that efficiently computes this gradient and forms the core of how a neural network learns is known as backpropagation.

Doubts or thoughts?

Backpropagation

Backpropagation

0.66 = σ( w₁a₁ + ... + wₙaₙ + b )
  1. Increase the bias: b
  2. Increase the weights: wᵢ
  3. Increase the activations of the previous layer: aᵢ

Keep propagating

a = σ( w₁a₁ + ... + wₙaₙ + b )
  1. Increase the bias: b
  2. Increase the weights: wᵢ
  3. Keep propagating backwards (see the sketch below)
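A minimal backpropagation sketch for a single training sample on the 784-16-16-10 network with sigmoid activations and the squared-error cost used earlier. The parameters and the input are random stand-ins; this is an illustration of the idea, not a full training loop.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 784)) * 0.1, np.zeros(16)
W2, b2 = rng.normal(size=(16, 16)) * 0.1,  np.zeros(16)
W3, b3 = rng.normal(size=(10, 16)) * 0.1,  np.zeros(10)

a0 = rng.random(784)                  # stand-in for one digit image
y = np.zeros(10); y[6] = 1.0          # desired output: the digit 6

# Forward pass, keeping every layer's activation.
a1 = sigmoid(W1 @ a0 + b1)
a2 = sigmoid(W2 @ a1 + b2)
a3 = sigmoid(W3 @ a2 + b3)

# Backward pass: how much each weight and bias should change (∂C/∂w, ∂C/∂b).
d3 = 2 * (a3 - y) * a3 * (1 - a3)     # error at the output layer
d2 = (W3.T @ d3) * a2 * (1 - a2)      # propagate the error backwards...
d1 = (W2.T @ d2) * a1 * (1 - a1)      # ...one layer at a time

grads = {"W3": np.outer(d3, a2), "b3": d3,
         "W2": np.outer(d2, a1), "b2": d2,
         "W1": np.outer(d1, a0), "b1": d1}

# One gradient-descent step on this single sample: tweak the knobs and dials.
lr = 0.5
W3 -= lr * grads["W3"]; b3 -= lr * grads["b3"]
W2 -= lr * grads["W2"]; b2 -= lr * grads["b2"]
W1 -= lr * grads["W1"]; b1 -= lr * grads["b1"]
```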

Tweak knobs and dials

Recap

  • Training a neural network is the process of adjusting its parameters (weights and biases) to minimize the cost function.
  • A properly labeled dataset is crucial for training a neural network, but the network will only know about what appears in the training samples.
  • Starting with random weights and biases, the network iteratively adjusts them based on the desired output of the training samples.

Doubts or thoughts?

A bit of history

  • 1943 - Warren McCulloch and Walter Pitts conceptualized the first neural network, comparing the human brain to an electrical circuit.
  • 1958 - Frank Rosenblatt invented the perceptron, a type of neural network with one layer that could learn to recognize patterns in data.
  • 1980 - The first neural networks with multiple layers were developed by David Rumelhart, Geoffrey Hinton, and Ronald Williams.
  • 1986 - Backpropagation, a technique for training neural networks with multiple layers, was developed by Rumelhart, Hinton, and Williams.
  • So-called AI winter (1974-1980) - funding cuts, lack of progress, and skepticism about the future of AI because of the computational complexity.
  • 1997 - The first Long Short-Term Memory (LSTM) neural network was developed by Sepp Hochreiter and Jürgen Schmidhuber. It is capable of learning long-term dependencies and well suited for sequence prediction tasks.
  • 1999 - GPUs become a perfect fit for executing the matrix operations required by neural networks.
  • 2012 - AlexNet, a convolutional neural network (CNN), wins the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), an image recognition challenge.
  • 2017 - "Attention Is All You Need" (Google paper) introduced the transformer architecture, a type of neural network that can process sequence data in parallel. This architecture is the foundation of modern LLMs.

It's been 82 years...

Thank you!

Thanks to 3Blue1Brown