FIT5167 – Natural Computation Week 11 + Review

Late post, as I forgot to publish it: the last week of new topics in Natural Computation covered recurrent networks for time series forecasting. The main point of difference between the methodologies is how previous values of the time series (or of the network itself) are structured and fed back into the network.

Elman Networks:

source: Week 11 lecture notes FIT5167

Jordan Networks:

source: Week 11 lecture notes FIT5167

Fully recurrent:

Fully recurrent time series forecasting network
source: Week 11 Lecture notes FIT5167
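The lecture diagrams aren't reproduced here, so here is a rough sketch instead. It assumes the usual textbook definitions: an Elman network feeds the previous hidden-layer activations back in through context units, and a Jordan network feeds back the previous network output. The function and weight names are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_recurrent(series, W_in, W_ctx, W_out, kind="elman"):
    """Return one output per time step for a tiny recurrent network.

    kind="elman":  context units hold the previous hidden activations.
    kind="jordan": context units hold the previous network output.
    """
    ctx_size = W_in.shape[0] if kind == "elman" else W_out.shape[0]
    context = np.zeros(ctx_size)                 # no history at the first step
    outputs = []
    for x_t in series:
        hidden = sigmoid(W_in @ np.atleast_1d(x_t) + W_ctx @ context)
        y = W_out @ hidden                       # linear output unit
        context = hidden if kind == "elman" else y
        outputs.append(y)
    return np.array(outputs)

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 6, 20))
W_in, W_out = rng.normal(size=(4, 1)), rng.normal(size=(1, 4))
elman_out = run_recurrent(series, W_in, rng.normal(size=(4, 4)), W_out, "elman")
jordan_out = run_recurrent(series, W_in, rng.normal(size=(4, 1)), W_out, "jordan")
```

At the first step both contexts are zero, so the two variants agree. They diverge once the different feedback kicks in. A fully recurrent network would additionally connect every unit to every other; that is left out to keep the sketch short.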

These networks operate very similarly to standard multilayer perceptrons. Self-organizing maps have been proposed as one possible method for selecting input variables, and genetic algorithms were also noted as an alternative input selector.

Review of this unit:

I found FIT5167 to be a very thought-provoking subject, with excellent resources provided by the subject lecturer, Grace Rumantir. The best part of the subject was the assignments, where we got some very useful practical experience constructing neural networks. With the statistical analysis that NNs allow, the skills learned in the subject can be applied to a very wide range of problems. I would recommend this subject to anyone studying MIT at Monash, even if their major is not Intelligent Systems.

FIT5167 – Natural Computation Week 10

Time series forecasting was the topic of week 10's lecture. Before forecasting a time series, we first need to remove the components that are easy to forecast from the data:

  • Trends
  • Cycles (Cyclical Components)
  • Seasonal variations
  • ‘Irregular’: the hardest component to predict, and the one our neural networks will attempt to forecast.
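Here is a minimal additive decomposition sketch. It assumes a known season length (e.g. 12 for monthly data) and a straight-line trend. Cycles are not separated out here, and classical decomposition usually estimates the trend with a moving average instead. The synthetic series is made up for illustration.

```python
import numpy as np

def decompose(series, period):
    """Additive split: series = trend + seasonal + irregular."""
    y = np.asarray(series, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)            # straight-line trend
    trend = slope * t + intercept
    detrended = y - trend
    # Average detrended value at each position within the season
    season_means = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = season_means[t % period]
    irregular = detrended - seasonal                   # the part left for the network
    return trend, seasonal, irregular

rng = np.random.default_rng(42)
months = np.arange(48)
series = 10 + 0.5 * months + 3 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.5, 48)
trend, seasonal, irregular = decompose(series, period=12)
```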

Autocorrelation is generally stronger for recent data items and weakens as we step back through the time series. Autocorrelation-based forecasting uses past data items in an attempt to predict n time steps into the future. Note that errors incurred in the earlier forecast steps are likely to grow as the prediction continues to step forward. Although this point seems obvious, when we view predictions that seem intuitively correct, our own confirmation bias often outweighs our awareness of a model's limitations.
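The sample autocorrelation at lag k is a quick way to check how strongly a value relates to the one k steps earlier. Here is a minimal NumPy sketch; the random-walk data is synthetic.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag."""
    y = np.asarray(series, dtype=float)
    d = y - y.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:len(y) - k]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=200))
r = autocorrelation(walk, max_lag=10)   # r[0] is always 1
```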

 

Autocorrelation using MLP

Spatio-temporal models incorporate a principal component: one or more variables whose influence on future time steps is significant. An example in our water use prediction would be the rainfall of previous months, since low rainfall would suggest higher water usage. There are many methods for identifying principal components; Karl Pearson pioneered this field with his product-moment correlation analysis.
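One simple way to screen a candidate input such as rainfall is to compute the Pearson correlation between its lagged values and the target. In the sketch below, the rainfall and water-use numbers are invented, and the one-month lag is an assumption.

```python
import numpy as np

def lagged_correlation(x, y, lag):
    """Pearson r between x[t - lag] and y[t] (e.g. rainfall `lag` months earlier vs water use)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    return np.corrcoef(x, y)[0, 1]

rng = np.random.default_rng(7)
rainfall = rng.uniform(0, 50, 60)                         # hypothetical monthly rainfall
water_use = np.empty(60)
water_use[0] = 100.0
water_use[1:] = 100 - 2 * rainfall[:-1] + rng.normal(0, 5, 59)
r_lag1 = lagged_correlation(rainfall, water_use, lag=1)   # strongly negative
```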

Linear time series can be forecast using a single-layer perceptron, although it is questionable how much better this tool would be than simpler modelling methods. Auto-regressive with external variables [ARX] models use both previous time series data and principal component states to generate forecasts.
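An ARX model is linear in its coefficients. Ordinary least squares therefore finds the same minimum-MSE weights that a linear single-layer perceptron trained on MSE would head towards. Here is a sketch with made-up coefficients; `p` and `q` (the number of past outputs and past external inputs) are illustrative choices.

```python
import numpy as np

def fit_arx(y, x, p, q):
    """Least-squares fit of y[t] = a_1..a_p * y[t-1..t-p] + b_1..b_q * x[t-1..t-q] + c."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    start = max(p, q)
    rows = [np.concatenate([y[t - p:t][::-1], x[t - q:t][::-1], [1.0]])
            for t in range(start, len(y))]
    coef, *_ = np.linalg.lstsq(np.array(rows), y[start:], rcond=None)
    return coef  # [a_1..a_p, b_1..b_q, c]

# Synthetic data from a known ARX(1, 1) process: y[t] = 0.6 y[t-1] + 0.3 x[t-1] + 1
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = np.zeros(100)
for t in range(1, 100):
    y[t] = 0.6 * y[t - 1] + 0.3 * x[t - 1] + 1.0
coef = fit_arx(y, x, p=1, q=1)   # recovers roughly [0.6, 0.3, 1.0]
```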

Model accuracy can be evaluated in a rudimentary fashion using root mean square error [RMSE].
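RMSE is the square root of the average squared difference between actual and forecast values, so it is in the same units as the series. A minimal sketch:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

example = rmse([0.0, 0.0], [3.0, 4.0])   # sqrt((9 + 16) / 2)
```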

Moving past simplistic single-layer networks, we reviewed time-lagged feedforward networks:

 

Time Lagged feed forward networks (Non-Linear)
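A time-lagged feedforward network is an ordinary MLP whose inputs are a sliding window of past values. The sketch below builds that window and runs one forward pass. The layer sizes, the tanh hidden units and the random (untrained) weights are assumptions for illustration.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Turn a series into (X, y) pairs: X[i] holds the n_lags values before y[i]."""
    s = np.asarray(series, dtype=float)
    X = np.array([s[i:i + n_lags] for i in range(len(s) - n_lags)])
    y = s[n_lags:]
    return X, y

def tlfn_predict(X, W_hidden, b_hidden, w_out, b_out):
    """Forward pass: non-linear (tanh) hidden layer, linear output neuron."""
    hidden = np.tanh(X @ W_hidden + b_hidden)
    return hidden @ w_out + b_out

series = np.arange(10.0)
X, y = make_lagged(series, n_lags=3)      # X[0] = [0, 1, 2], y[0] = 3
rng = np.random.default_rng(0)
preds = tlfn_predict(X, rng.normal(size=(3, 5)), np.zeros(5), rng.normal(size=5), 0.0)
```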

We then moved on to non-linear auto-regressive with external variables [NARX] networks:

 

A recurrent NARX network
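What makes a NARX network recurrent is that, when forecasting several steps ahead, its own predictions are fed back in as past outputs. Below is a sketch of that closed loop. `model` stands in for any trained one-step network, and the tanh function used here is a hypothetical placeholder.

```python
import numpy as np

def narx_forecast(model, y_hist, x_hist, x_future, n_y, n_x):
    """Multi-step forecast where each prediction is fed back as a past output.

    model(y_lags, x_lags) is any one-step predictor; lags are most recent first.
    x_future holds the (known) external input at each forecast time.
    """
    y_buf, x_buf = list(y_hist), list(x_hist)
    preds = []
    for x_next in x_future:
        y_lags = np.array(y_buf[-n_y:][::-1])
        x_lags = np.array(x_buf[-n_x:][::-1])
        y_hat = float(model(y_lags, x_lags))
        preds.append(y_hat)
        y_buf.append(y_hat)   # feedback: the prediction becomes a past output
        x_buf.append(x_next)  # this step's external input becomes a past input
    return np.array(preds)

# Hypothetical stand-in for a trained network
model = lambda y_lags, x_lags: np.tanh(0.8 * y_lags[0] + 0.5 * x_lags[0])
preds = narx_forecast(model, y_hist=[0.1, 0.2], x_hist=[1.0, 0.5],
                      x_future=[0.3, 0.2, 0.1], n_y=2, n_x=1)
```

Each prediction becomes an input to the next one, so any early error is carried forward. This is the error growth in multi-step forecasts noted in the autocorrelation discussion above.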

The same training principles as for standard NNs apply to time series forecasting. Importantly, and in contrast to classification, the training data must be kept in chronological order, as forecasting implies.

 

Minimize RMSE on validation, not training data!

Again, we must be aware of over- and under-fitting: minimizing RMSE on the training data does not imply an accurate model for all/future data.
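The sketch below illustrates both points. The data is split chronologically (no shuffling), and the model is chosen by validation RMSE rather than training RMSE. To keep it short, a linear autoregressive model stands in for a neural network. The noisy sine series and the candidate lag counts are made up.

```python
import numpy as np

def chronological_split(X, y, train_frac=0.7):
    """No shuffling: every validation sample comes after every training sample."""
    cut = int(len(y) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

def select_by_validation(series, candidate_lags, train_frac=0.7):
    """Fit a linear AR model per lag count; pick the lowest *validation* RMSE."""
    s = np.asarray(series, dtype=float)
    results = {}
    for p in candidate_lags:
        X = np.array([s[i:i + p] for i in range(len(s) - p)])   # p past values per row
        X = np.column_stack([X, np.ones(len(X))])                # bias column
        y = s[p:]
        X_tr, y_tr, X_va, y_va = chronological_split(X, y, train_frac)
        w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
        results[p] = (rmse(y_tr, X_tr @ w), rmse(y_va, X_va @ w))  # (train, validation)
    best = min(results, key=lambda p: results[p][1])
    return best, results

rng = np.random.default_rng(3)
steps = np.arange(200)
series = np.sin(2 * np.pi * steps / 25) + rng.normal(0, 0.3, 200)
best_p, results = select_by_validation(series, [1, 2, 4, 8, 16])
```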

 

 

 

FIT5167 – Natural Computation Week 9

Natural computation’s 9th week saw an introduction to associative memory networks.

 

Two major types of associative networks

Initialization of a bidirectional associative memory [BAM] network involves establishing a weight matrix from input and output pairs:

BAM initialization

It seems much easier to demonstrate understanding of the weight initialization and memory recall algorithms by writing a simple script; hopefully I can do that this week. The major question that came to mind after this lecture was why these networks would be used instead of a SOM.
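Taking up that idea, here is a minimal BAM sketch. It assumes bipolar (+1/-1) patterns and the common Hebbian-style rule W = sum of x_k^T y_k. The stored pairs are made up, and mapping a net input of exactly 0 to +1 is an arbitrary convention chosen here.

```python
import numpy as np

def bam_weights(X, Y):
    """W = sum_k outer(x_k, y_k) for bipolar pattern pairs stored row-wise."""
    return np.asarray(X).T @ np.asarray(Y)

def bipolar(v):
    return np.where(v >= 0, 1, -1)

def bam_recall(W, x, max_iter=10):
    """Bounce between the layers (x -> y -> x ...) until the pair stops changing."""
    x = np.asarray(x)
    y = bipolar(x @ W)
    for _ in range(max_iter):
        x_new = bipolar(y @ W.T)
        y_new = bipolar(x_new @ W)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break
        x, y = x_new, y_new
    return x, y

X = np.array([[1, -1, 1, -1], [1, 1, -1, -1]])
Y = np.array([[1, 1, -1], [-1, 1, 1]])
W = bam_weights(X, Y)
x_rec, y_rec = bam_recall(W, X[0])
```

Recall works cleanly here because the two stored inputs are orthogonal. With more pairs, or correlated ones, crosstalk between stored patterns can produce wrong or spurious recalls.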

FIT5167 – Natural Computation Week 8

With a unit test and an assignment this week, no new topics were introduced. The research I had to do on SOMs for the assignment did yield a lot of new information.

Implementing a SOM network involves a number of variable components:

  • Neighbour updating, neighbour radius
  • Weight decay
  • Random weight initialization (the random weights at initialization will affect the clusters)
  • Adjusting learning rate and learning rate decay
  • Adjusting training/test data split
  • Adjusting number of neurons in 2D lattice

Evaluating the quality of the clusters created by a SOM is quite difficult. Weight distances are the best method for checking whether similar clusters have formed in different sections of the map. Running some clustering in MATLAB yielded basic results, but I am not familiar enough with the clustering tool to extract all of the information required to make inferences about the results.
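One common way to look at weight distances is a U-matrix (unified distance matrix). For each neuron, it records the average distance between that neuron's weight vector and those of its lattice neighbours. High values suggest cluster borders; low values suggest neurons inside the same cluster. The sketch assumes a rectangular lattice (at least 2 neurons) with 4-connected neighbours.

```python
import numpy as np

def u_matrix(weights):
    """Mean Euclidean distance from each neuron's weight vector to its lattice neighbours.

    weights has shape (rows, cols, n_features); high values suggest cluster borders.
    """
    rows, cols, _ = weights.shape
    U = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = [np.linalg.norm(weights[r, c] - weights[r + dr, c + dc])
                     for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            U[r, c] = np.mean(dists)
    return U

# Toy 2 x 4 map: left half learned one cluster (0.0), right half another (5.0)
w = np.zeros((2, 4, 1))
w[:, 2:, 0] = 5.0
U = u_matrix(w)
```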

Initial results from clustering tool in Matlab on Banding data

The information gained from self-organizing maps may be useful when constructing supervised learning networks.

FIT5167 – Natural Computation Week 7

Week 7 introduced genetic algorithms, whose effectiveness is somewhat disputed. In any case, these algorithms are quite interesting in their balance between a kind of hill climbing (fitness function) and stochastic methods (crossover, mutation).

The lecture gave the natural basis for these algorithms and defined the key components:

  • Chromosome (e.g. 101101101)
  • Reproduction (e.g. roulette-wheel or tournament selection, crossover)
  • Mutation
  • Fitness functions
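Here is a small GA sketch on bit-string chromosomes. It uses tournament selection (roulette-wheel selection is the other option listed), single-point crossover, bit-flip mutation and elitism. The "count the 1s" fitness function and all parameter values are hypothetical, chosen only to make the loop easy to follow.

```python
import random

def fitness(chrom):
    """Hypothetical fitness: number of 1 bits (the 'OneMax' toy problem)."""
    return sum(chrom)

def tournament(pop, k, rng):
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    point = rng.randint(1, len(a) - 1)            # single crossover point
    return a[:point] + b[point:]

def mutate(chrom, rate, rng):
    return [1 - g if rng.random() < rate else g for g in chrom]

def run_ga(n_bits=20, pop_size=30, generations=40, mut_rate=0.02, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [fitness(max(pop, key=fitness))]
    for _ in range(generations):
        elite = max(pop, key=fitness)             # elitism: best survives unchanged
        children = [elite]
        while len(children) < pop_size:
            p1, p2 = tournament(pop, 3, rng), tournament(pop, 3, rng)
            children.append(mutate(crossover(p1, p2, rng), mut_rate, rng))
        pop = children
        history.append(fitness(max(pop, key=fitness)))
    return max(pop, key=fitness), history

best, history = run_ga()
```

Elitism makes the best fitness non-decreasing from one generation to the next. Without it, crossover and mutation could lose the best chromosome found so far.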

Genetic Algorithms can find good solutions in large search spaces quickly

The second half of the lecture was dedicated to assignment and unit test revision.

FIT5167 – Natural Computation Week 6

Natural Computation entered week 6 with an introduction to unsupervised learning, that is, learning in a neural network without a target output. This is generally achieved through classification/clustering, for example with self-organising maps [SOM].

self organising map

The networks for SOMs are actually a little simpler than an MLP, and the process for creating clusters is quite intuitive. Each neuron in the feature map layer has a unique weight vector. If an input results in a neuron being the most activated (i.e. the neuron whose weight vector has the lowest Euclidean distance from the input vector), then that neuron's weights move closer to the input, reducing their Euclidean distance to it:

 

SOM weight update (source: week 6 notes)
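The winner-take-all step described above can be sketched as follows. It assumes the common form w <- w + lr * (x - w), applied to the winning neuron only. The weights and input are made up.

```python
import numpy as np

def som_step(weights, x, lr):
    """Winner-take-all update: only the closest neuron moves toward x (in place).

    weights: float array of shape (n_neurons, n_features).
    """
    dists = np.linalg.norm(weights - x, axis=1)   # Euclidean distance to each neuron
    winner = int(np.argmin(dists))                # most activated = closest weights
    weights[winner] += lr * (x - weights[winner])
    return winner

weights = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
x = np.array([0.9, 1.2])
winner = som_step(weights, x, lr=0.5)   # neuron 1 wins and moves to [0.95, 1.1]
```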

The concept of decaying the learning rate was introduced during the lecture, but this must be done carefully. A decaying learning rate shrinks every weight adjustment. So if one were to train a network until the weight adjustments stabilized, training would end after a certain number of epochs regardless of how well the network has clustered the data.
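A quick way to see the problem is an exponentially decaying learning rate, eta(t) = eta0 * exp(-t / tau). With this schedule, the epoch at which weight adjustments become negligible depends only on eta0, tau and the tolerance, not on the data. The schedule form and the numbers are assumptions for illustration.

```python
import math

def learning_rate(eta0, tau, epoch):
    """Exponential decay: eta(t) = eta0 * exp(-t / tau)."""
    return eta0 * math.exp(-epoch / tau)

def epochs_until_tiny(eta0, tau, tol):
    """First epoch at which the learning-rate factor in every weight step drops below tol."""
    epoch = 0
    while learning_rate(eta0, tau, epoch) >= tol:
        epoch += 1
    return epoch

stop_epoch = epochs_until_tiny(0.5, 100, 1e-3)   # → 622, whatever the data looks like
```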

Finally, the concept of a ‘topological neighborhood’ was introduced. As in actual brains, the weights of neighboring neurons are also updated when a neuron wins the competitive activation. Logically, this results in similar classifications being held by neighboring neurons. The update of the neighboring weights can be scaled using Gaussian or exponential decay functions:

 

Update neighboring neurons too! (source week 6 notes)
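Here is a sketch of the neighbourhood update using a Gaussian, h = exp(-d^2 / (2 sigma^2)), where d is the lattice distance from the winner. The 3-neuron line lattice, lr and sigma are made up; in practice sigma would also shrink over training.

```python
import numpy as np

def som_neighbourhood_step(weights, grid, x, lr, sigma):
    """SOM update with a Gaussian topological neighbourhood (in place).

    weights: (n_neurons, n_features); grid: (n_neurons, 2) lattice coordinates.
    """
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    d2 = np.sum((grid - grid[winner]) ** 2, axis=1)   # squared lattice distance to winner
    h = np.exp(-d2 / (2 * sigma ** 2))                # 1 at the winner, smaller further away
    weights += lr * h[:, None] * (x - weights)
    return winner, h

grid = np.array([[0, 0], [0, 1], [0, 2]], dtype=float)
weights = np.zeros((3, 1))
winner, h = som_neighbourhood_step(weights, grid, np.array([1.0]), lr=0.5, sigma=1.0)
```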

FIT5167 – Natural Computation Week 5

Part 2 of the MLP lectures was completed in week 5. We ran through some extended examples, including batch and online learning methods. The issues of local minima and overfitting were also introduced, along with some ways of overcoming the limitations they impose.

It turns out that batch learning is the most common method of learning. We ran through an example of proportionality using Mean Square Error [MSE] then a further example applying momentum.

The crux of batch learning
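Here is a sketch of batch learning with momentum for a single linear neuron. The whole training set produces one averaged MSE gradient per epoch, and each weight change adds a fraction (alpha) of the previous change. The neuron, data and parameter values are assumptions, not the lecture's worked example.

```python
import numpy as np

def train_batch_momentum(X, t, lr=0.1, alpha=0.9, epochs=500):
    """Batch gradient descent on MSE with momentum.

    delta_w(n) = -lr * grad(n) + alpha * delta_w(n - 1), one update per epoch.
    """
    X = np.column_stack([X, np.ones(len(X))])    # append a bias input
    w = np.zeros(X.shape[1])
    delta_w = np.zeros_like(w)
    for _ in range(epochs):
        error = X @ w - t
        grad = 2 * X.T @ error / len(t)          # gradient of the mean squared error
        delta_w = -lr * grad + alpha * delta_w
        w += delta_w
    return w

x = np.linspace(0, 1, 20).reshape(-1, 1)
targets = 2 * x[:, 0] + 1
w = train_batch_momentum(x, targets)   # approaches [2, 1] (slope, bias)
```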

The concept and reasoning behind each operation in back-propagation and batch learning are quite clear; however, I definitely need to do some repetition to memorize the process for exam conditions.

The next topic was network generalization, whereby the fitting of the model is relaxed. This ensures that noise and patterns specific to the sample data do not harm the ability of an NN-generated model to predict further values.

Generalization is required for effective modelling

Other methods for preventing overfitting, thrashing and intractable learning were:

  • Early stopping [set number of epochs]
  • Regularization/weight decay
  • Data normalization
  • More that will be covered in next week's lecture
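Two of the listed techniques are easy to sketch. The first is z-score normalisation of the inputs. The second is L2 weight decay, which adds a term proportional to each weight to its gradient so that weights are pulled towards zero. The decay coefficient and learning rate are illustrative assumptions.

```python
import numpy as np

def zscore(X):
    """Normalise each input column to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def sgd_step_with_decay(w, grad, lr=0.1, decay=0.01):
    """Gradient step with L2 weight decay: the decay * w term shrinks weights toward 0."""
    return w - lr * (grad + decay * w)

Z = zscore([[1, 10], [2, 20], [3, 30]])
w_next = sgd_step_with_decay(np.array([1.0, -2.0]), np.zeros(2))   # decay alone shrinks w
```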

The tutorial got us started with MATLAB. The nprtool and nntool tools were used to create neural networks, which could then be exported and manually modified to specific requirements. I found MATLAB fairly easy to use, with the exception of the plotting tools, with which I was unable to make what I wanted.

 

FIT5167 – Natural Computation Week 4

Natural computation, week number 4 -> Multilayer perceptron [MLP] for non-linear data analysis.

MLP is one of several neural networks for non-linear modelling but is the most popular, hence our focus on it.

MLPs are very popular because, given enough hidden neurons, they can approximate practically any non-linear pattern. Everything sounds great if we compare an MLP network to the single-layer networks we discussed earlier. However, learning in an MLP network is quite a bit more complex due to the indirect relationship between the hidden layer weights and the output error. As discussed previously, neural networks learn [supervised learning] by adjusting the weights applied to the inputs of neurons. Each weight adjustment must be connected to the output error, and back-propagation makes that connection through differentiation.

 

We need to relate w1...wn to the output error

At this point it is worth noting that the activation function for MLP neurons must be continuous and differentiable so as to enable backward chaining differentiation (the chain rule).

 

See the notation for this backward chaining example

Now we need to find the error gradient with respect to b (output neuron weights) and a (hidden neuron weights). After conducting the first round of differentiation:

first round

Now for the hidden layer:

completion of backward chaining to hidden layer
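The derivation images aren't reproduced here, so here is a sketch of the same chain-rule steps for one hidden layer, with a as the hidden weights and b as the output weights to match the text. It assumes sigmoid activations, a bias input on each layer and E = 0.5 * (t - y)^2, which may differ from the lecture's exact notation. The finite-difference check at the end confirms the analytic gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(a, b, x):
    """a: hidden weights (n_hidden, n_in + 1); b: output weights (n_hidden + 1,)."""
    x_aug = np.append(x, 1.0)                 # bias input for the hidden layer
    h = sigmoid(a @ x_aug)
    h_aug = np.append(h, 1.0)                 # bias input for the output neuron
    return x_aug, h, h_aug, sigmoid(b @ h_aug)

def error(a, b, x, t):
    return 0.5 * (t - forward(a, b, x)[3]) ** 2

def gradients(a, b, x, t):
    x_aug, h, h_aug, y = forward(a, b, x)
    delta_out = (y - t) * y * (1 - y)                # dE/dnet_out, using sigmoid' = y(1 - y)
    grad_b = delta_out * h_aug                       # first round: dE/db
    delta_hidden = delta_out * b[:-1] * h * (1 - h)  # error chained back through b
    grad_a = np.outer(delta_hidden, x_aug)           # second round: dE/da
    return grad_a, grad_b

def numeric_grad(f, w, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    g = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        w_p, w_m = w.copy(), w.copy()
        w_p[idx] += eps
        w_m[idx] -= eps
        g[idx] = (f(w_p) - f(w_m)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
a, b = rng.normal(size=(3, 3)), rng.normal(size=4)   # 2 inputs + bias, 3 hidden + bias
x, t = np.array([0.5, -1.0]), 1.0
grad_a, grad_b = gradients(a, b, x, t)
```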

In the following tutorial we completed this process in Excel to see the learning process in action.

I will be uploading a copy of this once I confirm it is correct.

 

FIT5167 – Natural Computation Week 3

Week 3 of Natural Computation continued our step-by-step unraveling of the perceptron. We dealt with the case of classification and supervised training of a single perceptron. Although the concepts and logic are quite straightforward, there were some odd math operations that we spent a lot of our time on. I am not sure why we spent so much time drawing the discriminant of a perceptron. I can't see how it could be a useful skill, except when we need to hand-draw boundaries in the exam (and learning something just to pass an exam seems nonsensical). Anyway, the tutorial was particularly good in that we did some practical calculations in Excel that closely correlated with what we learnt in the lecture.

Specifically, the process of training a perceptron was emulated. Seeing exactly how altering the Beta value changed the learning process for a perceptron was valuable, as was understanding some of the possible inefficiencies/intractabilities associated with the simple, single-perceptron network.
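Here is a sketch of that training process. It assumes the Beta value is the learning rate in the standard perceptron rule w <- w + beta * (target - output) * x, with a threshold output. The AND data and beta = 0.1 are illustrative. The `discriminant_x2` helper gives the boundary we were drawing by hand, i.e. the line where w1*x1 + w2*x2 + bias = 0.

```python
import numpy as np

def train_perceptron(X, targets, beta=0.1, max_epochs=100):
    """Perceptron rule with a threshold output: w <- w + beta * (target - output) * x."""
    X = np.asarray(X, dtype=float)
    w, bias = np.zeros(X.shape[1]), 0.0
    for epoch in range(max_epochs):
        errors = 0
        for x, target in zip(X, targets):
            output = 1 if x @ w + bias > 0 else 0
            if output != target:
                w += beta * (target - output) * x
                bias += beta * (target - output)
                errors += 1
        if errors == 0:          # a full pass with no mistakes: training has converged
            return w, bias, epoch
    return w, bias, max_epochs

def predict(X, w, bias):
    return [1 if x @ w + bias > 0 else 0 for x in np.asarray(X, dtype=float)]

def discriminant_x2(x1, w, bias):
    """x2 on the decision boundary w1*x1 + w2*x2 + bias = 0 (needs w2 != 0)."""
    return -(w[0] * x1 + bias) / w[1]

X_and = [[0, 0], [0, 1], [1, 0], [1, 1]]
t_and = [0, 0, 0, 1]
w, bias, epochs = train_perceptron(X_and, t_and)
x2_at_half = discriminant_x2(0.5, w, bias)
```

Changing `beta` alters the step size of each correction. For linearly separable data like AND the rule still converges; on non-separable data (e.g. XOR) it never reaches an error-free pass and stops at `max_epochs`.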

I am really looking forward to seeing how an MLP handles a large dataset; sometimes having a vision of the goal makes these simple steps much more understandable.

source week 3 lecture notes

FIT5167 – Natural Computation Week 2

Natural Computation's second week made me feel quite a bit more comfortable with the subject than the first week did. We discussed artificial neural networks in more specific and relevant detail, starting with the main components:

  • Inputs (X1, X2, X3…Xn)
  • Weights (Wj1, Wj2, Wj3…Wjn)
  • Bias (Bj)
  • Neuron (summation function)
  • Transfer/Transformation/Activation function
    • Threshold/McCulloch-Pitts function [0,1]
    • Sigmoid function {Yj = 1/(1 + e^(-AUj))}
  • Output
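Here is a sketch of a single neuron j with the components listed above. It assumes A in the sigmoid formula is a slope parameter and U_j is the weighted sum plus bias. It also treats a net input of exactly 0 as firing (1) for the threshold function. Both are assumptions about the lecture's notation.

```python
import numpy as np

def neuron_output(x, w, b, activation="sigmoid", slope=1.0):
    """Single neuron j: U_j = sum_i W_ji * X_i + B_j, then an activation function."""
    u = np.dot(w, x) + b                       # summation function plus bias
    if activation == "threshold":              # McCulloch-Pitts: output in {0, 1}
        return 1.0 if u >= 0 else 0.0
    return 1.0 / (1.0 + np.exp(-slope * u))    # sigmoid, slope playing the role of A

y_sig = neuron_output([1, 2], [0.5, -0.25], 0.0)                 # U_j = 0, so 0.5
y_thr = neuron_output([1, 1], [1, 1], -3.0, activation="threshold")
```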

Next came an explanation of the 3 fundamental neural network architectures:

  • Single Layer Feedforward – see image below

 

simple feed forward neural network
