
FIT5047 – Intelligent Systems Week 10

Week 10 moved on from classification to clustering. Although, conceptually, the material was closely related to topics covered in Natural Computation, the methods discussed were new. Again, Euclidean distance is a fundamental measure of similarity/uniqueness.

The first method introduced was Hierarchical Clustering. This introduction was very brief and reference to the text would need to be made for issues such as linkages.
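For my own reference, a minimal sketch of agglomerative (hierarchical) clustering using SciPy; the toy data, the 'average' linkage and the cut into three clusters are my own illustrative choices, not the lecture's example.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.8], [9.0, 0.5]])

Z = linkage(X, method='average')                  # build the merge tree bottom-up from Euclidean distances
labels = fcluster(Z, t=3, criterion='maxclust')   # cut the tree into (at most) 3 flat clusters
print(labels)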

The next method was K-Means clustering.

 

kmeans
As can be seen with K = 3 we can move the centres but the number of clusters is static
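To make the assignment and update steps concrete, here is a minimal hand-rolled sketch of K-means (Lloyd's algorithm) with K fixed in advance; the toy data and K = 3 are illustrative only.

import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]       # random initial centres
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)  # Euclidean distance to each centre
        labels = dists.argmin(axis=1)                             # assignment step
        for j in range(k):                                        # update step: centre = mean of its points
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in ([0, 0], [5, 5], [0, 5])])
centres, labels = kmeans(X, k=3)
print(centres)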

 

 

I find the limitation of assuming the number of clusters [K] comes close to invalidating this methodology in its basic form. Of course, the algorithm can be extended to an exhaustive or stochastic search where multiple K values are compared and contrasted. The idea of clustering is to simplify data sets, in essence reducing dimensionality. With this in mind, extended K-means algorithms must include a penalty on the number of clusters; otherwise the best clustering would always result in K = number of unique instances. MML, MDL and BIC are examples of criteria that incorporate these penalties. Interestingly, I came across MDL when looking for an effective method for discretizing continuous variables. It now seems obvious that discretization is a form of clustering where there needs to be a penalty for an increasing number of clusters. For more info on using MDL to discretize continuous variables see:

Fayyad, U. & Irani, K. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1022–1027.
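As a rough illustration of the penalty idea (not the exact MML/MDL/BIC formulations from the lecture), one can score each K by a fit term plus a complexity term and keep the K with the lowest score:

import numpy as np
from sklearn.cluster import KMeans

def penalised_score(X, k):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    n, d = X.shape
    sse = km.inertia_                                  # within-cluster sum of squared distances (fit term)
    return n * np.log(sse / n) + k * d * np.log(n)     # fit term plus a complexity penalty that grows with K

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.7, size=(50, 2)) for c in ([0, 0], [6, 6], [0, 6])])
scores = {k: penalised_score(X, k) for k in range(1, 8)}
print(min(scores, key=scores.get))                     # the K with the lowest penalised score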

Interestingly, Usama Fayyad is now Chief Data Officer and Executive Vice President at Yahoo! Inc… worth remembering next time anyone says research in this field is pointless for a career.

The lecture went on to introduce issues and algorithms which require a great deal of reading and writing to do justice to (which I am yet to complete).


FIT5167 – Natural Computation Week 10

Time series forecasting was the topic of week 10’s lecture. Before forecasting a time series we first need to remove anything that is easy to forecast from the data (a small decomposition sketch follows the list):

  • Trends
  • Cycles (Cyclical Components)
  • Seasonal variations
  • The ‘irregular’ component, which is the hardest to predict and the one our neural networks will be attempting to forecast.
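A small sketch of stripping out those easier components, assuming a simple additive decomposition; the synthetic monthly series and the period of 12 are illustrative only.

import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(0)
t = np.arange(120)
series = pd.Series(0.05 * t                               # trend
                   + 2.0 * np.sin(2 * np.pi * t / 12)     # seasonal / cyclical component
                   + rng.normal(scale=0.5, size=120),     # 'irregular' component
                   index=pd.date_range('2000-01', periods=120, freq='MS'))

parts = seasonal_decompose(series, model='additive', period=12)
irregular = parts.resid.dropna()            # what is left over for the network to forecast
print(irregular.head())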

Autocorrelation is generally stronger for recent data items and degrades in quality as we step back through the time series. Autocorrelation uses past data items in an attempt to predict n time steps into the future. It must be noted that errors incurred in the earlier forecast steps are likely to grow as the prediction continues to step forward. Although this point seems obvious, when viewing predictions that seem intuitively correct our own confirmation bias often outweighs our awareness of a model's limitations.
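A quick sketch of that decay, using a toy AR(1)-style series where each value depends mostly on the previous one:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal()        # each value depends mostly on the previous one
s = pd.Series(x)

for lag in (1, 2, 5, 10, 20):
    print(lag, round(s.autocorr(lag=lag), 3))   # the correlation weakens as the lag grows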

 

autocorrellationUsingMLP
Autocorrelation using MLP

Spatio-temporal models incorporate a principal component: one or more variables whose influence on future timesteps is significant. An example in our water use prediction would be the rainfall of previous months; low rainfall would suggest higher water usage. There are many methods for identifying principal components; Karl Pearson pioneered this field with the introduction of his product-moment correlation analysis.
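A small sketch of screening a candidate external variable with the Pearson product-moment correlation; the rainfall and water-use series here are synthetic and only illustrative.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
rainfall = rng.gamma(shape=2.0, scale=30.0, size=60)                   # monthly rainfall (synthetic, mm)
water_use = 400.0 - 0.8 * rainfall + rng.normal(scale=10.0, size=60)   # usage rises when rainfall is low

r, p = pearsonr(rainfall, water_use)
print(round(r, 2), round(p, 4))    # a strong negative r suggests rainfall is worth including as an input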

Forecasting linear time series can be conducted using a single-layer perceptron. It is questionable, however, how much this tool would improve on simpler modelling methods. Auto-regressive with external variables [Arx] models utilize both previous time series data and principal component states for generating forecasts.
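A minimal sketch of the Arx idea as a plain least-squares fit over lagged values of the series and of an external variable; the lag counts p = 3 and q = 2 are arbitrary illustrative choices.

import numpy as np

def build_arx_matrix(y, x, p=3, q=2):
    # each row is [y(t-1)..y(t-p), x(t-1)..x(t-q)]; the target is y(t)
    start = max(p, q)
    rows, targets = [], []
    for t in range(start, len(y)):
        rows.append(np.concatenate([y[t - p:t][::-1], x[t - q:t][::-1]]))
        targets.append(y[t])
    return np.array(rows), np.array(targets)

rng = np.random.default_rng(0)
x = rng.normal(size=200)                                  # the external ('principal component') variable
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + 0.5 * x[t - 1] + rng.normal(scale=0.1)

A, b = build_arx_matrix(y, x)
coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)            # fit the linear Arx weights by least squares
print(coeffs)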

Evaluating model accuracy can be done in rudimentary fashion using the root mean square error [RMSE].
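For reference, RMSE as a one-line helper:

import numpy as np

def rmse(actual, predicted):
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

print(rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.3]))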

Moving past the simplistic single-layer networks, we review time-lagged feed-forward networks:

 

timeLaggedFeedFor
Time Lagged feed forward networks (Non-Linear)
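A rough sketch of the time-lagged idea, feeding a sliding window of past values into an ordinary MLP; the window size and network shape are my own illustrative choices, not the lecture's settings.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
series = np.sin(np.arange(400) * 0.1) + rng.normal(scale=0.1, size=400)

window = 5                                                 # how many past values the network sees
X = np.array([series[t - window:t] for t in range(window, len(series))])
y = series[window:]

net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
net.fit(X[:300], y[:300])                                  # train on the earlier part of the series
print(net.predict(X[300:305]))                             # one-step-ahead forecasts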

We then moved to Non-Linear Auto-regressive with external variables [NArx] networks:

 

Narx1
A recurrent NArx network

The same training principles as with standard NNs apply to time series forecasting. Importantly, the training data must be presented in chronological order, as forecasting would suggest, in contrast to classification.

 

nArxtraining
Minimize RMSE on validation, not training data!

Again, awareness must be given to over/under-fitting. Minimizing RMSE on training data does not imply an accurate model for all/future data.
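To tie the last two points together, a sketch of a chronological train/validation split where the validation RMSE, not the training RMSE, is the figure of merit; the model and split point are illustrative.

import numpy as np
from sklearn.neural_network import MLPRegressor

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
series = np.sin(np.arange(500) * 0.1) + rng.normal(scale=0.2, size=500)
window = 5
X = np.array([series[t - window:t] for t in range(window, len(series))])
y = series[window:]

split = 350                                                # chronological split: no shuffling of timesteps
X_tr, y_tr, X_va, y_va = X[:split], y[:split], X[split:], y[split:]

net = MLPRegressor(hidden_layer_sizes=(50,), max_iter=3000, random_state=0).fit(X_tr, y_tr)
print('train RMSE:     ', rmse(y_tr, net.predict(X_tr)))
print('validation RMSE:', rmse(y_va, net.predict(X_va)))   # this is the number to minimise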