Why use R?

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. – http://cran.csiro.au/doc/manuals/r-release/R-intro.html

This is a fair question, considering that most languages and frameworks, including CUDA, have statistical analysis libraries available. Hopefully, working through some introductory exercises will reveal the benefits.

Associated GUIs and extensions:

  • Weka – specific to machine learning algorithms
  • R Commander – Data analysis GUI

Install on Ubuntu 12.04:

  1. Install R from the Ubuntu repositories: $ sudo apt-get install r-base
  2. Then enter the R command line interface: $ R

    For starters, we will run through an introduction from UCLA: http://www.ats.ucla.edu/stat/r/seminars/intro.htm

    Within the R command line interface, a package must be installed before it can be used. The packages used in this walkthrough are:

    • foreign – package to read data files from other stats packages
    • xlsx – package to read and write Excel files (requires Java to be installed in the same architecture as your R version, plus the rJava and xlsxjars packages)
    • reshape2 – package to easily melt data to long form
    • ggplot2 – package for elegant data visualization using the Grammar of Graphics
    • GGally – package for scatter plot matrices
    • vcd – package for visualizing and analyzing categorical data
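Installation is done with install.packages(). The sketch below installs only those packages from the list above that are not already present; the CRAN mirror passed to repos is an assumption (any mirror works, and if repos is omitted R may prompt you to pick one).

```r
# Packages used in this walkthrough
pkgs <- c("foreign", "xlsx", "reshape2", "ggplot2", "GGally", "vcd")

# Work out which of them are not installed yet
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only the missing ones (mirror URL is an assumption; any CRAN mirror will do)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

foreign is a recommended package that ships with most R builds, so it will usually be skipped.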


    Preparing the Session:

    After installing R and the packages needed for a task, each package must also be loaded (attached) in every session that uses it:
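The conventional way to attach a package is one library() call per package, e.g. library(ggplot2). A compact sketch that attaches all six packages at once and reports which ones succeeded, using require(), which returns TRUE or FALSE instead of stopping with an error:

```r
# The usual form is one call per package, for example:
# library(ggplot2)

pkgs <- c("foreign", "xlsx", "reshape2", "ggplot2", "GGally", "vcd")

# require() attaches each package and returns TRUE on success, FALSE otherwise
# (with a warning), so a missing package does not abort the rest of the loop
loaded <- vapply(pkgs, require, logical(1), character.only = TRUE)
loaded
```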

    After attaching all of the required packages to the current session, you can confirm that they are loaded:
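A sketch using two base R functions for that check:

```r
# Attached packages appear on the search path as "package:<name>"
search()
"package:ggplot2" %in% search()

# sessionInfo() lists the R version and every attached package with its version
sessionInfo()
```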

    R code can be entered into the command line directly or saved to a script which can be run inside a session using the ‘source’ function.
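A minimal sketch of source(). Here the script is written to a temporary file so the example is self-contained; normally you would pass the path of a script you saved yourself.

```r
# Write a two-line script to a temporary file (stands in for your own .R file)
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "x_mean <- mean(x)"), script)

# source() runs every line of the script in the current session,
# so the objects it creates are available afterwards
source(script)
x_mean
```

Passing echo = TRUE to source() prints each expression as it runs, which helps when following along.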

    Help can be obtained by typing ? before a function name.
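For example, to look up read.csv (args() is a quick alternative that prints only the argument list):

```r
# Both lines open the help page for read.csv
?read.csv
help("read.csv")

# Print just the function's arguments and their defaults
args(read.csv)
```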

    Entering Data:

    R works most readily with datasets stored as plain text files, i.e. CSV.

    Base R provides the functions read.table and read.csv; see their help files for the many available options.
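A minimal sketch of reading a CSV file. To keep it self-contained, a three-row file of made-up illustrative values is first written to a temporary path; with real data you would pass your own file path instead.

```r
# Create a small CSV file to read (illustrative values only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,group,score",
             "1,a,12.5",
             "2,b,9.0",
             "3,a,14.0"), csv_path)

# read.csv is a wrapper around read.table with defaults suited to
# comma-separated files (header = TRUE, sep = ",")
dat <- read.csv(csv_path)
str(dat)

# The same file read with the more general read.table
dat2 <- read.table(csv_path, header = TRUE, sep = ",")
```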

    Datasets from other statistical analysis software can be imported using the foreign package:
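A sketch using foreign's read.dta() for Stata files. A small .dta file is created first with write.dta() so the example runs without external data; the SPSS line is commented out because "survey.sav" is a hypothetical file name.

```r
library(foreign)

# Create a small Stata file to read (illustrative values only)
dta_path <- tempfile(fileext = ".dta")
write.dta(data.frame(id = 1:3, score = c(12.5, 9.0, 14.0)), dta_path)

# Read the Stata file back in as a data frame
stata_dat <- read.dta(dta_path)
str(stata_dat)

# SPSS files work in a similar way ("survey.sav" is a hypothetical path):
# spss_dat <- read.spss("survey.sav", to.data.frame = TRUE)
```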

    If converting Excel spreadsheets to CSV is too much of a hassle, the xlsx package installed earlier can read them directly:
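A sketch assuming the xlsx package and its Java dependencies loaded correctly (the block is skipped otherwise). It writes a small workbook with write.xlsx() and reads it back with read.xlsx(), where sheetIndex picks the worksheet by position.

```r
# Only run if xlsx (and rJava/Java behind it) can actually be loaded
if (requireNamespace("xlsx", quietly = TRUE)) {
  xlsx_path <- tempfile(fileext = ".xlsx")

  # Write a small workbook first so there is something to read
  xlsx::write.xlsx(data.frame(id = 1:3, score = c(12.5, 9.0, 14.0)),
                   xlsx_path, row.names = FALSE)

  # sheetIndex = 1 reads the first worksheet
  excel_dat <- xlsx::read.xlsx(xlsx_path, sheetIndex = 1)
  print(excel_dat)
}
```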

    Viewing Data:

    Datasets that have been read in are stored as data frames, which have a matrix-like structure of rows and columns (unlike a matrix, each column can hold a different type). The most common method of indexing is object[row, column], but many others are available.
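A few indexing forms on a small made-up data frame (the values are purely illustrative):

```r
dat <- data.frame(id = 1:4,
                  group = c("a", "b", "a", "b"),
                  score = c(12.5, 9.0, 14.0, 11.0))

dat[2, 3]              # single cell: row 2, column 3
dat[2, ]               # every column of row 2
dat[, 3]               # every row of column 3, returned as a vector
dat[1:2, c(1, 3)]      # rows 1-2, columns 1 and 3
dat[dat$score > 10, ]  # rows selected by a logical condition
```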

    Variables can also be accessed via their names:
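Column access by name, on the same kind of illustrative data frame:

```r
dat <- data.frame(id = 1:4,
                  group = c("a", "b", "a", "b"),
                  score = c(12.5, 9.0, 14.0, 11.0))

dat$score          # $ extracts a column by name
dat[["score"]]     # [[ ]] does the same and also accepts a name stored in a variable
dat[, "score"]     # names also work inside [row, column] indexing
mean(dat$score)
```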

    The c function combines values of a common type into a vector:
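A short sketch of c(). If the values are of different types, c() coerces them all to the most general type rather than failing, which is a common source of surprises:

```r
scores <- c(12.5, 9.0, 14.0)   # numeric vector
groups <- c("a", "b", "a")     # character vector
length(scores)

# Mixed types are coerced to a common type (here character)
mixed <- c(1, "a", TRUE)
class(mixed)

# c() is also handy for selecting several columns at once by name
dat <- data.frame(id = 1:3, group = groups, score = scores)
dat[, c("id", "score")]
```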

    Creating colnames:
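A sketch of setting column names with colnames(). The data frame is built from an unnamed matrix, so R assigns the placeholder names V1, V2, much as read.table() does for a file read with header = FALSE:

```r
# Unnamed columns get the placeholder names V1, V2
dat <- as.data.frame(matrix(c(1, 2, 3, 12.5, 9.0, 14.0), ncol = 2))
colnames(dat)

# Assign one name per column
colnames(dat) <- c("id", "score")

# Rename a single column by position
colnames(dat)[2] <- "test_score"
colnames(dat)
```

For data frames, names() returns the same thing as colnames(), so either can be used.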

    Saving Data:
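A sketch of three common ways to save a data frame, written to temporary paths here; swap in your own file names. write.csv() gives portable text, saveRDS() stores a single R object exactly, and save() stores one or more named objects in an .RData file that load() restores.

```r
dat <- data.frame(id = 1:3, score = c(12.5, 9.0, 14.0))

# Plain-text CSV; row.names = FALSE avoids an extra column of row numbers
out_csv <- tempfile(fileext = ".csv")
write.csv(dat, out_csv, row.names = FALSE)

# R's own format for a single object; types and attributes are kept exactly
out_rds <- tempfile(fileext = ".rds")
saveRDS(dat, out_rds)
restored <- readRDS(out_rds)

# One or more named objects in a single .RData file (restore with load())
out_rdata <- tempfile(fileext = ".RData")
save(dat, file = out_rdata)
```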