Categories

# Week 3 – Introduction to R. Cont’

Coming back to R after closing, a session can be restored by simply running R in the workspace directory.

A history file can be specified via:

```# recall your command history

RData can also be saved and loaded via:

```# save the workspace to the file .RData in the cwd
save.image()

# save specific objects to a file
# if you don't specify the path, the cwd is assumed
save(object list,file="myfile.RData")
# load a workspace into the current session
# if you don't specify the path, the cwd is assumed

Describing data:

```# show data files attached
ls()
# show dimensions of  a data object 'd'
dim(d)
#show structure of data object 'd'
str(d)
#summary of data 'd'
summary(d)```

Subsets of data is a logical next step:

`summary(subset(d, read <= 60))`

Grouping data is also fairly intuitive:

```by(d[, 7:11], d\$prog, colMeans)
by(d[, 7:11], d\$prog, summary)```

Using histograms to plot variable distributions:

```ggplot(d, aes(x = write)) + geom_histogram()
# Or kernel density plots
ggplot(d, aes(x = write)) + geom_density()
# Or boxplots showing the median, lower and upper quartiles and the full range
ggplot(d, aes(x = 1, y = math)) + geom_boxplot()```

Lets look at some more ways to understand the data set:

```# density plots by program type
ggplot(d, aes(x = write)) + geom_density() + facet_wrap(~prog)
# box plot of math scores for each teaching program
ggplot(d, aes(x = factor(prog), y = math)) + geom_boxplot()```

Extending visualizations:

```ggplot(melt(d[, 7:11]), aes(x = variable, y = value)) + geom_boxplot()
# break down by program:
ggplot(melt(d[, 6:11], id.vars = "prog"), aes(x = variable, y = value, fill = factor(prog))) +  geom_boxplot()```

Analysis of categories can be conducted with frequency tables:

```xtabs(~female, data = d)
xtabs(~race, data = d)
xtabs(~prog, data = d)
xtabs(~ses + schtyp, data = d)```

Finally lets have a look at some bivatiate (pairwise) correlations. If ther is no missing data, cor function can be users, else use can remove items:

```cor(d[, 7:11])
ggpairs(d[, 7:11])```