After closing R, a previous session can be restored by starting R in the directory where the workspace was saved; by default, R loads the .RData file it finds there.
A history file can be specified via:
# recall your command history
loadhistory(file = "myfile")  # default is ".Rhistory"
The workspace, or individual objects from it, can also be saved to and loaded from .RData files:
# save the workspace to the file .RData in the cwd
save.image()
# save specific objects to a file
# if you don't specify the path, the cwd is assumed
save(d, file = "myfile.RData")  # name one or more objects, separated by commas
# load a workspace into the current session
# if you don't specify the path, the cwd is assumed
load("myfile.RData")
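As a minimal sketch of the save and load round trip, the snippet below writes two objects to a temporary file and restores them. The objects scores and labels are hypothetical and exist only for illustration; a temporary file is used so nothing is written to the working directory.

```r
# Hypothetical objects, used only for illustration
scores <- c(52, 59, 33)
labels <- c("a", "b", "c")

# Write both objects to a temporary file instead of the working directory
path <- tempfile(fileext = ".RData")
save(scores, labels, file = path)

# Remove the objects, then restore them from the file
rm(scores, labels)
restored <- load(path)  # load() returns the names of the restored objects

print(restored)
print(scores)
```

Because load() silently overwrites any existing objects with the same names, checking its return value is a cheap way to see what a file contained.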
Describing data:
# list the objects in the workspace
ls()
# show dimensions of a data object 'd'
dim(d)
# show structure of data object 'd'
str(d)
# summary of data 'd'
summary(d)
Summarizing subsets of the data is a logical next step:
summary(subset(d, read <= 60))
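To make the subsetting step concrete, here is a small sketch on a hypothetical data frame, d_demo, that stands in for d (its values are invented). It compares subset() with plain logical indexing, which select the same rows when the condition contains no missing values.

```r
# Hypothetical stand-in for 'd'; the values are invented for illustration
d_demo <- data.frame(
  read  = c(57, 68, 44, 63, 47),
  write = c(52, 59, 33, 44, 52)
)

# subset() evaluates the condition inside the data frame
low_readers <- subset(d_demo, read <= 60)

# The same selection with logical indexing
same_rows <- d_demo[d_demo$read <= 60, ]

print(all.equal(low_readers, same_rows))
print(summary(low_readers))
```

One difference worth knowing: subset() drops rows where the condition is NA, while logical indexing returns a row of NAs for each of them.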
Grouping data is also fairly intuitive:
by(d[, 7:11], d$prog, colMeans)
by(d[, 7:11], d$prog, summary)
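The grouping call can be tried on a small hypothetical data frame (d_demo below, with invented values and named columns instead of positions 7:11). The sketch also shows aggregate() as one alternative that returns the group means as a data frame.

```r
# Hypothetical stand-in for 'd'; the values are invented for illustration
d_demo <- data.frame(
  prog  = c(1, 1, 2, 2, 3, 3),
  read  = c(50, 60, 40, 44, 70, 72),
  write = c(52, 58, 41, 45, 65, 67)
)

# by() splits the rows by program and applies colMeans to each piece
group_means <- by(d_demo[, c("read", "write")], d_demo$prog, colMeans)
print(group_means[["2"]])

# aggregate() computes the same means and returns a data frame
means_table <- aggregate(cbind(read, write) ~ prog, data = d_demo, FUN = mean)
print(means_table)
```

The by() result is convenient for printing, while the aggregate() result is easier to pass on to plotting or further computation.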
Using histograms to plot variable distributions:
library(ggplot2)
ggplot(d, aes(x = write)) + geom_histogram()
# Or kernel density plots
ggplot(d, aes(x = write)) + geom_density()
# Or boxplots showing the median, lower and upper quartiles and the full range
ggplot(d, aes(x = 1, y = math)) + geom_boxplot()
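The numbers behind a boxplot can be computed by hand. The sketch below uses a hypothetical vector, math_demo, and assumes the documented geom_boxplot() defaults: the box spans the first and third quartiles, and the whiskers stop at the most extreme values within 1.5 times the interquartile range, with anything beyond drawn as individual points.

```r
# Hypothetical scores, invented for illustration
math_demo <- c(41, 45, 48, 52, 53, 57, 61, 75)

# Quartiles and median (the box and its middle line)
q <- quantile(math_demo, probs = c(0.25, 0.5, 0.75))
iqr <- IQR(math_demo)

# Limits beyond which values are drawn as separate points
lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Whisker ends: the most extreme values still inside the fences
inside <- math_demo[math_demo >= lower_fence & math_demo <= upper_fence]
whiskers <- range(inside)
outliers <- math_demo[math_demo < lower_fence | math_demo > upper_fence]

print(q)
print(whiskers)
print(outliers)
```

In this example the value 75 lies above the upper fence, so it would appear as a point rather than as part of the whisker.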
Let's look at some more ways to understand the data set:
# density plots by program type
ggplot(d, aes(x = write)) + geom_density() + facet_wrap(~prog)
# box plot of math scores for each teaching program
ggplot(d, aes(x = factor(prog), y = math)) + geom_boxplot()
Extending visualizations:
library(reshape2)  # provides melt()
ggplot(melt(d[, 7:11]), aes(x = variable, y = value)) + geom_boxplot()
# break down by program:
ggplot(melt(d[, 6:11], id.vars = "prog"), aes(x = variable, y = value, fill = factor(prog))) + geom_boxplot()
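The melt() call reshapes the data from wide form (one column per score) to long form (one row per score), which is the layout ggplot needs to draw one box per variable. As a rough illustration of that shape, the sketch below uses base R's stack() on a hypothetical data frame; stack() is a stand-in here, not what melt() does internally, and the column names are renamed by hand to match the variable and value names used in the plot.

```r
# Hypothetical wide data: one column per score, invented for illustration
wide <- data.frame(
  read  = c(57, 68),
  write = c(52, 59),
  math  = c(41, 53)
)

# stack() puts all values in one column and the source column name in another
long <- stack(wide)
names(long) <- c("value", "variable")

print(long)
```

Each of the six rows in long carries one score and the name of the column it came from, so variable can be mapped to the x axis and value to the y axis.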
Analysis of categories can be conducted with frequency tables:
xtabs(~female, data = d)
xtabs(~race, data = d)
xtabs(~prog, data = d)
xtabs(~ses + schtyp, data = d)
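A two-way frequency table is often easier to read with proportions or totals attached. The sketch below builds one from a hypothetical data frame, d_demo, whose ses and schtyp values are invented, and then applies prop.table() and addmargins() from base R.

```r
# Hypothetical stand-in for 'd'; the values are invented for illustration
d_demo <- data.frame(
  ses    = c("low", "middle", "high", "middle", "low", "high"),
  schtyp = c("public", "public", "private", "public", "private", "public")
)

# Counts for every combination of ses and schtyp
counts <- xtabs(~ ses + schtyp, data = d_demo)
print(counts)

# Proportions within each ses row
row_props <- prop.table(counts, margin = 1)
print(row_props)

# The same counts with row and column totals added
print(addmargins(counts))
```

Setting margin = 2 instead would give proportions within each school type.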
Finally, let's have a look at some bivariate (pairwise) correlations. If there is no missing data, the cor function can be used directly; otherwise, its use argument can be set to drop incomplete observations:
library(GGally)  # provides ggpairs()
cor(d[, 7:11])
ggpairs(d[, 7:11])
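The effect of missing data on cor() can be seen on a small hypothetical data frame, d_demo, with one NA in the write column (all values are invented). The sketch compares the default behaviour with two settings of the use argument.

```r
# Hypothetical stand-in for 'd' with one missing value
d_demo <- data.frame(
  read  = c(57, 68, 44, 63, 47, 52),
  write = c(52, 59, 33, 44, NA, 57),
  math  = c(41, 53, 54, 47, 57, 51)
)

# Default: any correlation involving 'write' is NA
cor_default <- cor(d_demo)
print(cor_default)

# Drop every row that has a missing value in any column
cor_complete <- cor(d_demo, use = "complete.obs")
print(cor_complete)

# Use all rows available for each pair of columns
cor_pairwise <- cor(d_demo, use = "pairwise.complete.obs")
print(cor_pairwise)
```

With "complete.obs" every entry is computed from the same rows, whereas "pairwise.complete.obs" keeps more data per pair at the cost of entries based on different subsets.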