The final lecture on quantitative data analysis covered 4 specific statistical tests:

Binomial – Given a weighted coin, how many heads will probably result from 30 tosses?

Median – Checks whether the medians of two populations differ significantly

Mood’s median test – Checks for a significant difference in medians between unrelated samples (non-parametric)

Kolmogorov-Smirnov – Measures the maximum cumulative difference between two distributions: are the data sets different?

Friedman – Tests for significant differences across testing intervals on a sample population
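
The tests above also exist outside SPSS; as a rough sketch (using scipy.stats rather than SPSS, with made-up sample data purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Binomial: 21 heads in 30 tosses -- is the coin weighted (p != 0.5)?
binom = stats.binomtest(21, n=30, p=0.5)
print("binomial p =", binom.pvalue)

# Mood's median test: do two unrelated samples share a common median?
a = rng.normal(10, 2, 40)
b = rng.normal(11, 2, 40)
stat, p, med, tbl = stats.median_test(a, b)
print("Mood's median p =", p)

# Kolmogorov-Smirnov: maximum cumulative difference between two distributions.
ks = stats.ks_2samp(a, b)
print("K-S p =", ks.pvalue)

# Friedman: the same subjects measured at three testing intervals.
t1, t2, t3 = rng.normal(5, 1, (3, 20))
fr = stats.friedmanchisquare(t1, t2, t3)
print("Friedman p =", fr.pvalue)
```

Each call returns the test statistic and a p-value, which is essentially what the SPSS output windows report.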

The lecture slides included clear examples of these tests, and the tutorial followed up with practical examples using SPSS. After four weeks of quantitative data analysis we now have a decent toolbox, particularly for non-parametric data analysis. Our assignment requires application of these tools; I imagine it will expose some of the ambiguities that arise when reasoning from quantitative analysis.

Probability, hypothesis testing and regression analysis continued the topic of quantitative analysis in week 8. Our discussion of the statistical techniques we are using with the SPSS package focused on the interpretation of outputs rather than the mathematics behind them. This seems reasonable given the limited time assigned to such a large area.

The first points covered were definitions of probability:

Marginal (simple) probability – the probability of a single event, e.g. rolling a six with a standard die => 1/6

Joint probability – P(AB) => P(A) x P(B) for independent events, e.g. rolling three sixes in a row => (1/6) x (1/6) x (1/6)

Conditional probability – I would stick with Bayes’ theorem => P(A|B) = P(B|A) x P(A) / P(B)

Binomial distribution – the probability of the number of times an event occurs, given a true/false outcome and n trials, i.e. how many times will heads appear in 20 tosses of a coin?

Normal (Gaussian) distribution – requires continuous random variables (e.g. age)
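
These definitions can be sketched numerically (the Bayes figures below are invented purely to show the mechanics):

```python
from scipy import stats

# Marginal probability: one six on a fair die.
p_six = 1 / 6

# Joint probability of independent events: three sixes in a row.
p_three_sixes = p_six ** 3                  # (1/6)^3 ~ 0.00463

# Conditional probability via Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
p_a, p_b_given_a, p_b = 0.01, 0.9, 0.05     # made-up figures
p_a_given_b = p_b_given_a * p_a / p_b       # 0.18

# Binomial distribution: P(exactly 10 heads in 20 fair-coin tosses).
p_10_heads = stats.binom.pmf(10, n=20, p=0.5)

# Normal distribution: P(age <= 40) if age ~ N(mean=35, sd=10).
p_under_40 = stats.norm.cdf(40, loc=35, scale=10)

print(p_three_sixes, p_a_given_b, p_10_heads, p_under_40)
```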

Hypothesis testing and regression analysis followed. The recurring theme is the significance value of less than 0.05 required for hypothesis support.
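
The p < 0.05 convention can be seen in a simple regression; a sketch with synthetic data (SPSS reports the same quantities as “Sig.” values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
y = 2.0 * x + rng.normal(0, 5, 50)   # strong linear relationship plus noise

res = stats.linregress(x, y)
print("slope =", res.slope, "p =", res.pvalue)

# Null hypothesis: slope == 0 (no relationship between x and y).
if res.pvalue < 0.05:
    print("reject the null hypothesis: the slope is significant")
```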

SPSS seems like a great tool for statistical analysis, covering all of the widely used statistical methods while remaining relatively simple to use.

A short week for IT Research Methods in terms of new material: due to the literature review presentations we had no tutorial and only half a lecture. The topic of the lecture, presented by Joze Kuzic, was ‘Correlation Analysis’.

Let’s start with the simple definition of correlation analysis: ‘A statistical investigation of the relationship between one factor and one or more other factors’.

Correlation – 1) both variables are random variables, and 2) the end goal is simply to find a number that expresses the relation between the variables
Regression – 1) one of the variables is a fixed variable, and 2) the end goal is to use the measure of relation to predict values of the random variable based on values of the fixed variable

The topic of causality and correlation was approached quite carefully in the lecture notes, which state that correlation can be used to look for causality but does not imply causality.

Methods of correlation:

Pearson’s correlation coefficient – for parametric (randomized, normally distributed) data.

Spearman rank order correlation coefficient – for non-parametric data; ranges over [-1.0, 1.0]

The significance of correlations was the next logical point covered; not much mathematical reasoning was given beyond p < 0.05 being good :).
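
Both coefficients come with a p-value attached; a sketch on made-up data, where the relationship is monotonic but non-linear (Spearman only uses ranks, so it tolerates this better than Pearson):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 100)
y = x ** 3 + rng.normal(0, 0.5, 100)   # monotonic but non-linear relation

pearson_r, pearson_p = stats.pearsonr(x, y)
spearman_r, spearman_p = stats.spearmanr(x, y)
print(f"Pearson r = {pearson_r:.2f} (p = {pearson_p:.3g})")
print(f"Spearman rho = {spearman_r:.2f} (p = {spearman_p:.3g})")
```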

Week 6 began statistical analysis using SPSS, specifically for non-parametric tests. Non-parametric data can be described as data that does not conform to a normal distribution. A simple example is ranked data such as movie reviews (0 – 5 stars). A major limitation of non-parametric data is the increased sample size required to gain sufficient significance to reject a null hypothesis.

| Goal | Measurement (from Gaussian population) | Rank, score, or measurement (from non-Gaussian population) | Binomial (two possible outcomes) | Survival time |
| --- | --- | --- | --- | --- |
| Describe one group | Mean, SD | Median, interquartile range | Proportion | Kaplan Meier survival curve |
| Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or Binomial test ** | |
| Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher’s test (chi-square for large samples) | Log-rank test or Mantel-Haenszel* |
| Compare two paired groups | Paired t test | Wilcoxon test | McNemar’s test | Conditional proportional hazards regression* |
| Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression** |
| Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochrane Q** | Conditional proportional hazards regression** |
| Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients** | |
| Predict value from another measured variable | Simple linear regression or Nonlinear regression | Nonparametric regression** | Simple logistic regression* | Cox proportional hazard regression* |
| Predict value from several measured or binomial variables | Multiple linear regression* or Multiple nonlinear regression** | | Multiple logistic regression* | Cox proportional hazard regression* |

All of the tests described in the table above can be applied via SPSS. Note that “Gaussian population” refers to normally distributed data. Not featured in the table is the sign test, perhaps because it is described as lacking the statistical power of the paired t-test or the Wilcoxon test.
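
One row of the table in practice: comparing two unpaired groups with the parametric unpaired t-test (Gaussian column) and the non-parametric Mann-Whitney test (rank/score column). The data here is synthetic, standing in for an SPSS data set:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(100, 15, 30)   # e.g. scores from condition A
group_b = rng.normal(110, 15, 30)   # e.g. scores from condition B

t_res = stats.ttest_ind(group_a, group_b)      # unpaired t test
u_res = stats.mannwhitneyu(group_a, group_b)   # Mann-Whitney test
print("t-test p =", t_res.pvalue)
print("Mann-Whitney p =", u_res.pvalue)
```

On roughly normal data the two tests usually agree; on ranked or skewed data only the Mann-Whitney result is trustworthy.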

One question that immediately comes to mind is how normalization can be applied to allow comparison of normally distributed data with non-parametric data.

The lecture went on to describe important assumptions and the rationale behind several test methods. I will await further practical testing with SPSS before going into more detail on them.

The topic of week 5’s lecture, presented by David Arnott, was ‘Communicating Research’. After establishing why it is important to publish research, we covered the paper publication process in some detail.

The first step discussed was the research proposal, aimed at the target audience of supervisors, scholarship committees and confirmation panels. Regarding tense, it was advised to write in the past tense, with the exception of the results discussion, which would be written in the present tense. Proofreading and polishing were highlighted as key characteristics of a successful paper.

Referencing came next, including an introduction to author-date and numbered referencing styles.

Planning, both at the paper level and at the macro level of a research career, was highlighted by David as a key factor for success.

IT Research Methods’ fourth week was presented by Joze Kuzic, providing a detailed introduction to surveys (or ‘super looks’ as the translation demands). First off, we clarified that surveys are not limited to forms that managers and students need to fill out! There are many types of surveys, e.g.:

Statistical

Geographic

Earth Sciences

Construction

Deviation

Archaeological

Astronomical

These are just a few types of non-form surveys. With this broader view, we can see that almost anyone conducting research will need a good understanding of how to create effective surveys. Interviews were listed as a method for conducting surveys, although I imagine this would in most cases be quite dubious if used alone. Anonymous surveys appear to be the most common form of survey for people.

After discussing some of the obvious pros and cons of mail surveys, the lecture moved into population sampling.

‘Experiments’ was the topic of week 3’s lecture, presented by David Arnott. We started with a classification of scientific investigation:

Descriptive studies

Correlation studies

Experiments

Importantly, the anchor of these investigations is the research question.

Terms and concepts were the next sub-section:

Subject (by law in Australia, human subjects must be called ‘participants’) – the target of your experimentation

Variables (independent, dependent, intermediate, extraneous) – self-explanatory via dictionary definitions.

Variance/factor models – aim to predict an outcome from adjustment of predictor (independent?) variables, in an atomic time frame. That is my loose interpretation.

Process model – aims to explain how outcomes develop over time. (The difference between variance and process models appears moot and, I feel, somewhat irrelevant.)

Groups -> experimentation group, control group -> ensuring group equivalence.

Hypothesis – a prediction about the effect of independent variable manipulation on dependent variables. One-tailed, two-tailed, null hypothesis.

Significance – the difference between two descriptive statistics, to an extent which cannot be attributed to chance.

Reliability – can the research method be replicated by another researcher?

Internal validity – how much is the manipulation of the independent variable responsible for the results in the dependent variable?

External validity – can the results be generalized to entities outside of the experiment?

Construct validity – the extent to which the measures used in the experiment actually measure the construct.
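
The one-tailed vs two-tailed distinction from the hypothesis definition above can be sketched with a t-test on synthetic control/treated groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
control = rng.normal(50, 10, 40)   # invented scores for the control group
treated = rng.normal(58, 10, 40)   # invented scores for the treated group

# Two-tailed: H1 says the means differ, in either direction.
two_tailed = stats.ttest_ind(treated, control, alternative="two-sided")

# One-tailed: H1 says the treated mean is specifically greater.
one_tailed = stats.ttest_ind(treated, control, alternative="greater")

print("two-tailed p =", two_tailed.pvalue)
print("one-tailed p =", one_tailed.pvalue)
# When the observed difference is in the predicted direction,
# the one-tailed p is half the two-tailed p.
```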

Experimental Design followed:

Between-subject design vs within-subject design -> are subjects manipulated in the same or differing ways?

After-only vs before-after design -> at which stages are the dependent variables tested?

Statistical tests must reflect the experimental design.

When creating an experimental design it seems like a good idea to make a checklist.

The coffee/caffeine example covered next seemed a bit odd, as it made the assumption that coffee and caffeine are the same thing. I recall a similar assumption being made in regards to THC and marijuana, which was later found to be fundamentally flawed. I did not understand the decision support system example at all, so I was not really able to extrapolate much understanding from the two examples covered.

Unfortunately I was absent for week 2 of IT Research Methods and the lecture delivered by Prof. David Arnott. The lecture was focussed on the initial stage of any research project: the literature review.

Thematic Analysis – Qualitative in nature, classifying papers according to themes that are relevant to your research project.

Bibliographic Analysis – Quantitative in nature, using citation and/or content analysis. (rarely used in IT research)

A question was posed at the start of the lecture: what is scientific evidence? Journal and conference papers, along with websites, blogs, books and trade magazines, were listed as possibilities. Before reading through the lecture, I felt that any of these mediums could qualify as scientific evidence. Peer-reviewed academic articles would, however, present a much more filtered source, with blogs and websites most likely containing far more refutable contentions. It seems unwise to completely discount a source of information purely on the grounds that it is a blog or website, though.

The notes go on to present a rating system for journals (A, B and C), the A-listers being:

Decision Support Systems

European Journal of Information Systems

Information and Management

Information Systems Journal

Information Systems Research

Journal of Information Technology

Journal of Management Information Systems

Journal of the Association for Information Systems

MIS Quarterly

The aim of a literature review can be summarized as:

Synthesis of articles

Define and understand relevant controversies

Based on critical review (not notes or observations)

Reads like an essay (but can use tables)

It seems that the thematic method of literature review is the avenue we will be encouraged to follow, which seems quite reasonable. Thematic review can be author- and/or topic-centric. Author-centric review would only be appropriate in very limited niche topics where the published articles are by a limited number of researchers. When taking on a topic-centric review, creating a table with concept categorization for articles is recommended.

Some questions were presented at the close of the lecture notes (which I imagine were answered in the lecture):

How long should a lit review be?

How many papers should be reviewed?

What tense should be used?

Which citation methodology? APA/Harvard?

I will have to follow up on these in the coming tutorial.

Finally, there was a YouTube video listed in the review materials for the week which included some good points:

What is the purpose of a literature review?

Summarizes what has been researched before

Highlights the research gaps that you will aim to fill

Explains why it is necessary to fill those gaps

Sets the scope of your research

Scope and length? Does it need to be everything you know? No, just the current state of the theory. Length requires discussion with your supervisor, but consider that this is a summary of existing knowledge and a review of current research.
Look for flaws and disagreement among researchers.

Sources – Refereed international journals, Books/Chapters, national journals, conference papers, non-refereed articles.

Review of instruments – what are you using to gather data to support your hypothesis? Are they an acceptable source, and why?

Week 1 of IT Research Methods was a lecture by Dr Joze Kuzic on the nature of research. The lecture bounced between subjective opinions drawn from his experience in research and a framework for conducting research:

Formulating Questions

Literature Analysis

Case Studies

Surveys

Qualitative data analysis

Quantitative data analysis

Communicating research

Also introduced were some research paradigms:

Scientific research (positivist)

Applied research (practical)

Social research (interpretive)

I feel that being aware of these paradigms is valuable, but self-imposing mutual exclusivity or black-and-white generalization would be counterproductive (e.g. ‘oh well, that’s just a positivist view’ / ‘I can’t do that, I am doing applied research’). A more pragmatic approach, using whatever method best answers the posed question regardless of paradigm, would be required for good research.

Details of Assignments 1 and 2 were also made available on Moodle this week. Assignment 1, a literature review and presentation, seems like it will be an enjoyable assignment that will allow some synergy with other subjects.