The final lecture on quantitative data analysis covered 4 specific statistical tests:
Binomial – Given a weighted coin, how many heads will probably result from 30 tosses
Median – Checks that the medians of two populations are not significantly different
Mood’s median test – Checks for significant similarity between unrelated samples (non-parametric)
Kolmogorov-Smirnov – Measures the difference between the cumulative distributions of two data sets to judge whether the data sets are different
Friedman – Testing for significant differences across testing intervals on a sample population
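To make the binomial case a little more concrete, here is a minimal sketch of the weighted coin question in plain Python rather than SPSS. The weighting of 0.6 and the helper name binomial_pmf are my own assumptions for illustration and do not come from the lecture.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed values: 30 tosses of a coin that lands heads 60% of the time.
n, p = 30, 0.6
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The number of heads with the highest probability.
most_likely = max(range(n + 1), key=lambda k: pmf[k])

print("most likely number of heads:", most_likely)
print("P(exactly that many heads):", round(pmf[most_likely], 4))
print("P(at least 20 heads):", round(sum(pmf[20:]), 4))
```

Changing p to 0.5 gives the fair coin case, which is the usual null hypothesis when testing whether a coin is weighted at all.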
The lecture slides included clear examples of these tests. The tutorial followed up with some practical examples using SPSS. After the 4 weeks of quantitative data analysis we now have a decent toolbox specifically for non-parametric data analysis. Our assignment requires application of these tools. I imagine that the assignment will bring out some of the ambiguities that arise when reasoning from quantitative analysis.
Probability, hypothesis testing and regression analysis continued the topic of quantitative analysis in week 8. Our discussion of the statistical techniques that we are using with the SPSS package focuses on the interpretation of outputs rather than the mathematics behind them. This seems reasonable given the limited time we have assigned to such a large area.
The first points covered were definitions of probability:
Marginal (simple) probability – the probability of a single event, e.g. rolling a six with a standard die => 1/6. Rolling 3 sixes in a row is a joint probability of independent events => (1/6) x (1/6) x (1/6)
Joint probability P(AB) => P(A) x P(B), when A and B are independent
Conditional Probability – I would stick with Bayes theorem => P(A|B) = P(B|A) x P(A) / P(B)
Binomial Distribution – the probability of the number of times an event occurs given a true or false outcome and n trials, e.g. how many times will heads appear in 20 tosses of a coin?
Normal (Gaussian) distribution – Requires continuous random variables (e.g. age)
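The probability definitions in this list can be checked with a few lines of Python. This is only a sketch: the "six given even" scenario and the function name bayes are hypothetical choices of mine, used to show Bayes theorem with exact fractions.

```python
from fractions import Fraction

# Marginal probability: a six on a single roll of a fair die.
p_six = Fraction(1, 6)

# Joint probability of independent events: three sixes in a row.
p_three_sixes = p_six ** 3

def bayes(p_b_given_a, p_a, p_b):
    """Conditional probability P(A|B) via Bayes theorem."""
    return p_b_given_a * p_a / p_b

# Hypothetical example: A = "die shows a six", B = "die shows an even number".
p_even = Fraction(1, 2)
p_even_given_six = Fraction(1)  # a six is always even
p_six_given_even = bayes(p_even_given_six, p_six, p_even)

print(p_three_sixes)     # 1/216
print(p_six_given_even)  # 1/3
```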
Hypothesis testing and regression analysis followed. The recurring theme is the significance value of less than 0.05 required for hypothesis support.
SPSS seems like a great tool for statistical analysis: it includes all of the widely used statistical methods and is relatively simple to use.
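To see where a significance value comes from, here is a rough sketch of an exact two-sided binomial test in plain Python. The observed result of 16 heads in 20 tosses is made up, and binomial_p_value is a hypothetical helper of my own; SPSS reports a comparable figure in its output, though I have not checked that it uses exactly this definition of "two-sided".

```python
from math import comb

def binomial_p_value(heads, tosses, p_null=0.5):
    """Two-sided exact p-value: the total probability, under the null
    hypothesis, of every outcome no more likely than the observed one."""
    pmf = [
        comb(tosses, k) * p_null**k * (1 - p_null) ** (tosses - k)
        for k in range(tosses + 1)
    ]
    observed = pmf[heads]
    # The small tolerance keeps equally likely outcomes despite rounding.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))

# Made-up observation: 16 heads in 20 tosses; the null hypothesis is a fair coin.
p_value = binomial_p_value(heads=16, tosses=20)

print("p-value:", round(p_value, 4))
if p_value < 0.05:
    print("reject the null hypothesis at the 0.05 level")
else:
    print("fail to reject the null hypothesis at the 0.05 level")
```

With these numbers the p-value falls below 0.05, so the fair coin hypothesis would be rejected; 12 heads in 20 tosses would not be enough.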
A short week for IT research methods in terms of new material. Due to the literature review presentations we did not have a tutorial and only half a lecture. The topic of the lecture was ‘Correlation Analysis’, presented by Joze Kuzic.
Let's start with the simple definition of correlation analysis, ‘A statistical investigation of the relationship between one factor and one or more other factors’.
Correlation – 1) both variables are random variables, and 2) the end goal is simply to find a number that expresses the relation between the variables
Regression – 1) one of the variables is a fixed variable, and 2) the end goal is to use the measure of relation to predict values of the random variable based on values of the fixed variable
The topic of causality and correlation was approached quite carefully in the lecture notes, which noted that correlation can be used to look for causality but does not imply causality.
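The regression side of this distinction can be sketched as an ordinary least squares line fit. The data (hours of study against test score) is invented for the example, and fit_line is a hypothetical helper, not anything from the lecture or from SPSS.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for the line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: hours is the fixed variable, scores is the random variable.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)

# Use the fitted relation to predict a score for a new value of the fixed variable.
predicted = intercept + slope * 6
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("predicted score for 6 hours:", round(predicted, 2))
```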
Methods of correlations:
Pearson’s correlation coefficient – for parametric data (randomized, normally distributed), values in [-1.0, 1.0]
Spearman rank order correlation coefficient – for non-parametric data, values in [-1.0, 1.0]
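A small sketch of how the two coefficients differ, written in plain Python. The data is invented, the function names are my own, and Spearman is computed here as Pearson applied to ranks (with tied values sharing an average rank), which is the standard definition as far as I know.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def ranks(values):
    """Ranks starting at 1; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank order correlation: Pearson on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Invented data: y always increases with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print("Pearson: ", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because Spearman only looks at the ordering, it reports a perfect relationship for this data, while Pearson comes out slightly lower because the relationship is not linear.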
Significance of correlations was the next logical point covered. Not much mathematical reasoning was given apart from p < 0.05 is good :).
Week 6 began statistical analysis using SPSS, specifically for non-parametric tests. Non-parametric data can be described as data that does not conform to a normal distribution. A simple example is ranked data such as movie reviews (0 – 5 stars). A major limitation of non-parametric data is the increased sample size required to gain sufficient significance to reject a null hypothesis.
Rank, Score, or Measurement (from Non-Gaussian Population); (Two Possible Outcomes)
Describe one group – Median, interquartile range; Kaplan Meier survival curve
Compare one group to a hypothetical value – One-sample t test; Binomial test**
Compare two unpaired groups – Unpaired t test; (chi-square for large samples); Log-rank test or Mantel-Haenszel*
Compare two paired groups – Paired t test; Conditional proportional hazards regression*
Compare three or more unmatched groups – Cox proportional hazard regression**
Compare three or more matched groups – Conditional proportional hazards regression**
Quantify association between two variables
Predict value from another measured variable – Simple linear regression; Simple logistic regression*; Cox proportional hazard regression*
Predict value from several measured or binomial variables – Multiple linear regression*; Multiple nonlinear regression**; Multiple logistic regression*; Cox proportional hazard regression*
All of the tests described in the table above can be applied via SPSS. Note that “Gaussian population” refers to normally distributed data. Not featured in the table above is the sign test, perhaps because it is described as lacking the statistical power of the paired t-test or the Wilcoxon test.
One question that immediately comes to mind is how the process of normalization can be applied to force comparison of normally distributed data to non-parametric data.
The lecture went on to describe important assumptions and the rationale behind several test methods. I will await further practical testing with SPSS before going into more detail on them.
The topic of week 5’s lecture presented by David Arnott was ‘Communicating Research’. After establishing why it is important to publish research, we covered the paper publication process in some detail.
The first step discussed was the research proposal, aimed at the target audience of supervisors/scholarship committee/confirmation panel. With regard to tense, it was advised to write in past tense with the exception of the results discussion, which would be written in present tense. Proofreading and polishing were highlighted as key characteristics of a successful paper.
Referencing came next, including an introduction to author-date and numbered referencing.
Planning on both a paper level and a macro level for a research career was highlighted by David as a key factor for success.
The fourth week of IT research methods was presented by Joze Kuzic, who provided a detailed introduction to surveys (or ‘super looks’ as the translation demands). First off we clarified that surveys are not limited to forms that managers and students need to fill out! There are many types of surveys, e.g.:
These are just a few types of non-form surveys. So with this broader view we can see that almost anyone conducting research will need to have a good understanding of how to create effective surveys. Interviews were listed as a method for conducting surveys, although I imagine this would in most cases be quite dubious if used alone. Anonymous surveys appear to be the most common form of survey.
After discussing some of the obvious pros and cons of mail surveys, the lecture moved into population sampling.
Experiments was the topic of week 3’s lecture presented by David Arnott. We started with a classification of scientific investigation:
Importantly the anchor of these investigations is the research question.
Terms and concepts was the next sub-section:
Subject (Participant by law in Aus where people are subjects) – The target of your experimentation
Variables (Independent variables, Dependent variables, Intermediate variables, Extraneous variables), these are self explanatory via dictionary definitions.
Variance/Factor models – Aims to predict outcome from adjustment of predictor (independent?) variables, in an atomic time frame. That is my loose interpretation.
Process model – Aims to explain how outcomes develop over time (the difference between variance and process models appears to be moot and, I feel, somewhat irrelevant).
Groups -> experimentation group, control group -> ensuring group equivalence.
Hypothesis – Prediction about the effect of independent variable manipulation on dependent variables. One tailed, two tailed, null hypothesis.
Significance – the difference between two descriptive statistics, to an extent that is unlikely to be due to chance.
Reliability – Can the research method be replicated by another researcher
Internal Validity – How much is the manipulation of the independent variable responsible for the results in the dependent variable.
External validity – Can the results be generalized to entities outside of the experiment
Construct validity – The extent to which the measures used in the experiment actually measure the construct.
Experimental Design followed:
Between-subject design vs Within-subject design -> are subjects manipulated in the same or differing ways.
After-only vs Before-after design -> at which stages the dependent variables are tested.
Statistical tests must reflect the experimental design:
When creating an experimental design it seems like a good idea just to make a checklist.
The coffee/caffeine example covered next seemed a bit odd as it made the assumption that coffee and caffeine are the same thing. I recall the same type of assumption was made in regard to THC and marijuana, which was later found to be fundamentally flawed. I did not understand the decision support system example at all, so I was not really able to extrapolate much understanding from the two examples covered.
Unfortunately I was absent for week 2 of IT Research Methods and the lecture delivered by Prof. David Arnott. The lecture focussed on the initial stage of any research project: the literature review.
Thematic Analysis – Qualitative in nature, classifying papers according to themes that are relevant to your research project.
Bibliographic Analysis – Quantitative in nature, using citation and/or content analysis. (rarely used in IT research)
A question was posed at the start of the lecture: what is scientific evidence? Journal and conference papers along with websites, blogs, books and trade magazines were listed as possibilities. Before reading through the lecture I feel that any of these mediums could qualify as scientific evidence. Peer-reviewed academic articles would however present a much more filtered source, with blogs and websites most likely containing much more refutable contentions. It seems unwise to completely discount a source of information purely on the grounds that it is a blog or website though.
The notes go on to present a rating system for journals, A, B and C, the A listers being:
Decision Support Systems
European Journal of Information Systems
Information and Management
Information Systems Journal
Information Systems Research
Journal of Information Technology
Journal of Management Information Systems
Journal of the Association for Information Systems
The aim of a literature review can be summarized as:
Synthesis of articles
Define and understand relevant controversies
Based on critical review (not notes or observations)
Reads like an essay (but can use tables)
It seems that the thematic method of literature review is the avenue we will be encouraged to follow, which seems quite reasonable. Thematic review can be author and/or topic centric. Author centric review would only be appropriate in very limited niche topics where the published articles are by a limited number of researchers. When taking on topic centric review, creating a table with concept categorization for articles is recommended:
Some questions are presented at the close of the lecture (which I imagine were answered in the lecture):
How long should a lit review be?
How many papers should be reviewed?
What tense should be used?
Which citation methodology? APA/Harvard?
I will have to follow up on these in the coming tutorial.
Finally there was a youtube video listed in the review materials for the week which included some good points:
What is the purpose of a literature review?
Summarized what has been researched before
Highlights the research gaps that you will aim to fill
Why it is necessary to fill those gaps
Set the scope of your research
Scope and length? – Does it need to be everything you know? No, the current state of the theory. Length requires discussion with your supervisor, but consider that this is a summary of existing knowledge and a review of current research.
Look for flaws, disagreement among researchers.
Sources – Refereed international journals, Books/Chapters, national journals, conference papers, non-refereed articles.
Review of instruments – What are you using to gather data to support your hypothesis, are they an acceptable source, why?
Week 1 of IT research methods was a lecture by Dr Joze Kuzic on the nature of research. The lecture bounced between subjective opinions from experience in research and a framework for conducting research questions.
Qualitative data analysis
Quantitative data analysis
Also introduced were some research paradigms:
Scientific research (positivist)
Applied research (practical)
Social research (interpretive)
I feel that being aware of these paradigms is valuable, but self-imposed mutual exclusivity or black and white generalization would be counterproductive (ie: oh well that’s just a positivist view/ I can’t do that I am doing applied research). Good research would require a more pragmatic approach: using whatever method is best for answering the posed question, regardless of paradigm.
Details of Assignment 1 and 2 were also made available on Moodle this week. Assignment 1, a literature review and presentation, seems like it will be an enjoyable assignment that will allow some synergy with other subjects.