When you have quantitative data, you can analyze it using either descriptive or inferential statistics. Descriptive statistics do exactly what the name suggests: they describe the data. They include measures of central tendency (mean, median, mode), measures of variation (standard deviation, variance), and measures of relative position (quartiles, percentiles). There are times, however, when you want to draw conclusions from the data, such as making comparisons across time, comparing different groups, or making predictions based on data that have been collected. Inferential statistics are used when you want to move beyond simply describing your data and draw such conclusions. There are several kinds of inferential statistics that you can calculate; here are a few of the more common types:
t-tests
A t-test is a statistical test that can be used to compare means. There are three basic types of t-tests: one-sample t-test, independent-samples t-test, and dependent-samples (or paired-samples) t-test. For all t-tests, you are essentially taking the difference between the means and dividing it by a measure of the variation in the data (the standard error of that difference).
One-sample t-test
A one-sample t-test can be used to compare your data to the mean of some known population.
- Example: Suppose you are interested in knowing whether students who are utilizing the Career Services office are generally the students with higher GPAs. You would take the mean GPA of the students who use Career Services and compare it to the mean GPA of all students at the institution, taken from the registrar’s records.
- Thus, use a one-sample t-test when:
  - You have one data set or one mean that you are interested in
  - You know the mean of the population (the entire population, not a sample!) you wish to compare your mean to
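As a rough illustration of the Career Services example, the sketch below computes a one-sample t statistic using only Python's standard library. The GPA values, the registrar mean of 3.0, and the function name `one_sample_t` are all hypothetical, made up for illustration. The sketch stops at the t statistic and its degrees of freedom; judging significance would still require a t table or statistical software.

```python
import math
import statistics


def one_sample_t(sample, population_mean):
    """Return (t, degrees_of_freedom) for a one-sample t-test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by sqrt(n)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (sample_mean - population_mean) / standard_error
    return t, n - 1


# Hypothetical GPAs of students who used Career Services
career_services_gpas = [3.4, 3.1, 3.6, 2.9, 3.3, 3.5, 3.2, 3.0]
# Hypothetical mean GPA of all students, as if taken from the registrar
registrar_mean_gpa = 3.0

t_stat, df = one_sample_t(career_services_gpas, registrar_mean_gpa)
print(f"t = {t_stat:.2f}, df = {df}")
```

A larger absolute t value means the sample mean sits further from the population mean relative to the variation in the sample.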
Independent-samples t-test
An independent-samples t-test can be used to compare data from two separate, non-related samples.
- Example: Suppose you are interested in knowing how your institution compares to other institutions in terms of hours of community service per capita. You would take your students’ mean community service hours per person and compare it to other institutions’ mean community service hours per person.
- Example: Suppose you are interested in determining whether there is a difference between students in Greek organizations and students who are not in Greek organizations on a measure of satisfaction with weekend programming. You could issue a survey to students and then compare the mean satisfaction of Greeks with the mean satisfaction of non-Greeks.
- Thus, use an independent-samples t-test when:
  - You have two separate, non-overlapping groups or data sets that you want to compare. That is, different people provided the data for each group.
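The Greek versus non-Greek comparison could be sketched as follows. This is a minimal example that assumes the classic pooled-variance form of the test, which treats the two groups as having roughly equal variances; other forms exist. The satisfaction ratings and the function name `independent_samples_t` are hypothetical.

```python
import math
import statistics


def independent_samples_t(group_a, group_b):
    """Return (t, degrees_of_freedom) using a pooled variance estimate."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    mean_difference = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_difference / standard_error, n_a + n_b - 2


# Hypothetical satisfaction ratings (1-5) from two different sets of students
greek = [4, 5, 3, 4, 5, 4]
non_greek = [3, 2, 4, 3, 3, 3]

t_stat, df = independent_samples_t(greek, non_greek)
print(f"t = {t_stat:.2f}, df = {df}")
```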
Dependent-samples t-test
A dependent-samples t-test can be used to compare data from related groups or the same people over time. This is most often used when you have a pretest/posttest setup.
- Example: You want to know whether students’ attitudes toward diversity change from their freshman to senior years. You could ask incoming freshmen to indicate their level of agreement with various statements related to diversity, administer the same survey to them again in their senior year, and compare their answers.
- Thus, use a dependent-samples t-test when:
  - You have two separate data sets that are provided by the same people, just at different times (e.g., pre/post)
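For a pretest/posttest setup, the calculation works on each person's change score rather than on the two sets of scores separately. A minimal sketch, assuming each position in the two lists belongs to the same student; the scores and the function name `paired_samples_t` are hypothetical.

```python
import math
import statistics


def paired_samples_t(first, second):
    """Return (t, degrees_of_freedom) for scores from the same people."""
    if len(first) != len(second):
        raise ValueError("Each person needs a score at both time points")
    # One difference per person: later score minus earlier score
    differences = [s - f for f, s in zip(first, second)]
    n = len(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return statistics.mean(differences) / standard_error, n - 1


# Hypothetical agreement scores for the same six students, in the same order
freshman_scores = [3, 4, 2, 3, 4, 3]
senior_scores = [4, 4, 3, 5, 4, 4]

t_stat, df = paired_samples_t(freshman_scores, senior_scores)
print(f"t = {t_stat:.2f}, df = {df}")
```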
For more information about t-tests, visit:
http://www.socialresearchmethods.net/kb/stat_t.php
ANOVA (Analysis of Variance)
An ANOVA is a statistical test that is also used to compare means. The difference between a t-test and an ANOVA is that a t-test can only compare two means at a time, whereas with an ANOVA, you can compare multiple means at the same time. ANOVAs also allow you to compare the effects of different factors on the same measure. ANOVAs can become very complicated, and the analysis should only be done by someone who has been trained in statistics. There are several types of ANOVAs, including: one-way ANOVA, within-groups (or repeated-measures) ANOVA, and factorial ANOVA.
One-way ANOVA
A one-way ANOVA is used to compare three or more groups/levels along the same dimension. It is similar to an independent-samples t-test, just with more groups.
- Example: Suppose you want to know whether leadership skills differ between Freshmen, Sophomores, Juniors, and Seniors. You would take the mean for each group and compare them to each other.
- Thus, use a one-way ANOVA when:
  - You have three or more separate, non-overlapping groups or data sets that you want to compare.
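To make the class-year comparison concrete, the sketch below computes the F statistic for a one-way ANOVA: the variation between the group means divided by the variation within the groups. The leadership scores and the function name `one_way_anova_f` are hypothetical, and the sketch stops short of a p-value, which would come from an F table or statistical software.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_scores = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_scores)
    group_means = [statistics.mean(group) for group in groups]
    # Between-groups: how far each group mean is from the grand mean
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Within-groups: how far each score is from its own group mean
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical leadership scores from four different sets of students
freshmen = [2, 3, 3, 4]
sophomores = [3, 3, 4, 4]
juniors = [4, 5, 5, 6]
seniors = [5, 5, 6, 6]

f_stat, df_between, df_within = one_way_anova_f(
    [freshmen, sophomores, juniors, seniors]
)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```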
Within-groups (repeated-measures) ANOVA
A within-groups ANOVA is used to compare data from related groups or the same people over time. This is similar to a dependent-samples t-test, just with more data sets. This is most often used when you are doing a longitudinal study that tracks the same people across time.
- Example: Suppose you want to track the development of leadership skills over time. You would administer your instrument to a group of students during their Freshman year, during their Sophomore year, during their Junior year, and again during their Senior year. The same group of people would be taking the survey each year. You would then compare the means of this group as Freshmen, Sophomores, Juniors, and Seniors.
- Thus, use a within-groups ANOVA when:
  - You have separate data sets that are provided by the same people over time
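A within-groups ANOVA differs from the one-way version in that the consistent differences between people are removed from the error term, since the same people appear at every time point. A minimal sketch, assuming every student has a score for every year; the data and the function name `repeated_measures_f` are hypothetical.

```python
import statistics


def repeated_measures_f(scores_by_person):
    """Return (F, df_time, df_error). Each inner list is one person's
    scores, in the same time order for everyone."""
    n = len(scores_by_person)      # number of people
    k = len(scores_by_person[0])   # number of time points
    all_scores = [x for person in scores_by_person for x in person]
    grand_mean = statistics.mean(all_scores)
    time_means = [
        statistics.mean([person[t] for person in scores_by_person])
        for t in range(k)
    ]
    person_means = [statistics.mean(person) for person in scores_by_person]

    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_time = n * sum((m - grand_mean) ** 2 for m in time_means)
    ss_people = k * sum((m - grand_mean) ** 2 for m in person_means)
    # What is left after removing time effects and person-to-person differences
    ss_error = ss_total - ss_time - ss_people

    df_time = k - 1
    df_error = (k - 1) * (n - 1)
    f = (ss_time / df_time) / (ss_error / df_error)
    return f, df_time, df_error


# Hypothetical leadership scores: one row per student,
# columns are Freshman, Sophomore, Junior, and Senior year
leadership_scores = [
    [2, 3, 4, 5],
    [3, 3, 5, 5],
    [2, 4, 4, 6],
    [3, 4, 5, 6],
]

f_stat, df_time, df_error = repeated_measures_f(leadership_scores)
print(f"F({df_time}, {df_error}) = {f_stat:.2f}")
```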
Factorial ANOVA
A factorial ANOVA is used when you have two or more variables/factors/dimensions, and you want to explore whether there are interactions between these factors. Essentially, you are comparing the means of the various combinations of factors.
- Example: You want to know whether males and females, and underclassmen and upperclassmen (here, freshmen and seniors), differ in their appreciation of diversity. While you could do two separate t-tests, you are also interested in knowing whether the combination of factors makes a difference. You administer your instrument and compare Males to Females, Freshmen to Seniors, and then subdivide the data to compare Freshman Males, Freshman Females, Senior Males, and Senior Females.
- Example: You want to know whether students improve their communication skills over time, but you are also interested in knowing whether this differs by major. You administer the instrument to the same group of students during their Freshman year and again during their Junior year. You compare the means of Freshmen to Juniors, Biology to Art to Education majors, and then subdivide the data to compare the means of Freshman Biology, Freshman Art, Freshman Education, Junior Biology, Junior Art, and Junior Education majors.
- Thus, use a factorial ANOVA when:
  - You are interested in the interaction between two or more variables/factors/dimensions
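The sketch below shows one way the gender by class-year example could be computed. It assumes the simplest case: two between-subjects factors, different people in every cell, and the same number of people per cell (a balanced design). The scores and the function name `two_way_anova_f` are hypothetical. It returns one F statistic for each factor and one for their interaction.

```python
import statistics


def two_way_anova_f(cells):
    """cells maps (level_of_a, level_of_b) -> list of scores.
    Assumes a balanced design with different people in each cell."""
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # scores per cell
    all_scores = [x for scores in cells.values() for x in scores]
    grand_mean = statistics.mean(all_scores)

    def marginal_mean(position, level):
        # Mean of every score whose cell has this level of the given factor
        return statistics.mean(
            [x for key, scores in cells.items()
             if key[position] == level for x in scores]
        )

    ss_a = n * len(levels_b) * sum(
        (marginal_mean(0, a) - grand_mean) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum(
        (marginal_mean(1, b) - grand_mean) ** 2 for b in levels_b)
    ss_cells = n * sum(
        (statistics.mean(scores) - grand_mean) ** 2
        for scores in cells.values())
    # Interaction: cell-to-cell variation not explained by either factor alone
    ss_interaction = ss_cells - ss_a - ss_b
    ss_within = sum(
        (x - statistics.mean(scores)) ** 2
        for scores in cells.values() for x in scores)

    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    ms_within = ss_within / (len(all_scores) - len(cells))
    return {
        "factor A": (ss_a / df_a) / ms_within,
        "factor B": (ss_b / df_b) / ms_within,
        "interaction": (ss_interaction / (df_a * df_b)) / ms_within,
    }


# Hypothetical diversity-appreciation scores, three students per cell
diversity_scores = {
    ("female", "freshman"): [3, 4, 5],
    ("female", "senior"): [5, 6, 7],
    ("male", "freshman"): [3, 4, 5],
    ("male", "senior"): [3, 4, 5],
}

f_values = two_way_anova_f(diversity_scores)
for effect, f in f_values.items():
    print(f"{effect}: F = {f:.2f}")
```

In this made-up data, only senior females score higher than everyone else, which is the kind of pattern an interaction term is meant to pick up and that two separate t-tests would not isolate.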
One thing that is important to note about ANOVAs is that because there are more than two groups that are being compared, follow-up (or post-hoc) tests are often required to further interpret the data. For instance, if you compare Freshmen, Sophomores, Juniors, and Seniors on a measure of leadership skills and find a statistically significant difference, you will have to conduct follow-up tests to determine which groups are significantly different from each other. These follow-up tests may show that Freshmen and Sophomores are no different from each other, nor are Juniors and Seniors, but Juniors and Seniors both have better leadership skills than either Freshmen or Sophomores.
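To give a sense of what follow-up testing involves, the sketch below lists every pair of groups that would need to be compared and the difference between their means. It also applies a Bonferroni correction, which is one common way (an assumption here, not the only option) to keep the overall error rate in check when running several comparisons. The scores and the 0.05 significance level are hypothetical.

```python
from itertools import combinations
import statistics

# Hypothetical leadership scores for four class years
groups = {
    "Freshmen": [2, 3, 3, 4],
    "Sophomores": [3, 3, 4, 4],
    "Juniors": [4, 5, 5, 6],
    "Seniors": [5, 5, 6, 6],
}

# Every possible pair of groups: 4 groups give 6 comparisons
pairs = list(combinations(groups, 2))

for first, second in pairs:
    difference = statistics.mean(groups[second]) - statistics.mean(groups[first])
    print(f"{second} vs. {first}: mean difference = {difference:.2f}")

# Bonferroni: divide the overall significance level by the number of comparisons
overall_alpha = 0.05
per_comparison_alpha = overall_alpha / len(pairs)
print(f"{len(pairs)} comparisons, each tested at alpha = {per_comparison_alpha:.4f}")
```

Each pair would then be tested on its own, with its result judged against the stricter per-comparison threshold.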
For more information about ANOVAs, visit:
http://onlinestatbook.com/2/analysis_of_variance/intro.html
Regression
A regression analysis is a statistical procedure that allows you to make a prediction about an outcome (or criterion) variable based on knowledge of some predictor variable. To create a regression model, you first need to collect (a lot of) data on both variables, similar to what you would do if you were conducting a correlation. Then you would determine the contribution of the predictor variable to the outcome variable. Once you have the regression model, you would be able to input an individual’s score on the predictor variable to get a prediction of their score on the outcome variable.
- Example: You want to try to predict whether a student will come back for a second year based on how many on-campus activities they attended. You would have to collect data on how many activities students attended and whether or not those students returned for a second year. If activity attendance and retention are significantly related to each other, then you can generate a regression model that identifies at-risk students (in terms of retention) based on how many activities they have attended. Because the outcome here is yes/no rather than a score, the model would typically be a logistic regression, a form of regression designed for two-category outcomes.
- Example: You want to try to identify students who are at risk of failing College Algebra based on their scores on a math assessment so you can direct them to special services on campus. You would administer the math assessment at the start of the semester and then match each student’s score on the math assessment to their final grade in the course. If your data show that math assessment scores are significantly correlated with final grades, you can create a regression model to identify those at-risk students so you can direct them to tutors and other resources on campus.
- Thus, use regression when:
  - You want to be able to make a prediction about an outcome given what you already know about some related factor.
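The College Algebra example could look something like the sketch below, which fits a straight line by ordinary least squares and then uses it to predict a new student's final grade. The assessment scores, final grades, passing cutoff of 70, and function name `fit_line` are all hypothetical, and a real model would need far more than five students.

```python
import statistics


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line predicting y from x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    # Slope: how x and y vary together, divided by how much x varies
    slope = (
        sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        / sum((xi - mean_x) ** 2 for xi in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data from a previous semester, one entry per student
assessment_scores = [10, 20, 30, 40, 50]
final_grades = [52, 61, 70, 82, 90]

slope, intercept = fit_line(assessment_scores, final_grades)

# Predict the final grade of a new student who scored 25 on the assessment
new_score = 25
predicted_grade = intercept + slope * new_score
passing_cutoff = 70  # hypothetical threshold for flagging a student

print(f"predicted final grade: {predicted_grade:.1f}")
if predicted_grade < passing_cutoff:
    print("Flag for tutoring and other campus resources")
```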
Another option with regression is to do a multiple regression, which allows you to make a prediction about an outcome based on more than just one predictor variable. Many retention models are essentially multiple regressions that consider factors such as GPA, level of involvement, and attitude towards academics and learning.
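A multiple regression can be sketched in the same spirit. The example below uses two predictors, GPA and number of activities attended, and solves the least-squares equations directly so that it needs nothing beyond the standard library; in practice a statistics package would do this. Everything specific here is hypothetical: the function names, the outcome (credits completed), and the data, which were constructed to follow an exact rule (2 + 6 x GPA + 1 x activities) so the fitted coefficients are easy to check.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        # Move the row with the largest entry in this column into position
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_multiple_regression(predictor_rows, outcomes):
    """Return [intercept, coefficient_1, coefficient_2, ...]."""
    # Add a leading 1 to each row so the model includes an intercept
    design = [[1.0] + list(row) for row in predictor_rows]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, outcomes)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical students: (GPA, activities attended) and credits completed
predictors = [(2.0, 1), (2.5, 4), (3.0, 2), (3.5, 5), (4.0, 3), (3.0, 6)]
credits_completed = [15, 21, 22, 28, 29, 26]

coefficients = fit_multiple_regression(predictors, credits_completed)
intercept, gpa_weight, activity_weight = coefficients

# Prediction for a new student with a 3.2 GPA who attended 3 activities
predicted_credits = intercept + gpa_weight * 3.2 + activity_weight * 3
print(f"predicted credits completed: {predicted_credits:.1f}")
```

Each coefficient estimates how much the predicted outcome changes when that predictor increases by one unit while the other predictors are held constant.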
For more information about regression, visit:
http://onlinestatbook.com/2/regression/intro.html
For more information about what statistical test to use, visit:
http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm
http://www.graphpad.com/www/book/choose.htm