Maths and Stats Support


Tests for categorical data


Chi-squared tests are for categorical data that has no natural order (nominal). The choice of test depends on how many independent variables you have. The most commonly used is the chi-squared test of association, which tests for an association between two categorical variables, i.e. for differences between groups.
Before carrying out any analysis, check how many observations are in each group. Either exclude small groups (e.g. "prefer not to say") or combine small groups together.
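
The group-size check above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name, the threshold of 5 and the response data are all assumptions made up for the example.

```python
from collections import Counter

def combine_small_groups(values, min_count=5, other_label="Other"):
    """Relabel every category with fewer than `min_count` observations.

    `min_count` and `other_label` are hypothetical defaults chosen for
    illustration; pick values that suit your own data.
    """
    counts = Counter(values)
    small = {group for group, count in counts.items() if count < min_count}
    return [other_label if value in small else value for value in values]

# Made-up survey responses
responses = (["Female"] * 12 + ["Male"] * 10
             + ["Non-binary"] * 2 + ["Prefer not to say"] * 1)

print(Counter(responses))                        # group sizes before combining
print(Counter(combine_small_groups(responses)))  # group sizes after combining
```

Check the combined group as well: if it is still very small, excluding it may be the better option.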

Chi-squared test of association

Use: Testing for an association between two categorical variables or for group differences

Dependent (Outcome): Nominal;
Independent (predictor): Two or more groups

Example: Is there an association between newspaper readership and voting preferences?

Summary statistics/graphs: Percentages within each category of the independent variable (e.g. what percentage of Guardian readers voted Labour) and stacked/multiple bar charts.

Note: If you have ordinal data, e.g. strongly disagree to strongly agree responses, the Mann-Whitney test (two groups) or the Kruskal-Wallis test (more than two groups) is more appropriate.
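
As a rough sketch of what the chi-squared test of association calculates, the Python below builds the expected counts from the row and column totals and sums (observed - expected)^2 / expected over all cells. The counts are invented for illustration (three newspapers by two parties) and the function name is an assumption. In practice a statistics package does this for you, for example `scipy.stats.chi2_contingency` in Python.

```python
import math

def chi_squared_association(table):
    """Pearson chi-squared statistic for a table of observed counts.

    `table` is a list of rows (one row per group of the independent variable).
    Returns (statistic, degrees of freedom, expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under no association: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - e_count) ** 2 / e_count
        for obs_row, exp_row in zip(table, expected)
        for obs, e_count in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Made-up counts: rows = three newspapers, columns = two parties
observed = [[30, 10], [15, 25], [20, 20]]
statistic, df, expected = chi_squared_association(observed)

# Shortcut valid ONLY for df = 2: the upper-tail p-value is exp(-x / 2).
# For other degrees of freedom use a library or statistical tables.
p_value = math.exp(-statistic / 2)

# Summary statistics: percentage voting for each party within each newspaper
row_percentages = [[100 * c / sum(row) for c in row] for row in observed]

print(f"chi-squared = {statistic:.3f}, df = {df}, p = {p_value:.4f}")
print(row_percentages)
```

A common rule of thumb is to check the expected counts before trusting the result: if many of them fall below 5, combine or exclude small groups first.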

 

Chi-squared goodness of fit

Use: Testing one nominal variable against known population proportions, or testing whether the data fits a specific probability distribution. You have only one variable, and its observed counts are compared with the values expected from the population.

Dependent (outcome) variable: Two or more categories 

Example: testing whether eye colour proportions in Sheffield match general UK proportions

Summary statistics/graphs: Percentages and bar/pie chart
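
The goodness of fit calculation can be sketched in Python as below. The eye colour counts and the population proportions are invented for illustration and are not real Sheffield or UK figures; the function name is also an assumption. The critical values are the standard 5% points of the chi-squared distribution. In practice a package does this directly, for example `scipy.stats.chisquare` in Python.

```python
def chi_squared_goodness_of_fit(observed, proportions):
    """Compare observed counts with counts expected from given proportions.

    Returns (statistic, degrees of freedom, expected counts).
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("population proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, expected

# 5% critical values of the chi-squared distribution for small df
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Made-up data: brown, blue, other
sample_counts = [40, 35, 25]
population_proportions = [0.5, 0.3, 0.2]  # hypothetical, not real UK values

gof_statistic, gof_df, gof_expected = chi_squared_goodness_of_fit(
    sample_counts, population_proportions
)
significant = gof_statistic > CRITICAL_5_PERCENT[gof_df]

print(f"chi-squared = {gof_statistic:.3f}, df = {gof_df}")
print("Differs from population at the 5% level:", significant)
```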

 

Logistic regression

Use: Tests which of multiple independent variables are significant predictors of a binary outcome such as survival and produces a model (regression equation) to predict the likelihood of the event happening.

Dependent (Outcome): Binary (2 categories);
Independent (predictor): Any number of continuous or binary variables.

Example: Car insurance companies use logistic regression to identify the factors which increase the likelihood of someone crashing. Car insurance premiums are then based on the predicted probability of you having a crash.

Summary statistics/graphs: Percentages for binary outcomes and means/standard deviations for continuous independent variables.

Note: If you have more than two categories for your dependent variable, consider combining categories so that there are only two categories and using logistic regression rather than using more complex techniques such as multinomial or ordinal regression
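
To show what the regression equation looks like, here is a minimal Python sketch that fits a logistic regression with a single continuous predictor by Newton-Raphson and then predicts a probability. Everything in it is an assumption for illustration: the data (annual mileage in thousands against whether the driver crashed) is made up, the function names are hypothetical, and real analyses normally use several predictors and a statistics package, which also reports standard errors and p-values.

```python
import math

def predict_probability(b0, b1, x):
    """Regression equation: p = 1 / (1 + exp(-(b0 + b1 * x)))."""
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

def fit_logistic(x, y, max_iterations=25, tolerance=1e-10):
    """Maximum likelihood fit of intercept b0 and slope b1 (one predictor)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iterations):
        p = [predict_probability(b0, b1, xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries, weighted by p * (1 - p)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse of the 2x2 information matrix times the gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tolerance:
            break
    return b0, b1

# Made-up data: annual mileage (thousands) and crash outcome (1 = crashed)
mileage = [2, 4, 5, 6, 8, 9, 10, 12, 14, 15]
crashed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_logistic(mileage, crashed)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"odds ratio per extra 1,000 miles = {math.exp(slope):.3f}")
print(f"predicted P(crash) at 11,000 miles = "
      f"{predict_probability(intercept, slope, 11):.3f}")
```

The exponentiated slope is the odds ratio: how the odds of the outcome change for a one-unit increase in the predictor.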

 

Resources by software

The following resources show you how to carry out the techniques in SPSS, understand the output and report results.

The Jamovi videos cover everything from carrying out analysis to reporting, including suitable summary statistics.

The resources guide you through the R code and interpretation of the relevant summary statistics and tests. The program code files contain all the code, which can be easily adapted to run on your own data.

These resources contain the SAS code, output and interpretation.

These resources show the calculations for the specified techniques.
