    Maths and Stats Support


    Tests for categorical data


    Chi-squared tests are for categorical data that has no natural order (nominal). The choice of test depends on how many independent variables you have. The most commonly used is the Chi-squared test of association, which tests for group differences.
    Before carrying out any analysis, check how many observations are in each group. Either exclude small groups (e.g. "prefer not to say") or combine small groups together.

    Chi-squared test of association

    Use: Testing for an association between two categorical variables or for group differences

    Dependent (Outcome): Nominal;
    Independent (predictor): Two or more groups

    Example: Is there an association between newspaper readership and voting preferences?

    Summary statistics/graphs: Percentages within each category of the independent variable (e.g. what % of Guardian readers voted Labour) and stacked/multiple bar charts.

    Note: If you have ordinal data, e.g. strongly disagree to strongly agree responses, and are comparing two groups, the Mann-Whitney test is more appropriate.
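
To show what the Chi-squared test of association calculates, here is a minimal Python sketch using only the standard library. The counts are invented for the newspaper example and are not real data. The p-value line uses a closed form that holds only for 1 degree of freedom (a 2 x 2 table), and no continuity correction is applied, so statistical software may report a slightly different result.

```python
import math

# Hypothetical counts: rows = newspaper read, columns = party voted for.
observed = {
    "Guardian": {"Labour": 60, "Conservative": 40},
    "Telegraph": {"Labour": 30, "Conservative": 70},
}

rows = list(observed)
cols = list(observed[rows[0]])
row_totals = {r: sum(observed[r].values()) for r in rows}
col_totals = {c: sum(observed[r][c] for r in rows) for c in cols}
n = sum(row_totals.values())

# Summary statistics: percentages out of each independent-variable group.
row_pct = {
    r: {c: 100 * observed[r][c] / row_totals[r] for c in cols} for r in rows
}

# Test statistic: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / grand total.
chi2 = 0.0
for r in rows:
    for c in cols:
        expected = row_totals[r] * col_totals[c] / n
        chi2 += (observed[r][c] - expected) ** 2 / expected

df = (len(rows) - 1) * (len(cols) - 1)

# Closed-form upper-tail probability, valid for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None

print(row_pct)
print(f"chi-squared = {chi2:.3f}, df = {df}, p = {p_value}")
```

For larger tables the same statistic applies, but the p-value should come from a chi-squared distribution with the appropriate degrees of freedom, which is what the software packages listed under "Resources by software" provide.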

     

    Chi-squared goodness of fit

    Use: One nominal variable is tested against population proportions, or the data is tested for fit to a specific probability distribution. You will only have one variable and compare it with numerical values (proportions) from the population.

    Dependent (outcome) variable: Two or more categories 

    Example: testing whether eye colour proportions in Sheffield match general UK proportions

    Summary statistics/graphs: Percentages and bar/pie chart
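
The goodness of fit calculation can be sketched in the same way. The sample counts and the population proportions below are made up for the eye colour example; they are not real Sheffield or UK figures. With three categories there are 2 degrees of freedom, and the p-value line uses a closed form that is valid only for that case.

```python
import math

# Hypothetical sample counts of eye colour (made-up numbers).
observed = {"Brown": 55, "Blue": 35, "Other": 10}

# Hypothetical population proportions to compare against (must sum to 1).
population = {"Brown": 0.50, "Blue": 0.40, "Other": 0.10}

n = sum(observed.values())

# Expected count in each category if the sample matched the population.
expected = {k: n * population[k] for k in observed}

# Test statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = len(observed) - 1

# Closed-form upper-tail probability, valid for df = 2 only.
p_value = math.exp(-chi2 / 2) if df == 2 else None

# Summary statistics: sample percentages alongside population percentages.
for k in observed:
    print(f"{k}: sample {100 * observed[k] / n:.1f}%, "
          f"population {100 * population[k]:.1f}%")
print(f"chi-squared = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```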

     

    Logistic regression

    Use: Tests which of multiple independent variables are significant predictors of a binary outcome such as survival and produces a model (regression equation) to predict the likelihood of the event happening.

    Dependent (Outcome): Binary (2 categories);
    Independent (predictor): Any number of continuous or binary variables.

    Example: Car insurance companies use logistic regression to identify the factors which increase the likelihood of someone crashing. Car insurance premiums are then based on the predicted probability of you having a crash.

    Summary statistics/graphs: Percentages for binary outcomes and means/standard deviations for continuous independent variables.

    Note: If you have more than two categories for your dependent variable, consider combining categories so that there are only two and using logistic regression, rather than more complex techniques such as multinomial or ordinal regression.
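
To illustrate how a fitted logistic regression equation turns predictor values into a predicted probability, here is a small sketch. It does not fit a model: the intercept, the coefficients and the predictor names (`age_under_25`, `annual_mileage_thousands`) are hypothetical values chosen for the car insurance example, standing in for what your software would estimate from data.

```python
import math

# Hypothetical fitted model (made-up values, not estimated from real data).
intercept = -2.0
coefficients = {
    "age_under_25": 0.9,               # binary predictor: 1 = yes, 0 = no
    "annual_mileage_thousands": 0.05,  # continuous predictor
}


def predicted_probability(driver):
    """Apply the regression equation, then convert log-odds to a probability."""
    log_odds = intercept + sum(
        coefficients[name] * driver[name] for name in coefficients
    )
    return 1 / (1 + math.exp(-log_odds))


# A hypothetical driver: under 25, driving 12,000 miles a year.
driver = {"age_under_25": 1, "annual_mileage_thousands": 12}
probability = predicted_probability(driver)

# exp(coefficient) is the odds ratio for a one-unit increase in a predictor.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

print(f"Predicted probability of a crash: {probability:.3f}")
print("Odds ratios:", {k: round(v, 2) for k, v in odds_ratios.items()})
```

An odds ratio above 1 means the predictor increases the odds of the event; whether it is a significant predictor is judged from the p-values in the software output, which this sketch does not compute.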

     

    Resources by software

          The following resources show you how to carry out the techniques in SPSS, understand the output and report results

     

      The Jamovi videos cover everything from carrying out analysis to reporting including suitable summary statistics.

      The resources guide you through the R code and the interpretation of the relevant summary statistics and tests. The program code files contain all the code, which can be easily adapted to run on your own data.

         These resources contain the SAS code, output and interpretation

     These resources show the calculations for the specified techniques

    Test chooser

    Test chooser resources

    Need help choosing the right test?