Guides: Professional Development for State Employees: Analyze & Interpret the Data

Analyze & Interpret Data

The data analysis technique chosen will depend on the type of data collected and how it is prepared for analysis. Data can often be analyzed both qualitatively and quantitatively. Survey responses, for example, can be analyzed qualitatively by studying the meanings of the responses or quantitatively by studying the frequencies of the responses.
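As a rough illustration of the quantitative side of this, the sketch below counts how often each answer to a single survey question occurs. The response data and the function name `response_frequencies` are made up for the example; they do not come from any particular survey tool.

```python
from collections import Counter

# Hypothetical answers to one survey question (illustrative data only).
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

def response_frequencies(answers):
    """Return {answer: (count, share of all answers)} for a list of answers."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: (n, n / total) for answer, n in counts.items()}

frequencies = response_frequencies(responses)
for answer, (count, share) in sorted(frequencies.items()):
    # Print each answer with its count and its percentage of all responses.
    print(f"{answer}: {count} ({share:.0%})")
```

A qualitative reading of the same responses would instead look at what respondents meant, which a frequency table cannot capture.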

Statistics is the study of quantitative data: a method of drawing inferences from a large quantity of data and of interpreting measurements. Valid statistics are important because "the world is full of variation, and sometimes it's hard to tell real differences from natural variation" (Biostatistics : the bare essentials).

"In order to determine whether or not numerical differences in observations are due to treatments, we need to know how much error was encountered within the experiment. Statistics allow us to quantify and assess this error (experimental error)...the two most important concepts of modern statistics: (1) to estimate the experimental error of treatments requires replication, and (2) to ensure an unbiased estimate of experimental error requires randomization of the treatments" (Statistics and agricultural research).

Data Analysis, Statistics & Probabilities

Qualitative data analysis deals with non-numeric data, such as words, descriptions, images, objects, etc. Content Analysis is the most widely used technique in this field. It is employed to analyze documented and recorded communication. The information can be collected from written (books, newspapers, social media posts), oral (speeches and interviews) or visual (photographs and video) forms.

Alternative Approaches:

Conversation Analysis: analyzes the sequential organization and details of conversation, the moment-by-moment interchange. It focuses on how reality is constructed, rather than on what it is.

Narrative Analysis: focuses on "the story itself" in order to put together the "big picture." Narratives can typically be coded into four types of stories: action tales, expressive tales, moral tales and rational tales.

The 5 steps listed below outline "the different techniques that are shared by most approaches to qualitative data analysis:

Documentation of the data and the process of data collection
Organization/categorization of the data into concepts
Connection of the data to show how one concept may influence another
Corroboration/legitimization, by evaluating alternative explanations, disconfirming evidence, and searching for negative cases
Representing the account (reporting the findings)" (Qualitative Data Analysis, (Ch. 10))

Descriptive Statistics is the branch of statistics concerned with describing the population under study. It summarizes data that are already known. Descriptive statistics are often presented visually: means, medians, variances, standard deviations, correlation coefficients, etc. can be communicated via charts and graphs.
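The summary measures named above can be computed with Python's built-in `statistics` module. This is a minimal sketch: the scores are invented for illustration, and `describe` is a hypothetical helper name, not part of any library.

```python
import statistics

# Hypothetical test scores for ten people (illustrative data only).
scores = [72, 85, 90, 66, 85, 78, 94, 85, 70, 75]

def describe(sample):
    """Summarize a sample's central tendency and dispersion."""
    return {
        "n": len(sample),
        # Central tendency
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "mode": statistics.mode(sample),
        # Dispersion (sample variance and standard deviation, n - 1 denominator)
        "variance": statistics.variance(sample),
        "std_dev": statistics.stdev(sample),
        "range": max(sample) - min(sample),
    }

summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean, median, and mode describe the typical score, while the variance, standard deviation, and range describe how widely the scores are spread around it.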

"Descriptive Statistics are used to present quantitative descriptions in a manageable form. In a research study we may have lots of measures. Or we may measure a large number of people on any measure. Descriptive statistics help us to simplify large amounts of data in a sensible way. Each descriptive statistic reduces lots of data into a simpler summary" (Research Methods Knowledge Base > Descriptive Statistics).

"Descriptive statistics convey two basic aspects of a sample: central tendency and dispersion. The former describes the most representative or common or central observation of the sample, and the latter how the sample is distributed around the common variate" (Statistics for Anthropology)

Statistical methods used in the public health literature and implications for training of public health professionals

Applying Social Statistics by Jay Alan Weinstein
ISBN: 9780742563735
Publication Date: 2010-03-16

Inferential Statistics makes statements about the population from which samples were obtained. It is the branch of statistics dealing with conclusions, generalizations, predictions, and estimations based on data from samples.

"With inferential statistics, you are trying to reach conclusions that extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population might think. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what’s going on in our data" (Research Methods Knowledge Base > Inferential Statistics).

There are many types of inferential statistics and which one(s) to use will be specific to the research design and sample characteristics. However, most inferential statistics are based on the principle that a test-statistic value is calculated on the basis of a particular formula.

Hypothesis Testing is the use of statistics to assess how compatible the observed data are with a given hypothesis. The usual process of hypothesis testing consists of four steps:

1. The null (H0) and alternative (H1) hypotheses are stated.

2. The level of statistical significance* (the decision criterion) is established.

3. Collect the sample, compute the test statistic,** and compare the resulting p-value to the criterion set in step 2.

The p-value is the probability value.
- "P value is the probability, given that the null hypothesis is true, of obtaining data as extreme or more extreme than that observed" (Oxford Handbook of Medical Statistics)
- "The probability of any event can only vary between 0 and 1 (which correspond to 0 and 100%). If an event is certain to occur, it has a probability of 1; while, if it is certain the event will not occur, it has a probability of 0" (Statistics explained : an introductory guide for life scientists).

4. Make a decision based on the probability. The sample is compared with the null hypothesis' parameters, and a conclusion is reached about whether to reject or retain H0. The outcome of that decision is described in terms of significance.***

When testing a hypothesis, two errors may be committed: 1) Rejection of the null hypothesis when in fact it is true (type I error), or 2) Failure to reject a false null hypothesis (type II error).
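The four steps can be sketched in code with a two-sided one-sample z-test. This is only one of many possible tests, chosen because it needs nothing beyond Python's standard library. It assumes the population standard deviation is known and the sample mean is approximately Normal. The data, the hypothesized mean of 100, the standard deviation of 10, and the function name `one_sample_z_test` are all made up for illustration.

```python
import math
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided one-sample z-test (assumes a known population sigma)."""
    # Step 1: H0: population mean equals mu0; H1: population mean differs from mu0.
    # Step 2: alpha is the significance level chosen before looking at the data.
    # Step 3: compute the test statistic and its p-value.
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))
    # Probability, if H0 is true, of a z at least this far from zero.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: reject H0 only if the p-value falls below the criterion.
    reject_h0 = p_value < alpha
    return z, p_value, reject_h0

# Hypothetical sample of ten scores (illustrative data only).
sample = [104, 98, 110, 107, 101, 112, 99, 105, 108, 106]
z, p_value, reject_h0 = one_sample_z_test(sample, mu0=100, sigma=10)
print(f"z = {z:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

With these numbers the p-value is above 0.05, so H0 is retained. Retaining H0 when it is actually false would be a type II error; rejecting it when it is actually true would be a type I error.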

Definitions:

*"Level of significance, or significance level, refers to a criterion of judgment upon which a decision is made regarding the value stated in a null hypothesis. The criterion is based on the probability of obtaining a statistic measured in a sample if the value stated in the null hypothesis were true. The most commonly used significance level is 5% (0.05).

**The test statistic is a mathematical formula that allows researchers to determine the likelihood of obtaining sample outcomes if the null hypothesis were true. The value of the test statistic is used to make a decision regarding the null hypothesis.

***Significance, or statistical significance, describes a decision made concerning a value stated in the null hypothesis. When the null hypothesis is rejected, we reach significance. When the null hypothesis is retained, we fail to reach significance" (Introduction to hypothesis testing, (Ch. 8)).

Also see: Formulate a Research Question or Hypothesis

Note: A test statistic is a quantity, derived from the sample, used to determine how well the model or hypothesized means fit the data.

Parametric Tests (regression, comparison and correlation) are used where a Normal distribution of variables is assumed (think "bell-shaped curve"):

Regression Models are used to test hypothesized cause-and-effect relationships.
- Simple Regression is used to estimate the relationship between two quantitative variables.
  - The Coefficient of Determination (or R² value) is the statistic used to evaluate the goodness-of-fit of the model: R² = 1 - (unexplained variation / total variation).
- Multiple Linear Regression is used to describe relationships between two or more independent variables and one dependent variable.
- Logistic Regression is similar to multiple linear regression, except the response/outcome variable is binary ("0" or "1"), which can represent "Yes" or "No," or "Pass" or "Fail," for example.
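To make the R² formula concrete, here is a small sketch of simple (one-predictor) least-squares regression written without external libraries. The x and y values are invented, and `simple_linear_regression` is a hypothetical helper name.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares and return R² as well."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]
    # Unexplained variation: squared distances from the fitted line.
    unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    # Total variation: squared distances from the mean of y.
    total = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - unexplained / total
    return slope, intercept, r_squared

# Hypothetical paired observations (illustrative data only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_squared = simple_linear_regression(x, y)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R² = {r_squared:.2f}")
```

An R² near 1 means the line accounts for most of the variation in y; an R² near 0 means it accounts for very little.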
Comparison Tests are used to look for differences among group means. Expected outputs are the t-value, degrees of freedom, and p-value.
- T-tests are used to compare the means of two groups. To choose which t-test to use, ask whether the groups being compared derive from a single population or two different ones, and whether you want to test the difference in a specific direction. The t-value (or t-statistic) is the test statistic used to determine whether the null hypothesis is retained or rejected.
- ANOVA & MANOVA: Analysis of variance (ANOVA) is the statistical procedure of comparing the means of a variable across several groups (more than two groups). ANOVA uses F-tests to assess the equality of the means. The F-value (or F-statistic) is the ratio of two variances (between-group variation to within-group variation). When it is used as the test statistic, a larger value makes it more likely that the variation associated with the independent variable is real and not due to chance. Multivariate analysis of variance (MANOVA) is used to compare means of several variables simultaneously across several groups (more than two groups).
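The F-value described above can be computed by hand for a one-way ANOVA. The sketch below does so with plain Python. The three groups are invented, and `one_way_anova_f` is a hypothetical helper name. It returns only the F-value and its degrees of freedom; turning them into a p-value requires the F distribution, which a statistics package would normally supply.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variation: how far each value is from its own group mean.
    ss_within = sum(
        (value - m) ** 2 for g, m in zip(groups, group_means) for value in g
    )
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical measurements for three groups (illustrative data only).
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_value, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```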
Correlation Tests check whether two variables are related without assuming cause-and-effect relationships.
- Pearson's r is used to express the correlation between two variables X and Y.
- Looking for an association between two categorical variables? A Chi-squared test (or χ²) "calculates the frequencies that would be expected if there were no association (i.e. null hypothesis is true)" (Oxford handbook of medical statistics).
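Both correlation measures can be illustrated with a short sketch in plain Python. The paired values and the 2x2 table of counts are invented, and `pearson_r` and `chi_squared` are hypothetical helper names. Only the statistics are computed here; p-values would normally come from a statistics package.

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def chi_squared(observed):
    """Return (statistic, expected table, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Frequencies expected if the two variables were not associated (H0 true).
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, expected, dof

# Hypothetical paired measurements (illustrative data only).
r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# Hypothetical counts: rows are two groups, columns are "yes" / "no" answers.
statistic, expected, dof = chi_squared([[20, 30], [30, 20]])
print(f"r = {r:.3f}; chi-squared = {statistic:.2f} with {dof} degree(s) of freedom")
```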

Non-parametric Tests do not make as many assumptions about the data. They are useful when one or more of the common statistical assumptions are violated (e.g. a Normal distribution or a linear relationship does not hold), but the inferences that are made aren't as strong as with parametric tests:

Rank Correlation Tests: These tests (Spearman or Kendall) assume that the variables can be ranked and that the relationship between them consistently increases or decreases. Use them in place of regression and correlation tests.
"Kruskal-Wallis Test is a nonparametric approach to the one-way ANOVA. The procedure is used to compare three or more groups on a dependent variable that is measured on at least an ordinal level" (Comprehensive Clinical Psychology).
Wilcoxon Rank-Sum Test is the nonparametric version of the independent t-test, commonly used to compare two independent groups when the data do not meet parametric assumptions.
Use the Wilcoxon Signed Rank Test (or Wilcoxon Matched Pairs Test) for a repeated measure design where the same subjects are evaluated under two different conditions.
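What these tests share is that they work on ranks rather than raw values. As a minimal sketch of that idea, the code below computes the Wilcoxon rank-sum statistic W for two independent groups, together with the equivalent Mann-Whitney U. The data and the helper names `average_ranks` and `rank_sum_statistic` are made up for illustration, and no p-value is computed.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover every value tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def rank_sum_statistic(group_a, group_b):
    """Return (W, U): the rank sum of group_a and the matching Mann-Whitney U."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    w = sum(ranks[:n_a])           # sum of the ranks held by group_a
    u = w - n_a * (n_a + 1) / 2    # subtract the smallest possible rank sum
    return w, u

# Hypothetical measurements for two independent groups (illustrative data only).
group_a = [12, 15, 11, 18]
group_b = [14, 20, 22, 19, 25]
w, u = rank_sum_statistic(group_a, group_b)
print(f"W = {w}, U = {u}")
```

A small U means that values in the first group rarely exceed values in the second, which is evidence that the groups differ.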

Data Visualization

Graphic Representation of Research

To communicate the information of your research, the data will often need to be described in numeric and tabular form. But graphics can allow for data to be displayed in a visual/pictorial form that facilitates more insight into the data.

"Three Basic Principles for Graphical Presentation--Always remember 3 principles in illustrating the results

Heading or title: Each figure should have clearly defined heading / title
Simplicity: Presentation of data should be simple to understand
Honesty: If possible always give the sample size with results" (Handbook of basic statistical concepts : for scientists and pharmacists)

From an accessibility standpoint, also consider color and font choice, and apply both consistently.