While there are many advanced statistical packages out there you don’t need them to perform a detailed and comprehensive analysis of your survey data.
You can use the same techniques and approaches in Excel and in this post, I’ll take you through how to analyze survey data in Excel.
Excel is a very good tool to use for your analysis and has the benefit of being on almost everyone’s desktop. With a little bit of insight, you can do almost everything the statistical packages can do in Excel.
Don’t fall into the trap of thinking it’s an inferior tool.
What Your Analysis Needs to Achieve
When analyzing survey data there are surprisingly few different types of questions you need to answer.
You will generally want to know:
- If the responses for one question / customer segment are significantly different from the responses for another question /customer segment.
- If the responses for one question are changing over time.
- If the responses for one question are correlated (implies causes / is caused by) another question.
This is good news because there are only a few types of statistical approaches you will need.
There are three different types of data a typical customer survey generates:
Numbers: also called ordinal data
Numbers come from questions like this:
How responsive is Company X in closing the loop after problem resolution? Where 1 is very unresponsive and 5 is very responsive
Categorical: Category data
Categorical data comes from questions like this:
Of the following, which is your favourite colour: red, blue, green
Text data comes from questions like this:
Please tell us what you like most about our widget.
We are going to focus on Number data in this post.
You Survey Analysis Plan
Below are the steps in using Excel to analyze your survey data:
- Calculate simple statistics (mean, max, etc.) for all questions
- Graph the questions, with error bars
- Create histograms for the questions and interpret them
- If you have past data, plot averages over time, with errors
- Decide which groups you want to compare and test to see if they have changed
- Perform correlation analysis to see if two variables might be linked: mostly on outcome variables and attributes.
In the rest of this post I’ll show you how to undertake each of those tasks.
Simple Statistics Using Excel.
Generating simple statistics for your data, mean, maximum and minimum, is quite easy.
In the image below we have included the standard maximum, minimum and average along with a couple of additional statistics: Standard Error and 1.96 x Standard Error. We’ll use these statistics in the next section when we graph the scores add confidence intervals for the average.
Excel has formulas for each of these and is smart enough that you can simply highlight an entire data column and it will calculate the statistics for you.
Graph Each Question and add error bars
When you present your data you will almost certainly use charts. So, the next step is to graph the average of each question.
However, one of the problems with most feedback charts is they show the average as one number. This can be misleading because the response average is only an estimate of the population average.
With a couple of simple steps, you can add Error Bars to your charts to show the 95% confidence interval – which is much more informative.
To interpret the error bars you need to know a little about Standard Error:
The standard error of the sample mean is an estimate of how far the sample mean is likely to be from the population mean.
When we multiply it by 1.96 we are creating a 95% confidence interval. In other words we are 95% confident that the actual average for all customers is between the upper error bar and the lower error bar.
Here is a quick video on exactly how to do that.
Add Histograms of each question
Before you go on to more advanced statistics you should also create histograms of them to better understand the data.
What are Histograms?
Histograms show you the proportion of different responses to a question and can be used on Number and Categorical data.
They are very useful because, while averages can be interesting, they can also hide a lot information about the responses.
Are the responses nicely spread around the average or skewed to one side or maybe there are there two peaks?
While each sample might have the same average, they can have very different looking histograms – and seeing those histograms can help you more effectively interpret the responses.
So, you should create a histogram for each of your questions and then review them for interesting profiles. You can use this resource to help you interpret your histograms
How to Create a Histogram
Note: You will need the Excel Data Analysis tool kit before you begin. If you don’t have it, or you’re not sure, check out the Microsoft site
First, you’ll need to create a set of “bin” values. Typically, your survey response scale will use 0-5 or 0-7 or 1-7 or something similar so your bins need to cover each of the score options.
Here is a bin range for a 0-5 response scale. Note that it starts at -0.5 to catch the 0’s and goes to 5.499 to catch the 5’s. Also, note the use of “0.499” elements – this is to catch the half scores that some people might use (if your survey allows it.)
Now you have the bins you can create a histogram for each of your survey response. This short video shows you exactly how to do that.
Testing for Significant Changes: Student’s t-Test
Now, if your histograms show a relatively nice “normal” distribution curve, you can use some more advanced statistics to add value to the data.
See below for the histogram from the example. You can see that it has a nice even shape with just one peak. It looks normally distributed (the “bell curve”) so we can carry on with the more advanced statistics.
Student’s t-Test is a good way to answer two of our questions from the start of the post:
- Are the responses for one question / customer segment significantly different from the responses for another question / customer segment
- Are the responses for one question changing over time
The t-Test allows you to test if the average for one set of scores is probably different from the average for another set of scores.
Hopefully you can see that both these questions, and all their variations, are asking this question:
Does the avearage for this set of question scores differ significantly from the average for that set of question scores
At all times, you are just comparing two sets of scores, for example:
- Question 1 last month with Question 1 this month
- Question 1 this month with question 2 this month
- Question 1 for respondents from Segment A with Question 1 for respondents from Segment B
So, you can use the T-test in all these areas. I’ve already written a detailed post exactly how to use the t-Test on survey responses so I’ll refer you there for the details.
Note: there are downsides with just comparing everything with everything. Think about it for a few seconds. If you set your test so you have a 95% confidence level and you test 20 different pairs of responses, on average, one of them will show a positive test just on random chance (5% chance). There are ways to overcome this problem but they are outside the scope of this post.
So, in general take care just comparing everything with everything – do it deliberately.
Cause and effect is the focus of this section. For instance: does our Net Promoter Score go up, and down as our Responsiveness score goes up and down?
You have probably heard the saying that correlation does not imply causation and it is true that you need to be careful not to over emphasize the impact of a high correlation. However, it is also true that correlation does not deny causation. It’s a good place to start your investigation.
When investigating correlation, start by looking at how your survey outcome question correlates with your attribute questions. Creating the chart below for each of your attributes is a good place to start.
Attribute questions that correlate highly with the outcome question are good candidates for key business drivers.
Calculating in Correlation in Excel
The easiest way of calculating correlation is to simply graph your two data sets and ask Excel to add in the “Linear Trend Line”. This has the advantage of also letting you see the data so you can determine if the data looks right, i.e. one or two points are not skewing the results.
Again I have a short video to help you on this.
Interpreting the correlation coefficient (R) is the subject of a whole blog post, but in summary:
The correlation coefficient (R) indicates the correlation between the two values.
- R = 1 is a perfect correlation
- R = 0 indicates no correlation at all
- R = -1 is a perfect negative correlation, a move up in one variables correlates to a move down in the other.
So, the closer 1 the better the correlation and you can generally ignore anything less that about 0.3 for this value.
Excel shows you the R-squared value. This number indicate the percentage of variation in that one variable explains variation in the other variable.