Recoding

The raw responses to questionnaire items are not always in the right form for analysis, and secondary analysis of existing data often has different goals from the original research that collected the data, so it is often necessary to change the coding of responses.  This must be done with care.

Agree-disagree items are among the most common in surveys, but different researchers format them differently, notably in how to handle people who are ambivalent or lack a strong opinion.  The 2008 General Social Survey included this statement (item 600: MYWAYGOD): “I have my own way of connecting with God without churches or religious services.”  Here is how responses were originally coded, placing the ambivalent response, “neither agree nor disagree,” between agree and disagree:

1) Strongly agree = 339 cases

2) Agree = 543 cases

3) Neither agree nor disagree = 155 cases

4) Disagree = 194 cases

5) Strongly disagree = 118 cases

8) Don’t know = 9 cases

9) No answer = 7 cases

Before running any correlations between this item and others, one would need to recode “no answer” as missing values, so the computer will ignore these 7 cases.  Then one must decide whether to recode “don’t know” as missing values too, or to combine these 9 cases with the 155 in “neither agree nor disagree” as just another way of expressing ambivalence.
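The two choices for “don’t know” can be made explicit in a few lines of code.  What follows is a minimal Python sketch, not the procedure of any particular statistics package: it assumes each response is stored as its integer code, uses Python's `None` to stand for a missing value, and the function name `recode_mywaygod` is hypothetical.  The responses are rebuilt from the frequencies listed above.

```python
from collections import Counter

# GSS 2008 MYWAYGOD codes as listed above:
# 1-5 = strongly agree ... strongly disagree, 8 = don't know, 9 = no answer.
NO_ANSWER = 9
DONT_KNOW = 8
NEUTRAL = 3  # "neither agree nor disagree"


def recode_mywaygod(code, dont_know_as_missing=True):
    """Return the analysis code for one response, or None for missing."""
    if code == NO_ANSWER:
        return None  # "no answer" is always treated as missing
    if code == DONT_KNOW:
        # Analyst's choice: drop "don't know", or treat it as ambivalence.
        return None if dont_know_as_missing else NEUTRAL
    return code  # substantive answers keep their original code


# Rebuild one code per respondent from the published frequencies.
original_counts = {1: 339, 2: 543, 3: 155, 4: 194, 5: 118, 8: 9, 9: 7}
responses = [code for code, n in original_counts.items() for _ in range(n)]

as_missing = Counter(recode_mywaygod(c) for c in responses)
as_neutral = Counter(recode_mywaygod(c, dont_know_as_missing=False) for c in responses)

print(as_missing[None], as_missing[3])  # 16 155
print(as_neutral[None], as_neutral[3])  # 7 164
```

Under the first choice 16 cases drop out of the analysis; under the second only the 7 “no answer” cases do, and the middle category grows from 155 to 164.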

Sometimes, especially when preparing complex tables for publication, it may be legitimate to collapse categories through recoding.  For example, to simplify a discussion one might combine “strongly agree” with “agree,” and “strongly disagree” with “disagree.”  When doing serious statistical analysis, however, one should generally avoid collapsing categories, because doing so reduces the precision of the measure.
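Collapsing categories amounts to a many-to-one mapping from old codes to new ones.  A minimal Python sketch, assuming the same integer coding of MYWAYGOD and leaving out the “don’t know” and “no answer” cases; the mapping `COLLAPSE` and its three-point labels are illustrative choices, not part of the GSS codebook.

```python
from collections import Counter

# Hypothetical mapping that collapses a 5-point agree-disagree item to 3 points.
COLLAPSE = {
    1: 1,  # strongly agree    -> agree
    2: 1,  # agree             -> agree
    3: 2,  # neither           -> neither
    4: 3,  # disagree          -> disagree
    5: 3,  # strongly disagree -> disagree
}
LABELS = {1: "Agree", 2: "Neither", 3: "Disagree"}

# Substantive MYWAYGOD frequencies from the list above (missing codes excluded).
counts = {1: 339, 2: 543, 3: 155, 4: 194, 5: 118}

collapsed = Counter()
for code, n in counts.items():
    collapsed[COLLAPSE[code]] += n

for code in sorted(collapsed):
    print(LABELS[code], collapsed[code])
# Agree 882
# Neither 155
# Disagree 312
```

The three-row table is easier to present, but the 339 strong and 543 moderate agreements can no longer be told apart, which is exactly the loss of precision that makes collapsing undesirable in serious statistical work.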

A very different but widely followed tradition in survey research places ambivalent responses last, in order to encourage the respondent to express a definite opinion.  A series of items in the 2005 Baylor Religion Survey asks people about their conceptions of God, and one of them (item 65: Q22G) states that God is “angered by my sins.”  Responses were originally coded like this:

1) Strongly agree = 381 cases

2) Agree = 412 cases

3) Disagree = 316 cases

4) Strongly disagree = 227 cases

5) Undecided = 212 cases

X) Missing = 173 cases

Note that here “undecided” comes last.  Because it is assigned the number 5, many kinds of statistical analysis would wrongly treat it as an even more extreme form of disagreement than “strongly disagree.”  Therefore, before running correlations one would either remove these cases by recoding them as missing or, more likely, recode the item like this:

1) Strongly agree = 381 cases

2) Agree = 412 cases

3) Undecided = 212 cases

4) Disagree = 316 cases

5) Strongly disagree = 227 cases

X) Missing = 173 cases
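The reordering above is a one-to-one mapping of codes.  A minimal Python sketch, assuming integer codes with `None` for the missing cases; `REORDER` and `reorder_q22g` are hypothetical names used only for illustration.

```python
from collections import Counter

# Original Q22G codes: 1 strongly agree, 2 agree, 3 disagree,
# 4 strongly disagree, 5 undecided; None marks the missing cases.
# Reordering puts "undecided" in the middle and shifts both
# disagree codes up by one.
REORDER = {1: 1, 2: 2, 3: 4, 4: 5, 5: 3}


def reorder_q22g(code):
    """Return the reordered code; missing (None) or unknown codes become None."""
    return REORDER.get(code)


original_counts = {1: 381, 2: 412, 3: 316, 4: 227, 5: 212, None: 173}

reordered = Counter()
for code, n in original_counts.items():
    reordered[reorder_q22g(code)] += n

print([(k, reordered[k]) for k in range(1, 6)])
# [(1, 381), (2, 412), (3, 212), (4, 316), (5, 227)]
print(reordered[None])  # 173
```

To take the other route and treat “undecided” as missing, one would instead map 5 to `None` and leave codes 3 and 4 unchanged, so the scale would run from 1 to 4.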

One should always familiarize oneself with the full codebook or actual questionnaire of a study when doing secondary analysis.  For example, the large number of missing cases here could have been people who did not express belief in God in response to an earlier question, and who therefore were not asked this question about God being angered.  Similarly, when writing up results, one should explain very clearly what recoding was done, so the reader can properly interpret the results.

When considering recoding, you must be very clear on what the item responses mean, and how that meaning connects to your research goals, in order to judge what might be appropriate.  For example, a question on evolution in the National Survey of High School Biology Teachers (item 49: STATEEV) asked teachers whether their state’s standards included evolution, and the five values of the variable were:

1) Yes, but not human evolution = 271 cases

2) Yes, including human evolution = 571 cases

3) No = 18 cases

4) I am not sure = 42 cases

X) Missing = 24 cases

If this were an agree-disagree item, you might consider recoding “I am not sure” between yes and no, but that does not make sense here.  The respondent is expressing not ambivalence but lack of knowledge, so “I am not sure” responses could be recoded as missing values.  You could combine responses 1 and 2 into a general yes response, but with only 18 no responses this would not be very productive for most kinds of statistical analysis.  However, if your goal were to study the teaching of human evolution in school, you could combine responses 1 and 3 and label the combined category “No, human evolution is not included in the state’s biology standards.”
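The human-evolution recoding turns STATEEV into a yes/no variable.  A minimal Python sketch, assuming integer codes with `None` for the 24 missing cases; the function `human_evolution_in_standards` and its True/False/None output are hypothetical choices made for illustration.

```python
from collections import Counter

# STATEEV codes from the list above; None stands for the missing cases.
YES_NOT_HUMAN, YES_HUMAN, NO, NOT_SURE = 1, 2, 3, 4


def human_evolution_in_standards(code):
    """Hypothetical dichotomy: True if the standards include human evolution,
    False if they do not, None if the teacher was unsure or did not answer."""
    if code == YES_HUMAN:
        return True
    if code in (YES_NOT_HUMAN, NO):
        return False  # responses 1 and 3 combined
    return None  # "I am not sure" and missing carry no information here


counts = {1: 271, 2: 571, 3: 18, 4: 42, None: 24}

result = Counter()
for code, n in counts.items():
    result[human_evolution_in_standards(code)] += n

print(result[True], result[False], result[None])  # 571 289 66
```

Combining responses 1 and 3 yields 289 “no” cases against 571 “yes” cases, a far more usable split than the 18 “no” responses left by merging 1 and 2.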
