Oversampling

Users of existing questionnaire survey data need to learn how the sample was designed and how the respondents were recruited. For example, the influential 1963 Survey of Northern California Church Bodies was administered to members of a range of specific denominations in a limited geographic area, not to the general public. In this case the very name of the survey signals that it covers church members rather than a random sample of all adults living in the area, but often one must study the codebook and the description of a survey project carefully to understand exactly what the sample was. A subtle but important fact about many high-quality surveys is that they oversampled some groups in the population, so members of those groups are overrepresented in the data.

Perhaps the clearest examples concern the General Social Survey (GSS), a repeated survey of the noninstitutionalized adult population of the United States. Notice right away that the sample does not include prison inmates, children, or US citizens living overseas. Typically, the GSS aims for about 1,500 respondents, either annually or on a more complex schedule that adds roughly this number of respondents to the database each year. Many items are asked repeatedly, so if one's research goals do not involve rapid change over time, it may be feasible to work with a sample much larger than 1,500 by combining the data from all years that asked the questions of interest. Pooling years in this way makes it possible to compare members of minority groups with each other and with the other respondents. By definition, members of minority groups are relatively rare in any sample, so for some research purposes the GSS has occasionally oversampled them.

In 1982 and 1987, the GSS intentionally oversampled African Americans so that there would be enough respondents from this group to analyze a number of questions on which they may differ from other groups. Data for these years include SAMPLE and OVERSAMP variables that let researchers compensate for this oversampling in their analysis. Depending on one's purposes, one must choose among several ways of doing this. If the research does not involve race, the best course is probably simply to exclude the oversample. Given some statistical skill, it is also possible to weight the results, including all respondents but weighting their data to approximate the distribution in the general population. However, if race is central to the analysis, for example when comparing African Americans with European Americans, one may want to start by analyzing the two groups separately and then comparing the results, perhaps simply by reporting the percentages giving certain responses, calculated within each group and shown side by side in a table. When reporting more sophisticated statistics, it is crucial to say whether or not the results were weighted. Often each possible choice can be well justified, but in reporting results it is absolutely essential to tell the reader what procedures were used.
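
The three options described above can be illustrated with a small Python sketch. It uses invented records and an assumed population share rather than the real GSS SAMPLE and OVERSAMP variables, whose coding should be taken from the GSS codebook. The weighting shown is simple post-stratification (weight = population share / sample share of the respondent's group), one common approach and not necessarily the procedure the GSS itself recommends. The names group_a, group_b, and POPULATION_SHARE are hypothetical.

```python
from collections import Counter

# Hypothetical records: (group, from_oversample, agrees).
# "group_a" stands in for the oversampled group; all values are invented.
respondents = [
    ("group_a", False, True), ("group_a", False, False),
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", True, False), ("group_a", True, True),
    ("group_b", False, False), ("group_b", False, True),
    ("group_b", False, False), ("group_b", False, False),
    ("group_b", False, True), ("group_b", False, False),
]

# Assumed population shares (hypothetical, not GSS figures). 0.25 matches
# group_a's 2-in-8 share of the main (non-oversample) records above.
POPULATION_SHARE = {"group_a": 0.25, "group_b": 0.75}


def percent_agree(records, weights=None):
    """Percentage (optionally weighted) of records answering 'agree'."""
    if weights is None:
        weights = [1.0] * len(records)
    total = sum(weights)
    agree = sum(w for rec, w in zip(records, weights) if rec[2])
    return 100.0 * agree / total


def poststratification_weights(records, population_share):
    """Weight each record by (population share) / (sample share) of its group."""
    n = len(records)
    counts = Counter(rec[0] for rec in records)
    return [population_share[rec[0]] / (counts[rec[0]] / n) for rec in records]


# Option 1: drop the oversample and analyze only the main sample.
main_sample = [rec for rec in respondents if not rec[1]]
excluded = percent_agree(main_sample)

# Option 2: keep everyone, but weight toward the assumed population shares.
weighted = percent_agree(
    respondents, poststratification_weights(respondents, POPULATION_SHARE)
)

# Option 3: analyze each group separately and report the results side by side.
by_group = {
    g: percent_agree([rec for rec in respondents if rec[0] == g])
    for g in POPULATION_SHARE
}

print(f"unweighted, all respondents: {percent_agree(respondents):.1f}%")
print(f"oversample excluded:         {excluded:.1f}%")
print(f"weighted:                    {weighted:.1f}%")
for g, pct in by_group.items():
    print(f"{g}: {pct:.1f}%")
```

With these toy data the overall estimates differ (50.0% unweighted, 37.5% with the oversample excluded, about 41.7% weighted), which illustrates why a report must state which procedure was used.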

Sometimes oversamples are unintentional. One example for which the GSS provides a method of compensation is the oversampling of adults who live in small households. The standard GSS sampling design had several stages, one of which involved sampling households; then only one individual was interviewed in each household that ended up in the sample. This means that individuals who lived in households with several adults were less likely to end up in the sample. Once a household was selected, a person in a 2-adult household had a 50% chance of being interviewed, compared with a 25% chance for one living in a 4-adult household. The GSS includes a variable, ADULTS, recording how many adults lived in the respondent's household, which can be used to rebalance the sample. There is some concern that the practicalities of in-home interviewing may also have biased the sample against people who are seldom home, and even well-funded surveys following excellent sampling methods cannot entirely avoid such problems. Thus the goal can never be perfect calibration through weighting and other statistical means; rather, it is to stay alert to situations where intentional or unintentional oversampling can and should be dealt with during analysis of the data.
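
The rebalancing idea can be sketched as inverse-probability weighting: within a selected household each adult's chance of being interviewed is 1/ADULTS, so weighting each respondent in proportion to ADULTS offsets the advantage of small households. The Python sketch below uses invented records, treats only this within-household stage of the design, and assumes ADULTS is a positive integer; the real variable may carry missing-data codes that need handling as described in the GSS codebook. The function names are hypothetical.

```python
# Hypothetical records: (adults, agrees), where "adults" plays the role of
# the GSS ADULTS variable. The values are invented for illustration.
# Assumption: adults is a positive integer (no missing-data codes).
records = [(1, True), (1, True), (1, False), (2, False), (2, True), (4, False)]


def selection_probability(adults):
    """Chance that a given adult is picked once the household is sampled."""
    return 1.0 / adults


def adults_weights(adults_list):
    """Inverse-probability weights (proportional to ADULTS), rescaled to average 1."""
    mean = sum(adults_list) / len(adults_list)
    return [a / mean for a in adults_list]


def weighted_percent(records, weights):
    """Weighted percentage of records answering 'agree'."""
    total = sum(weights)
    agree = sum(w for (_, ans), w in zip(records, weights) if ans)
    return 100.0 * agree / total


weights = adults_weights([a for a, _ in records])
unweighted = weighted_percent(records, [1.0] * len(records))
weighted = weighted_percent(records, weights)

print(selection_probability(2), selection_probability(4))  # 0.5 0.25
print(f"unweighted: {unweighted:.1f}%  weighted by ADULTS: {weighted:.1f}%")
```

Rescaling the weights to average 1 leaves percentages unchanged but keeps the weighted sample size equal to the actual number of respondents, which is a common convention when weights are later fed into other statistics.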

On occasion, surveys with a focus on religion intentionally oversample minority groups of particular interest for the research. For example, the National Study of Youth and Religion included a Jewish oversample. Spirit and Power: A 10-Country Survey of Pentecostals started with a random sample of the public, then oversampled Pentecostals and charismatics. Given that its focus was views about the Roman Catholic Church, the 2002 ABC News Church in Crisis Poll oversampled Catholics. The Generation Next Survey oversampled young adults aged 18 to 25. God and Society in North America recruited about 3,000 respondents each in Canada and the United States, which can be thought of as oversampling Canadians if the population of interest is all residents of North America north of Mexico.
