Courtney Brown, Ph.D. | Academic Data Sets and Programs

Variable Descriptions for U.S. Aggregate Data Sets

The aggregate county-level data sets for the United States that I have used in much of my published research were originally supplied to me by the Inter-university Consortium for Political and Social Research (ICPSR). The data originally came in hundreds of separate files of political and census data, sorted by year and state, and I had to assemble single data sets for specific periods of time from these files. To avoid writing variable labels for every variable, I used a common naming scheme, which is described below.

Variables for the political data generally have eight-character names. The first character indicates whether the variable holds presidential or congressional data (P or C, respectively). The next four characters identify either the party or the total vote, as follows:

0100 = Democratic Party vote (county-level)
0200 = Republican Party vote (county-level)
TOTA = Total Vote (county-level)

The last three characters give the year of the data, written as its final three digits (for example, 920 for 1920).

For example, the variable P0100920 would be the county-level presidential vote in the U.S. for the Democratic Party in the year 1920. Similarly, C0200932 would be the county-level congressional vote in the U.S. for the Republican Party in the year 1932.
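
As a rough illustration of the eight-character scheme, the following Python sketch decodes a political variable name into its office, party, and year. It assumes that the three-digit year is the full year with its leading "1" dropped, so "920" becomes 1920. This assumption matches the examples above but is not a documented rule. The function name `decode_political_name` is hypothetical.

```python
# Hypothetical decoder for the eight-character political variable names
# (e.g. P0100920, C0200932; PTOTA920 follows from the same scheme).

OFFICES = {"P": "presidential", "C": "congressional"}
PARTIES = {"0100": "Democratic", "0200": "Republican", "TOTA": "total"}


def decode_political_name(name: str) -> tuple[str, str, int]:
    """Return (office, party, year) for a name such as 'P0100920'."""
    if len(name) != 8:
        raise ValueError(f"expected 8 characters, got {name!r}")
    office = OFFICES[name[0]]    # first character: P or C
    party = PARTIES[name[1:5]]   # next four characters: party or TOTA
    # Assumption: the last three digits are the year without its leading "1",
    # so "920" -> 1920 and "932" -> 1932.
    year = int("1" + name[5:8])
    return office, party, year


if __name__ == "__main__":
    print(decode_political_name("P0100920"))
    print(decode_political_name("C0200932"))
```

An unknown office letter or party code raises a `KeyError`, which makes it easy to spot names that follow some other convention.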

Eligible voters (as determined by age from U.S. Census data) are stored in variables named with the letter "E" followed by the four-digit year. Thus, E1930 is the total number of adults aged 21 and older according to the U.S. Census for 1930.
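
Going in the other direction, a short sketch can build variable names for a given year. Political variables write the year with three digits, but eligible-voter variables write all four. The function names `political_name` and `eligible_name` are hypothetical. The range check assumes all years fall between 1000 and 1999, so that the last three digits identify the year unambiguously.

```python
# Hypothetical helpers that build variable names under the scheme above.

OFFICE_CODES = {"presidential": "P", "congressional": "C"}
PARTY_CODES = {"Democratic": "0100", "Republican": "0200", "total": "TOTA"}


def political_name(office: str, party: str, year: int) -> str:
    """Build an eight-character name such as 'P0100920'."""
    # Assumption: only years 1000-1999 occur, so three digits suffice.
    if not 1000 <= year <= 1999:
        raise ValueError(f"year out of range: {year}")
    return f"{OFFICE_CODES[office]}{PARTY_CODES[party]}{year % 1000:03d}"


def eligible_name(year: int) -> str:
    """Build an eligible-voter name such as 'E1930' (four-digit year)."""
    return f"E{year:04d}"
```

For example, `political_name("congressional", "Republican", 1932)` gives `"C0200932"`, and `eligible_name(1930)` gives `"E1930"`.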

Other variables should be self-explanatory: their meaning is either explicit in the variable name or stated in the variable label.

For the mcp5084 data set ("mcp" stands for merged census and political data), the variables are coded somewhat differently. Here a single letter designates the party, and two digits give the year. Thus, PD52 is the county-level presidential vote for the Democratic Party in 1952. Similarly, CR80 is the congressional vote for the Republican Party in 1980.
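
A similar sketch can decode the mcp5084 names. It assumes that the only party letters are D and R (the two shown above) and that every two-digit year falls in the 1900s, as in the examples PD52 and CR80. The function name `decode_mcp_name` is hypothetical.

```python
# Hypothetical decoder for mcp5084 names such as PD52 or CR80:
# office letter, party letter, two-digit year.

OFFICES = {"P": "presidential", "C": "congressional"}
# Assumption: only D and R appear in the description; extend as needed.
PARTY_LETTERS = {"D": "Democratic", "R": "Republican"}


def decode_mcp_name(name: str) -> tuple[str, str, int]:
    """Return (office, party, year) for a name such as 'PD52'."""
    if len(name) != 4:
        raise ValueError(f"expected 4 characters, got {name!r}")
    office = OFFICES[name[0]]
    party = PARTY_LETTERS[name[1]]
    # Assumption: two-digit years refer to the 1900s.
    year = 1900 + int(name[2:4])
    return office, party, year
```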

For all of the data sets, the variable COUNTY identifies each county. Its value begins with the two-digit state number used by the ICPSR, followed by the ICPSR county number. Because the state number is multiplied by 1,000,000, the county number occupies the last six digits. The exact SAS statement used to create this variable is:
COUNTY = INT((STATE*1000000) + COUNTNUM);
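
The SAS statement above can be mirrored in Python when the data are read outside SAS. The sketch below also shows how to recover the two parts. It assumes that STATE and COUNTNUM are non-negative integers and that COUNTNUM is below 1,000,000, so that `divmod` splits the identifier exactly. The function names are hypothetical, and the example numbers are made up rather than actual ICPSR codes.

```python
# Mirror of the SAS statement COUNTY = INT((STATE*1000000) + COUNTNUM);
# plus its inverse. Function names are hypothetical.

SHIFT = 1_000_000  # multiplier used in the SAS statement


def make_county_id(state: int, countnum: int) -> int:
    """Combine an ICPSR state number and county number into COUNTY."""
    # Assumption: countnum fits in the six low-order digits.
    if state < 0 or not 0 <= countnum < SHIFT:
        raise ValueError("state must be >= 0 and 0 <= countnum < 1,000,000")
    # int() truncates toward zero, like the SAS INT function.
    return int(state * SHIFT + countnum)


def split_county_id(county: int) -> tuple[int, int]:
    """Recover (state, countnum) from a COUNTY value."""
    return divmod(county, SHIFT)
```

For example, with the made-up values STATE = 21 and COUNTNUM = 10, `make_county_id(21, 10)` returns 21000010, and `split_county_id(21000010)` returns `(21, 10)`.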