<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Offensive Politics</title>
	<atom:link href="http://offensivepolitics.net/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://offensivepolitics.net/blog</link>
	<description>Electoral and financial data hackery</description>
	<lastBuildDate>Mon, 30 Aug 2010 21:10:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>US House Election Results Visualized Five Ways</title>
		<link>http://offensivepolitics.net/blog/?p=357</link>
		<comments>http://offensivepolitics.net/blog/?p=357#comments</comments>
		<pubDate>Mon, 30 Aug 2010 13:58:41 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[US House]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=357</guid>
		<description><![CDATA[The Democratic major-party vote share of US House elections 2002-2008 visualized 5 different ways.]]></description>
			<content:encoded><![CDATA[<p>I have been working on an analysis, using <a href="http://offensivepolitics.net/housedata/" target="blank">OP HouseData</a>, of what effect esoteric campaign finance variables might have on election returns in the US House. To kick off this project I needed a baseline idea of how the Democratic vote share in the US House changed during my target period of 2002 to 2008. With this information I could look for intra-year trends or inter-year clusters that could inform which financial variables I&#8217;d include in my analysis. </p>
<p>For the baseline summary I considered using a color-coded map (like <a href="http://innovation.cqpolitics.com/atlas/house2010_rr" target="blank">CQ</a>, <a href="http://www.cnn.com/ELECTION/2008/results/main.results/#H" target="blank">CNN</a>) but I care more about aggregates than individual districts or states. Instead I created five non-map visualizations of the same vote share data, using <a href="http://www.r-project.org/" target="blank">R</a> and <a href="http://had.co.nz/ggplot2/" target="blank">ggplot2</a>. Each visualization helped me better understand my data and refine my assumptions and expectations, even if I eventually discarded the output. The interactive nature of R allowed me to experiment and iterate very quickly until I got what I needed. The R code and data are available at the end of the post.</p>
<p><div id="attachment_360" class="wp-caption aligncenter" style="width: 682px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f1.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f1.png" alt="" title="Figure 1" width="672" height="671" class="size-full wp-image-360" /></a><p class="wp-caption-text">Figure 1: Democratic Vote Share US House 2002-2008 (scatterplot)</p></div><br />
<strong>Methodology</strong>: Figure 1 is a simple scatter plot with the vote share on the Y axis and the election year on the X axis. Each of 435 seats is plotted as a single point, and the points are alpha blended to highlight groupings of similar returns. A horizontal line is drawn at 50% vote share.<br />
<strong>Interpretation</strong>: Points below the 50% line show a loss for a Democrat, points above show a win. Lighter gray points mean fewer seats at a particular vote share.<br />
<strong>Problems</strong>: With 435 points per year the plot suffers from overplotting even with alpha blending. The breaks in the alpha blend are too few, so 5 points and 25 points look identical. </p>
<p><div id="attachment_361" class="wp-caption aligncenter" style="width: 682px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f2.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f2.png" alt="" title="Figure 2" width="672" height="671" class="size-full wp-image-361" /></a><p class="wp-caption-text">Figure 2: Democratic Vote Share US House 2002-2008 (scatter + jitter)</p></div><br />
<strong>Methodology</strong>: Figure 2 is another scatter plot with the vote share on the Y axis and the election year on the X axis. Each of 435 seats is plotted as a single point, and each point is alpha blended to visually highlight similar returns. A random horizontal jitter was added to every point to reduce overplotting. A horizontal line is drawn at 50% vote share.<br />
<strong>Interpretation</strong>: Points below the 50% line show a loss for a Democrat, points above show a win. I can&#8217;t answer how many seats had a given vote share, and due to the jitter I can&#8217;t reasonably identify groupings let alone intra-year trends.<br />
<strong>Problems</strong>: Jittering addresses some of the overplotting problem from Figure 1, but Figure 2 now obscures any patterns because the data looks like random noise. </p>
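<p>A minimal ggplot2 sketch of the two scatter plots above. The data frame <b><i>votes</i></b> and its columns <b><i>year</i></b> and <b><i>dem.share</i></b> are hypothetical stand-ins filled with synthetic values, not the layout of the linked CSV file, and the alpha and jitter settings are guesses rather than the values used for Figures 1 and 2.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">library(ggplot2)

# Synthetic stand-in data: 435 seats in each of four elections (not real returns).
set.seed(1)
votes &lt;- data.frame(
  year      = rep(c(2002, 2004, 2006, 2008), each = 435),
  dem.share = round(runif(4 * 435, min = 0, max = 100), 1)
)

# Figure 1 style: alpha-blended points with a reference line at 50%.
p.scatter &lt;- ggplot(votes, aes(x = factor(year), y = dem.share)) +
  geom_point(alpha = 0.2) +
  geom_hline(yintercept = 50) +
  labs(x = "Election year", y = "Democratic vote share (%)")

# Figure 2 style: the same points with horizontal-only jitter,
# so the vote share on the Y axis is left untouched.
p.jitter &lt;- ggplot(votes, aes(x = factor(year), y = dem.share)) +
  geom_point(alpha = 0.2, position = position_jitter(width = 0.3, height = 0)) +
  geom_hline(yintercept = 50) +
  labs(x = "Election year", y = "Democratic vote share (%)")

print(p.scatter)
print(p.jitter)</pre></div></div>
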
<p>The scatter plots helped me realize what I really wanted was a summary of the distribution of the Democratic vote share, not the raw values themselves. That led me to the following: </p>
<p><div id="attachment_362" class="wp-caption aligncenter" style="width: 682px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f3.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f3.png" alt="" title="Figure 3" width="672" height="671" class="size-full wp-image-362" /></a><p class="wp-caption-text">Figure 3: Democratic Vote Share US House 2002-2008 (histogram)</p></div><br />
<strong>Methodology</strong>: A 4-panel graphic with a histogram of vote share per-year. The histogram bar/bin width is two percentage points. A vertical line is drawn at the 50% vote mark.<br />
<strong>Interpretation</strong>: Bars to the left of the 50% line show a loss for a Democrat, bars to the right show a win. The Y measure shows us how many races were uncontested by Democrats (0 vote share), and how many were uncontested by Republicans (100 vote share). We can see clear groupings of core Democratic and Republican seats that remain somewhat static across elections to the left and right of center, but there is some movement back and forth across the 50% win line as control of the House changed hands in 2006.<br />
<strong>Problems</strong>: The counting measure is much better at showing the actual distribution of returns but is too raw for comparisons. </p>
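<p>The histogram above can be sketched along these lines. As before, <b><i>votes</i></b>, <b><i>year</i></b>, and <b><i>dem.share</i></b> are hypothetical names holding synthetic values; only the two-point bin width, the per-year panels, and the 50% line come from the description of Figure 3. The last step counts uncontested seats directly, assuming they are coded as exactly 0 or 100.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">library(ggplot2)

# Synthetic stand-in data (not real returns).
set.seed(2)
votes &lt;- data.frame(
  year      = rep(c(2002, 2004, 2006, 2008), each = 435),
  dem.share = round(runif(4 * 435, min = 0, max = 100), 1)
)

# One panel per year, bins two percentage points wide, vertical line at 50%.
p.hist &lt;- ggplot(votes, aes(x = dem.share)) +
  geom_histogram(binwidth = 2) +
  geom_vline(xintercept = 50) +
  facet_wrap(~ year) +
  labs(x = "Democratic vote share (%)", y = "Seats")
print(p.hist)

# Uncontested seats per year: no Democrat (share 0) or no Republican (share 100).
votes$no.dem &lt;- votes$dem.share == 0
votes$no.rep &lt;- votes$dem.share == 100
uncontested &lt;- aggregate(cbind(no.dem, no.rep) ~ year, data = votes, FUN = sum)
print(uncontested)</pre></div></div>
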
<p>Since a histogram was too raw I decided to switch to a <a href="http://mathworld.wolfram.com/Box-and-WhiskerPlot.html" target="blank">box-and-whisker plot</a>.</p>
<p><div id="attachment_383" class="wp-caption aligncenter" style="width: 682px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f4.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f4.png" alt="" title="Figure 4: Democratic Vote Share US House 2002-2008 (box and whisker plot)" width="672" height="671" class="size-full wp-image-383" /></a><p class="wp-caption-text">Figure 4: Democratic Vote Share US House 2002-2008 (box and whisker plot)</p></div><br />
<strong>Methodology</strong>: A box and whisker plot summarizing the distribution of Democratic vote share. The box shows the median value with a horizontal line, and 1st and 3rd quartiles below and above the median line. The whiskers represent values outside the inter-quartile range of the box.<br />
<strong>Interpretation</strong>: This plot provides several pieces of useful information. The spread between Q1 and Q3 shrinks from 2002 to 2008, indicating closer races. The median line jumps over the 50% win mark in 2006 and 2008, which coincides with the Democrats taking back the House.<br />
<strong>Problems</strong>: The whisker portion of the plot is less useful since we can&#8217;t see the distribution of outliers. </p>
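<p>A sketch of the box-and-whisker plot together with the numbers behind each box, again on synthetic data with hypothetical column names. In ggplot2&#8217;s <b><i>geom_boxplot</i></b> the whiskers by default reach the most extreme value within 1.5 times the interquartile range of the box edges, and anything beyond that is drawn as an individual point.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">library(ggplot2)

# Synthetic stand-in data (not real returns).
set.seed(3)
votes &lt;- data.frame(
  year      = rep(c(2002, 2004, 2006, 2008), each = 435),
  dem.share = round(runif(4 * 435, min = 0, max = 100), 1)
)

# One box per election year with a reference line at 50%.
p.box &lt;- ggplot(votes, aes(x = factor(year), y = dem.share)) +
  geom_boxplot() +
  geom_hline(yintercept = 50) +
  labs(x = "Election year", y = "Democratic vote share (%)")
print(p.box)

# Q1, median, and Q3 per year, one row per election.
quartiles &lt;- t(sapply(split(votes$dem.share, votes$year),
                      quantile, probs = c(0.25, 0.5, 0.75)))

# Spread between Q1 and Q3; a shrinking value means more tightly grouped races.
iqr &lt;- quartiles[, "75%"] - quartiles[, "25%"]
print(cbind(quartiles, iqr))</pre></div></div>
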
<p>This led me to use the established seats-votes plot from the theoretical political science literature (<a href="http://www.stat.columbia.edu/~gelman/research/published/house2006_new.pdf" target="blank">Kastellec, Gelman, Chandler (2006)</a> and <a href="http://cran.r-project.org/web/packages/pscl/pscl.pdf" target="blank">Jackman et al. (PDF)</a>). </p>
<p><div id="attachment_363" class="wp-caption aligncenter" style="width: 682px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f5.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/08/f5.png" alt="" title="Figure 5: Democratic Vote Share US House 2002-2008 (Seats-Votes plot)" width="672" height="671" class="size-full wp-image-363" /></a><p class="wp-caption-text">Figure 5: Democratic Vote Share US House 2002-2008 (Seats-Votes plot)</p></div><br />
<strong>Methodology</strong>: A 4-panel smoothed density curve of Democratic vote share, and 1d rug showing counts across the bottom. A vertical line is drawn at the 50% mark.<br />
<strong>Interpretation</strong>: The contours of the curves on the seats-votes plot show some very interesting information about the makeup of the House, and taken over time it is very easy to see the emergence of a Democratic majority in 2006. I also see changes in the number of uncontested seats over time, and the stabilization of the safe Democratic seat peak to the left of the 50% mark. The stabilization of the peaks in 2006 and 2008 around the 50% mark aligns well with what we saw in Figure 4.<br />
<strong>Problems</strong>: There are no problems with this Figure. </p>
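<p>The density-plus-rug layout of Figure 5 could be approximated as below. This is a sketch on synthetic data with hypothetical column names; <b><i>geom_density</i></b> is used with its default kernel and bandwidth, which may differ from the smoothing behind the published figure.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">library(ggplot2)

# Synthetic stand-in data (not real returns).
set.seed(4)
votes &lt;- data.frame(
  year      = rep(c(2002, 2004, 2006, 2008), each = 435),
  dem.share = round(runif(4 * 435, min = 0, max = 100), 1)
)

# Smoothed density per year, a 1d rug along the bottom, and a line at 50%.
p.density &lt;- ggplot(votes, aes(x = dem.share)) +
  geom_density() +
  geom_rug() +
  geom_vline(xintercept = 50) +
  facet_wrap(~ year) +
  labs(x = "Democratic vote share (%)", y = "Density")
print(p.density)</pre></div></div>
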
<p>It is no surprise the seats-votes plot proved to be the most useful for my purposes since it was specifically designed, by very smart social and political scientists, to look at this type of data. The seats-votes plot is very versatile and can be adapted to a single election by looking at all precincts within a single district. I performed this type of analysis in the <a href="http://offensivepolitics.net/blog/?p=113" target="blank">Aggregate electoral targeting</a> blog post: <a href="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct.png" target="blank">Democratic vote share, by precinct in VA HOD 13</a>.</p>
<p>Even though the other plots aren&#8217;t as useful, they do provide some diagnostic information. The box and whisker plot is probably easier to read if you only cared about median vote share, and the histogram plot was excellent in finding uncontested seats. For fewer than 435 points even the scatter plots could be very useful. Please <a href="mailto:jjh@offensivepolitics.net">email me</a> or comment with ideas or alternative visualizations of vote share data.  </p>
<p><a href="http://offensivepolitics.net/data/seats-votes.r" target="blank">R code</a><br />
<a href="http://offensivepolitics.net/data/seats-votes.csv" target="blank">CSV file</a></p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=357</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>House Data: 41k finance summaries from 2200 candidates</title>
		<link>http://offensivepolitics.net/blog/?p=344</link>
		<comments>http://offensivepolitics.net/blog/?p=344#comments</comments>
		<pubDate>Tue, 13 Jul 2010 20:01:29 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Campaign Finance]]></category>
		<category><![CDATA[Open-Source]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=344</guid>
		<description><![CDATA[I&#8217;d like to announce a new project by Offensive Politics called House Data, launching today. House Data is a large-scale extract of FEC Form 3 Summary of receipts and disbursements (pdf warning) of every US House campaign from mid-2001 onward. The traditional source for campaign finance summaries is the Candidate Summary File, which is a [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;d like to announce a new project by Offensive Politics called House Data, launching today. House Data is a large-scale extract of <a href="http://www.fec.gov/">FEC</a> <a href="http://www.fec.gov/pdf/forms/fecfrm3.pdf">Form 3 Summary of receipts and disbursements</a> (pdf warning) of every US House campaign from mid-2001 onward. </p>
<p>The traditional source for campaign finance summaries is the <a href="http://www.fec.gov/finance/disclosure/ftpsum.shtml">Candidate Summary File</a>, which is a single set of summary statistics for a campaign for an entire electoral cycle. But a campaign files a new F3 at least quarterly, and before and after every election they participate in. Each F3 provides insight into where a campaign stands, and with access to this intra-cycle data we can better compare campaigns and perform more sophisticated analysis. The House Data file is built from these F3 reports, all 41,050 reports from 2,241 candidates for the US House since 2002.  Campaigns often update previously filed reports with amendments, so the file contains only the <b>latest</b> summary provided by a campaign. </p>
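<p>To make the &#8220;latest summary only&#8221; rule concrete, here is a small sketch of keeping the most recently filed report for each campaign and reporting period. The column names and all values are made up for illustration; this is not how FECHell is implemented, and the real field names are listed in the House Data data dictionary.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># Hypothetical filings: committee C001 amended its first-quarter report once.
filings &lt;- data.frame(
  committee.id   = c("C001", "C001", "C001", "C002"),
  period.end     = as.Date(c("2008-03-31", "2008-03-31", "2008-06-30", "2008-03-31")),
  filed.on       = as.Date(c("2008-04-15", "2008-05-20", "2008-07-15", "2008-04-14")),
  total.receipts = c(100000, 125000, 80000, 40000)
)

# Sort so the newest filing for each committee and period comes first ...
newest.first &lt;- filings[order(filings$committee.id,
                              filings$period.end,
                              -as.numeric(filings$filed.on)), ]

# ... then keep only the first row of each committee/period pair.
latest &lt;- newest.first[!duplicated(newest.first[, c("committee.id", "period.end")]), ]
print(latest)</pre></div></div>

<p>In this toy input the original first-quarter filing for C001 is dropped and its amendment is kept, leaving three rows.</p>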
<p>The file is compiled automatically using <a href="http://offensivepolitics.net/fechell/">FECHell</a> into a zipped CSV format. New releases will be made within 3 days of a new batch of electronic filings, according to the <a href="http://www.fec.gov/pdf/2010reports.pdf">FEC 2010 Filing Deadline Schedule</a> (pdf warning). </p>
<p>Here is a simple example of a quarterly summary of total receipts and total disbursements made by all House campaigns in 2008:<br />

<a href='http://offensivepolitics.net/blog/?attachment_id=348' title='2008 Cycle Total Receipts US House'><img width="150" height="150" src="http://offensivepolitics.net/blog/wp-content/uploads/2010/07/hd-ex-01a-150x150.png" class="attachment-thumbnail" alt="2008 Cycle Total Receipts US House" title="2008 Cycle Total Receipts US House" /></a>
<a href='http://offensivepolitics.net/blog/?attachment_id=347' title='2008 Cycle Total Disbursements US House'><img width="150" height="150" src="http://offensivepolitics.net/blog/wp-content/uploads/2010/07/hd-ex-01b-150x150.png" class="attachment-thumbnail" alt="2008 Cycle Total Disbursements US House" title="2008 Cycle Total Disbursements US House" /></a>
</p>
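<p>A quarterly roll-up like the one charted above might be computed as follows. This is only a sketch: the column names and dollar amounts are invented, and the actual House Data fields may be named and structured differently.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># Hypothetical per-report summaries (values invented for illustration).
reports &lt;- data.frame(
  period.end          = as.Date(c("2008-03-31", "2008-03-31", "2008-06-30", "2008-06-30")),
  total.receipts      = c(125000, 40000, 80000, 55000),
  total.disbursements = c(90000, 35000, 70000, 60000)
)

# Label each report with its calendar quarter, e.g. "2008 Q1".
reports$quarter &lt;- paste(format(reports$period.end, "%Y"), quarters(reports$period.end))

# Sum receipts and disbursements across all campaigns within each quarter.
by.quarter &lt;- aggregate(cbind(total.receipts, total.disbursements) ~ quarter,
                        data = reports, FUN = sum)
print(by.quarter)</pre></div></div>
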
<p>The <a href="http://offensivepolitics.net/housedata/">House Data Project</a> is live today, with more examples and a data dictionary. The latest version can always be downloaded from <a href="http://offensivepolitics.net/data/housedata-latest.zip">http://offensivepolitics.net/data/housedata-latest.zip</a>. </p>
<p>If you have any questions, comments, or suggestions about the House Data file please don&#8217;t hesitate to <a href="mailto:jjh@offensivepolitics.net">contact me</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=344</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Voter targeting with R</title>
		<link>http://offensivepolitics.net/blog/?p=302</link>
		<comments>http://offensivepolitics.net/blog/?p=302#comments</comments>
		<pubDate>Wed, 26 May 2010 08:05:45 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[targeting]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=302</guid>
		<description><![CDATA[Voter targeting for turnout is the process of scoring registered voters using demographic and electoral variables taken from voter lists and commercial databases. The score of all voters together is used to predict overall turnout, which determines the allocation of campaign resources and directs strategy for voter contact and communication. Targeting for turnout is a [...]]]></description>
			<content:encoded><![CDATA[<p>Voter targeting for turnout is the process of scoring registered voters using demographic and electoral variables taken from voter lists and commercial databases. The score of all voters together is used to predict overall turnout, which determines the allocation of campaign resources and directs strategy for voter contact and communication. </p>
<p>Targeting for turnout is a three-step process: </p>
<ol>
<li>A turnout table is created for a previous election similar to the target election;</li>
<li>A scoring procedure is implemented with regression, clustering, or some other statistical process; and</li>
<li>Every voter is scored with a likely turnout percentage.</li>
</ol>
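<p>The three steps can be sketched end to end on synthetic data. Everything here is an assumption made for illustration: the column names, the simulated voters, and the choice of logistic regression as the scoring procedure. In a real analysis the model would be fitted on the previous election and then applied to each voter&#8217;s current values; the sketch reuses one data frame to stay short.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># Simulated voters (not real data): an age group and a last-4 participation score.
set.seed(42)
n &lt;- 2000
voters &lt;- data.frame(
  age.group = factor(sample(c("18-29", "30-49", "50-69", "70+"), n, replace = TRUE)),
  last4     = sample(0:4, n, replace = TRUE)
)
# Simulated outcome: the chance of voting rises with last4 (demo assumption only).
voters$voted &lt;- rbinom(n, size = 1, prob = plogis(-2 + 0.9 * voters$last4))

# Step 1: turnout table for the previous, similar election.
turnout.table &lt;- aggregate(voted ~ age.group + last4, data = voters, FUN = mean)

# Step 2: a scoring procedure, here a logistic regression.
fit &lt;- glm(voted ~ age.group + last4, data = voters, family = binomial)

# Step 3: score every voter with a predicted turnout probability.
voters$score &lt;- predict(fit, newdata = voters, type = "response")

# Predicted overall turnout is the mean of the individual scores.
print(mean(voters$score))</pre></div></div>
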
<p>Depending on his or her turnout percentage &#8211; high, middling, or low &#8211; a voter will be ignored, targeted for persuasion, or targeted for get-out-the-vote (GOTV) efforts by a campaign. Targeting for turnout, along with almost every other type of political targeting, is explained in detail in <i><a href="http://www.amazon.com/Political-Targeting-Second-Hal-Malchow/dp/0615184618/" target="blank">Political Targeting</a></i> by Hal Malchow (2008). </p>
<p>In this post, I recreate parts of the regression analysis from Chapter Nine (Targeting for Turnout) of <i>Political Targeting</i> (Malchow 2008), using the free R Project for Statistical Computing. R is a programming environment that excels at data manipulation and statistical analysis, making it an interesting alternative to traditional statistical tools, like SPSS or web-based voter management software. The analysis will be performed against the full voters list from Ohio&#8217;s 1st congressional district, with the intention of predicting turnout for the 2010 congressional midterm elections. This analysis is similar or identical to what a candidate for Ohio&#8217;s 1st district would perform throughout the election year. The R code for each step in the analysis will be provided inline so a reader can perform the same operations. </p>
<h2>OH-01 Voter File</h2>
<p>A voter file is a list containing electoral and demographic data on registered voters, maintained by state boards of elections, political parties, PACs, or private companies. My voter file was downloaded from the Ohio Secretary of State in late 2009 and contains: <strong>name</strong>, <strong>address</strong>, <strong>age</strong>, <strong>registration date</strong>, <strong>voting history</strong>, and <strong>party affiliation</strong> (primary voters only). </p>
<p>To better simulate what a political campaign would use, I&#8217;ve appended the following fields: </p>
<ul>
<li><b>Gender</b>: Using birth data from the Social Security Administration, I matched each voter&#8217;s first name to a probable gender. About 9% of names could not be matched and were coded as an empty string.</li>
<li><b>Age Group (2010)</b>: Using the birth year, I calculated age as of 2010, and then assigned each voter to an age group: 18-21, 22-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, and 90+.</li>
<li><b>Age Group (2006)</b>: Using the birth year, I calculated age as of 2006, and then assigned each voter to an age group: 18-21, 22-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, and 90+.</li>
<li><b>Household</b>: I grouped voters into discrete households using the full street address and zip code.</li>
<li><b>Marriage status</b>: Using the household variable, I performed a very simple marriage determination: people living in the same household with an age difference of less than 15 years were flagged as married.</li>
<li><b>Last4 (2006)</b>: Measures participation in the last 4 major elections prior to 2006: 2004 Primary and General, and 2002 Primary and General. Range 0-4.</li>
<li><b>Last4 (2008)</b>: Measures participation in the last 4 major elections prior to 2008: 2006 Primary and General, and 2004 Primary and General. Range 0-4.</li>
<li><b>Last4 (2010)</b>: Measures participation in the last 4 major elections prior to 2010: 2008 Primary and General, 2006 Primary and General. Range 0-4.</li>
</ul>
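<p>As a rough illustration of how some of these fields can be derived, the sketch below computes an age group, a last 4 score, a household key, and the simple marriage flag for five invented voters. It is not the scrubbing code mentioned below; the input column names (<b><i>birth.year</i></b>, <b><i>address</i></b>, <b><i>zip</i></b>, and the per-election flags) are assumptions, and the marriage rule is one possible reading of the description above.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># Five invented voters; the per-election columns record whether a ballot was cast.
voters &lt;- data.frame(
  birth.year = c(1990, 1975, 1948, 1977, 1915),
  address    = c("1 Elm St", "22 Oak Ave", "9 Main St", "22 Oak Ave", "5 Pine Rd"),
  zip        = c("45202", "45211", "45202", "45211", "45238"),
  p2006      = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  g2006      = c(FALSE, TRUE, TRUE, TRUE, TRUE),
  p2008      = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  g2008      = c(TRUE, TRUE, TRUE, TRUE, FALSE)
)

# Age group (2010): age as of 2010, binned into the groups listed above.
voters$age &lt;- 2010 - voters$birth.year
voters$age.2010 &lt;- cut(voters$age,
  breaks = c(18, 22, 30, 40, 50, 60, 70, 80, 90, Inf),
  labels = c("18-21", "22-29", "30-39", "40-49", "50-59",
             "60-69", "70-79", "80-89", "90+"),
  right = FALSE)

# Last4 (2010): number of the four prior major elections voted in (0-4).
voters$last4.g2010 &lt;- rowSums(voters[, c("p2006", "g2006", "p2008", "g2008")])

# Household: full street address plus zip code.
voters$household &lt;- paste(voters$address, voters$zip, sep = "|")

# Marriage flag: someone else in the same household is within 15 years of age.
voters$married &lt;- sapply(seq_len(nrow(voters)), function(i) {
  others &lt;- voters$household == voters$household[i] &amp; seq_len(nrow(voters)) != i
  any(abs(voters$age[others] - voters$age[i]) &lt; 15)
})
print(voters[, c("age.2010", "last4.g2010", "household", "married")])</pre></div></div>
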
<p>Email me <a href="mailto:jjh@offensivepolitics.net">here</a> for the code used to scrub and augment the voter file. </p>
<h2>R Setup</h2>
<p>I am using the R environment to perform this analysis. To download and install R, go to the <a href="http://cran.r-project.org/" target="blank">CRAN homepage</a> and follow the instructions for your platform. Once R is up and running, execute the following to get the required libraries installed and load the voter file into memory:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;"># install plyr, ggplot2, and RColorBrewer</span>
install.packages<span style="color: #66cc66;">&#40;</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;plyr&quot;</span>,<span style="color: #ff0000;">&quot;ggplot2&quot;</span>,<span style="color: #ff0000;">&quot;RColorBrewer&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># load required libraries; plyr is loaded explicitly so ddply is available even when ggplot2 does not attach it</span>
library<span style="color: #66cc66;">&#40;</span>plyr<span style="color: #66cc66;">&#41;</span>
library<span style="color: #66cc66;">&#40;</span>ggplot2<span style="color: #66cc66;">&#41;</span>
library<span style="color: #66cc66;">&#40;</span>RColorBrewer<span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># load the voter file into the vfs variable</span>
vfs <span style="color: #78aaac;">&lt;-</span> read.csv<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;voterfile.csv&quot;</span><span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>Now the dependencies are installed and the voter file is read into the <b><i>vfs</i></b> variable.</p>
<h2>Turnout Table</h2>
<p>According to <i>Political Targeting</i> (Malchow 2008), the strongest indicators of participation in a future election are age and previous participation. Malchow also says  participation tends to be consistent between similar elections in different years. I am looking at the 2010 General election, a congressional midterm, so I used the 2006 General election as my guide. The first step is to generate a turnout table.</p>
<p>Political campaigns use a tool called &#8220;last 4&#8221;, which measures a voter&#8217;s recent participation. A voter&#8217;s last 4 score represents how many of the previous four elections he or she cast a ballot in. A standard is to use both the primary and general elections for the previous two major election years. My data set contains last 4 calculations for the 2010, 2008, and 2006 elections. </p>
<p>The 2006 last 4 calculation looks at elections as far back as the primary in 2002, but a percentage of voters in the list weren&#8217;t eligible or registered to vote in some or all of these elections. These voters have an incomplete last 4 score, and need to be evaluated separately so their scores don&#8217;t influence voters with a complete history. As such I created two turnout tables: one for voters eligible for all elections (<strong>full</strong>), and one for voters eligible for at least one of the last four elections (<strong>partial</strong>). The turnout tables below show a turnout percentage for every combination of age group and participation score for 2006 voters:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;"># find voters registered before the 2002 primary</span>
ele.full <span style="color: #78aaac;">&lt;-</span> which<span style="color: #66cc66;">&#40;</span>vfs$reg.date <span style="color: #78aaac;">&lt;=</span> '<span style="color: #cc66cc;">2002</span><span style="color: #78aaac;">-</span>05<span style="color: #78aaac;">-</span>07'<span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># find voters registered after the 2002 primary but before the 2006 general</span>
ele.partial <span style="color: #78aaac;">&lt;-</span> which<span style="color: #66cc66;">&#40;</span>vfs$reg.date <span style="color: #78aaac;">&gt;</span> '<span style="color: #cc66cc;">2002</span><span style="color: #78aaac;">-</span>05<span style="color: #78aaac;">-</span>07' <span style="color: #78aaac;">&amp;</span> vfs$reg.date <span style="color: #78aaac;">&lt;=</span> '<span style="color: #cc66cc;">2006</span><span style="color: #78aaac;">-</span><span style="color: #cc66cc;">11</span><span style="color: #78aaac;">-</span>07'<span style="color: #66cc66;">&#41;</span>	
<span style="color: #b22222; font-style: italic;"># show the turnout table for full eligible voters</span>
turnout.full <span style="color: #78aaac;">&lt;-</span> ddply<span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.full,<span style="color: #66cc66;">&#93;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;age.2006&quot;</span>,<span style="color: #ff0000;">&quot;last4.g2006&quot;</span><span style="color: #66cc66;">&#41;</span>, <span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> length<span style="color: #66cc66;">&#40;</span>which<span style="color: #66cc66;">&#40;</span>x$turnout.g06 <span style="color: #78aaac;">==</span> <span style="color: #ff0000;">&quot;X&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">/</span> nrow<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span>	
<span style="color: #b22222; font-style: italic;"># show the turnout table for partial voters</span>
turnout.partial <span style="color: #78aaac;">&lt;-</span> ddply<span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.partial,<span style="color: #66cc66;">&#93;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;age.2006&quot;</span>,<span style="color: #ff0000;">&quot;last4.g2006&quot;</span><span style="color: #66cc66;">&#41;</span>, <span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> sum<span style="color: #66cc66;">&#40;</span>x$turnout.g06 <span style="color: #78aaac;">==</span> <span style="color: #ff0000;">&quot;X&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">/</span> nrow<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span></pre></div></div>
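<p>The <b><i>ddply</i></b> calls return long-format data frames with one row per age group and last 4 combination and the turnout proportion in column <b><i>V1</i></b>. One way to pivot such a result into a wide percentage table is sketched below with <b><i>xtabs</i></b>. The small input frame is a stand-in built from six of the published full-table values, and this is not necessarily how the tables in this post were produced.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># Stand-in for a ddply result: six cells taken from the full turnout table,
# expressed as proportions the way the ddply calls above return them.
turnout.long &lt;- data.frame(
  age.2006    = rep(c("22-29", "30-39"), each = 3),
  last4.g2006 = rep(0:2, times = 2),
  V1          = c(0.0859, 0.3263, 0.6014, 0.1006, 0.4384, 0.7232)
)

# Pivot to last4 rows by age group columns and convert to percent.
# Note: xtabs fills combinations that have no rows with 0.
turnout.wide &lt;- round(100 * xtabs(V1 ~ last4.g2006 + age.2006, data = turnout.long), 2)
print(turnout.wide)</pre></div></div>
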

<p>Turnout percentage for <b><i>turnout.full</i>:</b></p>
<table border="0">
<tr>
<th></th>
<th> Age group </th> <th> 18-21 </th> <th> 22-29 </th> <th> 30-39 </th> <th> 40-49 </th> <th> 50-59 </th> <th> 60-69 </th> <th> 70-79 </th> <th> 80-89 </th> <th> 90+ </th>  </tr>
<tr>
<td><b>last4</b></td>
<td><b>0</b></td>
<td align="right">0</td>
<td align="right">8.59</td>
<td align="right">10.06</td>
<td align="right">11.82</td>
<td align="right">13.31</td>
<td align="right">13.64</td>
<td align="right">11.23</td>
<td align="right">8</td>
<td align="right">5.76</td>
</tr>
<tr>
<td></td>
<td><b>1</b></td>
<td align="right">0</td>
<td align="right">32.63</td>
<td align="right">43.84</td>
<td align="right">50.86</td>
<td align="right">54.71</td>
<td align="right">54.92</td>
<td align="right">49.82</td>
<td align="right">36.08</td>
<td align="right">21.98</td>
</tr>
<tr>
<td></td>
<td><b>2</b></td>
<td align="right">100</td>
<td align="right">60.14</td>
<td align="right">72.32</td>
<td align="right">79.61</td>
<td align="right">82.32</td>
<td align="right">83.8</td>
<td align="right">79.95</td>
<td align="right">69.28</td>
<td align="right">55.86</td>
</tr>
<tr>
<td></td>
<td><b>3</b></td>
<td align="right">72.58</td>
<td align="right">84.35</td>
<td align="right">88.96</td>
<td align="right">91.29</td>
<td align="right">92.24</td>
<td align="right">90.02</td>
<td align="right">82.74</td>
<td align="right">65.23</td>
<td align="right">100</td>
</tr>
<tr>
<td></td>
<td><b>4</b></td>
<td align="right">100</td>
<td align="right">78.43</td>
<td align="right">88.88</td>
<td align="right">93.43</td>
<td align="right">94.94</td>
<td align="right">95.66</td>
<td align="right">94.66</td>
<td align="right">90.25</td>
<td align="right">79.41</td>
</tr>
</table>
<p>Turnout percentage for <b><i>turnout.partial</i>:</b></p>
<table border="0">
<tr>  <th>  </th><th> Age group </th> <th> 18-21 </th> <th> 22-29 </th> <th> 30-39 </th> <th> 40-49 </th> <th> 50-59 </th> <th> 60-69 </th> <th> 70-79 </th> <th> 80-89 </th> <th> 90+ </th>  </tr>
<tr>
<td><b>last4</b></td>
<td><b>0</b></td>
<td align="right">21.38</td>
<td align="right">13.07</td>
<td align="right">15.85</td>
<td align="right">16.97</td>
<td align="right">19.63</td>
<td align="right">27.97</td>
<td align="right">29.8</td>
<td align="right">36.08</td>
<td align="right">29.41</td>
</tr>
<tr>
<td></td>
<td><b>1</b></td>
<td align="right">27.7</td>
<td align="right">27.38</td>
<td align="right">33.96</td>
<td align="right">36.38</td>
<td align="right">41.7</td>
<td align="right">51.69</td>
<td align="right">51.95</td>
<td align="right">42.25</td>
<td align="right">28.57</td>
</tr>
<tr>
<td></td>
<td><b>2</b></td>
<td align="right">52.66</td>
<td align="right">55.3</td>
<td align="right">66.13</td>
<td align="right">66.48</td>
<td align="right">72.99</td>
<td align="right">79.05</td>
<td align="right">73.83</td>
<td align="right">70</td>
<td align="right">20</td>
</tr>
<tr>
<td></td>
<td><b>3</b></td>
<td align="right">64.14</td>
<td align="right">78.63</td>
<td align="right">79.46</td>
<td align="right">91.51</td>
<td align="right">88.37</td>
<td align="right">85.29</td>
<td align="right">94.74</td>
<td align="right">100</td>
<td align="right">0</td>
</tr>
<tr>
<td></td>
<td><b>4</b></td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
</tr>
</table>
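<p>Cells showing exactly 0 or 100 in the tables above are worth checking against the number of voters behind them. A sketch of that check follows, using invented rows that mimic the voter file columns used earlier (<b><i>age.2006</i></b>, <b><i>last4.g2006</i></b>, <b><i>turnout.g06</i></b>); the cutoff of 30 voters is an arbitrary assumption, not a rule from <i>Political Targeting</i>.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># Invented rows: a 2-voter cell where both voted, and a 40-voter cell where 32 voted.
voters &lt;- data.frame(
  age.2006    = c(rep("18-21", 2), rep("40-49", 40)),
  last4.g2006 = 2,
  turnout.g06 = c("X", "X", rep("X", 32), rep("", 8))
)
voters$voted &lt;- voters$turnout.g06 == "X"

# Group size and turnout percentage for every age group / last 4 cell.
cells &lt;- aggregate(voted ~ age.2006 + last4.g2006, data = voters,
                   FUN = function(v) c(n = length(v), pct = 100 * mean(v)))
cells &lt;- do.call(data.frame, cells)  # flatten to columns voted.n and voted.pct

# Flag cells too small for their turnout rate to be trusted (assumed cutoff).
min.n &lt;- 30
cells$small &lt;- cells$voted.n &lt; min.n
print(cells)</pre></div></div>

<p>Here the 18-21 cell shows 100% turnout from only two voters and is flagged, while the 40-49 cell at 80% is not.</p>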
<p>For each table, we see that turnout percentage increases as previous participation increases for every age group, but it is pretty difficult to compare more than two age groups at once using this table. There are also several anomalous groups with 100% turnout, indicating a small population in that group. We&#8217;ll use the R library <b>ggplot2</b> to create a simple visualization of each table to help interpret the turnout values:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;"># turnout-full visualization</span>
qplot<span style="color: #66cc66;">&#40;</span>last4.g2006,V1,color<span style="color: #78aaac;">=</span>age.2006,group<span style="color: #78aaac;">=</span>age.2006,data<span style="color: #78aaac;">=</span>turnout.full,geom<span style="color: #78aaac;">=</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;point&quot;</span>,<span style="color: #ff0000;">&quot;line&quot;</span><span style="color: #66cc66;">&#41;</span>,
  main<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;OH-01 2006 General Turnout by Age Group, Last 4 (Full)&quot;</span>,xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Last 4&quot;</span>,ylab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Turnout %&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  scale_colour_hue<span style="color: #66cc66;">&#40;</span>name<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Age Group&quot;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># turnout-partial visualization</span>
qplot<span style="color: #66cc66;">&#40;</span>last4.g2006,V1,color<span style="color: #78aaac;">=</span>age.2006,group<span style="color: #78aaac;">=</span>age.2006,data<span style="color: #78aaac;">=</span>turnout.partial,geom<span style="color: #78aaac;">=</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;point&quot;</span>,<span style="color: #ff0000;">&quot;line&quot;</span><span style="color: #66cc66;">&#41;</span>,
  main<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;OH-01 2006 General Turnout by Age Group, Last 4 (Partial)&quot;</span>,xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Last 4&quot;</span>,ylab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Turnout %&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  scale_colour_hue<span style="color: #66cc66;">&#40;</span>name<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Age Group&quot;</span><span style="color: #66cc66;">&#41;</span></pre></div></div>

<table rows="1" cols="2">
<tr>
<td align="center">
<div id="attachment_314" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2006-turnout-age-last4-full.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2006-turnout-age-last4-full-300x299.png" alt="Figure 1: 2006 Turnout Percentage by Age as a Function of Previous Participation (full)" title="Figure 1: 2006 Turnout Percentage by Age as a Function of Previous Participation (full)" width="300" height="299" class="size-medium wp-image-314" /></a><p class="wp-caption-text">Figure 1: 2006 Turnout Percentage by Age as a Function of Previous Participation (full)</p></div>
</td>
<td align="center">
<div id="attachment_315" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2006-turnout-age-last4-partial.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2006-turnout-age-last4-partial-300x299.png" alt="Figure 2: 2006 Turnout Percentage by Age as a Function of Previous Participation (partial)" title="Figure 2: 2006 Turnout Percentage by Age as a Function of Previous Participation (partial)" width="300" height="299" class="size-medium wp-image-315" /></a><p class="wp-caption-text">Figure 2: 2006 Turnout Percentage by Age as a Function of Previous Participation (partial)</p></div>
</td>
</tr>
</table>
<p>Figure 1 shows turnout for voters with a complete last 4 score, and tells us that for all age groups except 18-21, turnout increases with previous participation until it reaches a maximum of 85%-90%. The rate at which turnout increases is similar between age groups, suggesting previous participation may have more predictive value than age. Figure 2 represents all voters who were registered in time for the 2006 general but not for the 2002 primary. Figure 2 also shows a relationship between turnout and previous participation, but there is substantially more noise than in Figure 1. Taken together, the figures support the hypothesis put forth by Malchow that age and previous participation have a positive influence on future participation. </p>
<h2>Regression analysis</h2>
<p>The 2006 turnout tables are useful but they don&#8217;t represent a formal model of turnout in the 2006 election. A formal model will measure the relationship between the predictor variables (participation &#038; age) and the intended outcome (turnout) of 2006 voters. This model can be applied to 2010 voters to project turnout. </p>
<p>The model can include the other voter file variables with potential predictive qualities like gender, party affiliation, and marital status. A campaign will traditionally build a linear regression model to project turnout, but linear regression is a poor fit for a binary outcome like turnout and can produce predicted values below 0% or above 100%, so I won&#8217;t be using that type of regression here. </p>
<p>Instead, I&#8217;ll use a generalized linear model to perform a binomial regression with a logit link function (logistic regression). Logistic regression estimates the probability of a binary outcome given an intercept and a number of continuous or categorical predictor variables. R has terrific support for defining and evaluating these models using the <b>glm</b> function in the base <b>stats</b> package.</p>
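<p>As a minimal sketch of the idea, the following fits a logistic regression to simulated voters. The data frame <b>demo</b>, its columns, and the coefficients used to generate it are invented for illustration and are not part of the OH-01 voter file. It shows that predictions on the response scale are probabilities, while a linear model fit to the same 0/1 outcome carries no such guarantee:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># simulated voters; every column name and coefficient here is made up
set.seed(1)
n &lt;- 2000
demo &lt;- data.frame(
  last4   = sample(0:4, n, replace = TRUE),
  age.grp = factor(sample(c(&quot;18-29&quot;, &quot;30-49&quot;, &quot;50+&quot;), n, replace = TRUE))
)
# simulated turnout probability rises with past participation and age group
eta &lt;- -2 + 0.8 * demo$last4 + 0.5 * (as.integer(demo$age.grp) - 1)
demo$voted &lt;- rbinom(n, 1, plogis(eta))
# family=binomial uses the logit link by default
demo.lr &lt;- glm(voted ~ last4 + age.grp, data = demo, family = binomial)
# predictions on the response scale are probabilities between 0 and 1
p &lt;- predict(demo.lr, type = &quot;response&quot;)
range(p)
# a linear model of the same 0/1 outcome is not constrained to that range
demo.lm &lt;- lm(voted ~ last4 + age.grp, data = demo)
range(fitted(demo.lm))</pre></div></div>

<p>The categorical predictor <b>age.grp</b> is passed as a factor, and <b>glm</b> expands it into indicator variables automatically.</p>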
<p>The goal is to fit a logistic regression on voter data from 2006, and then use that regression to project turnout for 2010. I actually create two regressions, one for voters with at least a 4-year voting eligibility (full model), and one for all other voters (partial model). This is identical to the segmentation used when creating the turnout tables. The output of these regressions is the probability that a voter will turn out in the given year. A campaign can use this figure to estimate total turnout in an election, and to allocate resources to different geographic and demographic segments.</p>
<p>The R function <b>glm</b> is used to create two models of 2006 turnout based on last 4 participation, age group, gender, party affiliation, and marital status.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;"># create temporary variables inside the data frame for 2006 values</span>
vfs$last4 <span style="color: #78aaac;">&lt;-</span> vfs$last4.g2006
vfs$age <span style="color: #78aaac;">&lt;-</span> vfs$age.2006
<span style="color: #b22222; font-style: italic;"># create a model for voters with at least 4 years of voting history</span>
full.lr <span style="color: #78aaac;">&lt;-</span> glm<span style="color: #66cc66;">&#40;</span>turnout.g06 ~ last4 <span style="color: #78aaac;">+</span> age <span style="color: #78aaac;">+</span> gender<span style="color: #78aaac;">+</span>party<span style="color: #78aaac;">+</span>married,data<span style="color: #78aaac;">=</span>vfs<span style="color: #66cc66;">&#91;</span>ele.full,<span style="color: #66cc66;">&#93;</span>,family<span style="color: #78aaac;">=</span>binomial<span style="color: #66cc66;">&#41;</span>	
<span style="color: #b22222; font-style: italic;"># run ANOVA against the full table to test for term significance </span>
anova<span style="color: #66cc66;">&#40;</span>full.lr,test<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Chisq&quot;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># create a model for voters with less than 4 years of voting history</span>
partial.lr <span style="color: #78aaac;">&lt;-</span> glm<span style="color: #66cc66;">&#40;</span>turnout.g06 ~ last4 <span style="color: #78aaac;">+</span> age <span style="color: #78aaac;">+</span> gender<span style="color: #78aaac;">+</span>party<span style="color: #78aaac;">+</span>married,data<span style="color: #78aaac;">=</span>vfs<span style="color: #66cc66;">&#91;</span>ele.partial,<span style="color: #66cc66;">&#93;</span>,family<span style="color: #78aaac;">=</span>binomial<span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;">#  run ANOVA against the partial table to test for term significance </span>
anova<span style="color: #66cc66;">&#40;</span>partial.lr,test<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Chisq&quot;</span><span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>Now that I have fitted models, I&#8217;ll use the <b>predict</b> function to capture the model output. The output is the estimated probability that a voter turned out in 2006 given his last4.g2006 score, age group, gender, marital status, and party affiliation. Then I can compare the predicted turnout probability with the actual turnout to determine the effectiveness of each model. This isn&#8217;t a valid statistical measure of accuracy but merely a smell test.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;"># create a new column in the vfs data frame</span>
vfs$pred.g06 <span style="color: #78aaac;">&lt;-</span> c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>
pred.g06.full <span style="color: #78aaac;">&lt;-</span> predict<span style="color: #66cc66;">&#40;</span>full.lr,type<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;response&quot;</span><span style="color: #66cc66;">&#41;</span>
pred.g06.partial <span style="color: #78aaac;">&lt;-</span> predict<span style="color: #66cc66;">&#40;</span>partial.lr,type<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;response&quot;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># apply the full model to voters with at least 4 years of registration</span>
vfs<span style="color: #66cc66;">&#91;</span>names<span style="color: #66cc66;">&#40;</span>pred.g06.full<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>$pred.g06 <span style="color: #78aaac;">&lt;-</span> pred.g06.full
<span style="color: #b22222; font-style: italic;"># apply the partial model to voters with less than 4 years of registration</span>
vfs<span style="color: #66cc66;">&#91;</span>names<span style="color: #66cc66;">&#40;</span>pred.g06.partial<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>$pred.g06 <span style="color: #78aaac;">&lt;-</span> pred.g06.partial
<span style="color: #b22222; font-style: italic;"># take the number of correct predictions divided by the number of voters  </span>
full.correct <span style="color: #78aaac;">&lt;-</span> sum<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.full,<span style="color: #66cc66;">&#93;</span>$pred.g06 <span style="color: #78aaac;">&gt;</span> .5<span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">==</span> <span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.full,<span style="color: #66cc66;">&#93;</span>$turnout.g06 <span style="color: #78aaac;">==</span> <span style="color: #ff0000;">&quot;X&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">/</span> nrow<span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.full,<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># value is .797 = ~80% accurate for the full model</span>
<span style="color: #b22222; font-style: italic;"># take the same for the partial model</span>
partial.correct <span style="color: #78aaac;">&lt;-</span> sum<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.partial,<span style="color: #66cc66;">&#93;</span>$pred.g06 <span style="color: #78aaac;">&gt;</span> .5<span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">==</span> <span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.partial,<span style="color: #66cc66;">&#93;</span>$turnout.g06 <span style="color: #78aaac;">==</span> <span style="color: #ff0000;">&quot;X&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">/</span> nrow<span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.partial,<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># value is .776 = ~77% accurate for the partial model</span></pre></div></div>

<p>The prediction rates for our regressions aren&#8217;t spectacular: 80% for the full model and 77% for the partial model. Given the limited information in our voter file, though, they aren&#8217;t that bad. Additionally, a political campaign would have access to other data like detailed demographics, financial data, and more accurate lifestyle or ideological information. Extending the regression with these variables might increase the predictive power of the system. </p>
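<p>One way to put numbers like these in context is to look at a confusion table and at the accuracy of always guessing the more common outcome. The sketch below uses a handful of invented probabilities and outcomes (<b>pred</b> and <b>actual</b> are hypothetical vectors, not values from the voter file) purely to show the bookkeeping:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># toy probabilities and outcomes, invented for illustration only
pred   &lt;- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
actual &lt;- c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
# cross-tabulate predicted against actual turnout at the 0.5 cutoff
conf &lt;- table(predicted = pred &gt; 0.5, actual = actual)
conf
# overall accuracy: share of voters on the diagonal of the table
accuracy &lt;- sum(diag(conf)) / sum(conf)
# baseline: accuracy from always guessing the more common outcome
baseline &lt;- max(mean(actual), 1 - mean(actual))</pre></div></div>

<p>A model is only adding information to the extent that its accuracy exceeds that baseline, and the off-diagonal cells show whether its misses are mostly false positives or false negatives.</p>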
<h2>2010</h2>
<p>Now I&#8217;ll apply the regression equations to project turnout in 2010. First, I determine which regression (partial or full) to apply to current voters by their registration date:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;"># find voters registered before the 2006 primary (328594 voters)</span>
ele.full2010 <span style="color: #78aaac;">&lt;-</span> which<span style="color: #66cc66;">&#40;</span>vfs$reg.date <span style="color: #78aaac;">&lt;=</span> '<span style="color: #cc66cc;">2006</span><span style="color: #78aaac;">-</span>05<span style="color: #78aaac;">-</span>02'<span style="color: #66cc66;">&#41;</span>	
<span style="color: #b22222; font-style: italic;"># find voters registered after the 2006 primary but before the 2008 general (68863 voters)</span>
ele.partial2010 <span style="color: #78aaac;">&lt;-</span> which<span style="color: #66cc66;">&#40;</span>vfs$reg.date <span style="color: #78aaac;">&gt;</span> '<span style="color: #cc66cc;">2006</span><span style="color: #78aaac;">-</span>05<span style="color: #78aaac;">-</span>02' <span style="color: #78aaac;">&amp;</span> vfs$reg.date <span style="color: #78aaac;">&lt;=</span> '<span style="color: #cc66cc;">2008</span><span style="color: #78aaac;">-</span><span style="color: #cc66cc;">11</span><span style="color: #78aaac;">-</span>04'<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>Next I prepare the data and project 2010 turnout for each model using the <b>predict</b> function:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;"># assign the last4 and age model variables to values calculated for 2010</span>
vfs$age <span style="color: #78aaac;">&lt;-</span> vfs$age.2010
vfs$last4 <span style="color: #78aaac;">&lt;-</span> vfs$last4.2010	
<span style="color: #b22222; font-style: italic;"># call predict for the full model 	</span>
pred.g10.full <span style="color: #78aaac;">&lt;-</span> predict<span style="color: #66cc66;">&#40;</span>full.lr,newdata<span style="color: #78aaac;">=</span>vfs<span style="color: #66cc66;">&#91;</span>ele.full2010,<span style="color: #66cc66;">&#93;</span>,type<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;response&quot;</span><span style="color: #66cc66;">&#41;</span>	
<span style="color: #b22222; font-style: italic;"># predict based on the partial model</span>
pred.g10.partial <span style="color: #78aaac;">&lt;-</span> predict<span style="color: #66cc66;">&#40;</span>partial.lr,newdata<span style="color: #78aaac;">=</span>vfs<span style="color: #66cc66;">&#91;</span>ele.partial2010,<span style="color: #66cc66;">&#93;</span>,type<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;response&quot;</span><span style="color: #66cc66;">&#41;</span>		
<span style="color: #b22222; font-style: italic;"># turnout % for 2010 </span>
pred.g10.turnout.full <span style="color: #78aaac;">&lt;-</span> sum<span style="color: #66cc66;">&#40;</span>pred.g10.full <span style="color: #78aaac;">&gt;</span> .5<span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">/</span> length<span style="color: #66cc66;">&#40;</span>ele.full2010<span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># 63% predicted turnout</span>
pred.g10.turnout.partial <span style="color: #78aaac;">&lt;-</span> sum<span style="color: #66cc66;">&#40;</span>pred.g10.partial <span style="color: #78aaac;">&gt;</span> .5<span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">/</span> length<span style="color: #66cc66;">&#40;</span>ele.partial2010<span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># 18% predicted turnout	</span>
<span style="color: #b22222; font-style: italic;"># save the predictions into the vfs data frame</span>
vfs$pred.g10 <span style="color: #78aaac;">&lt;-</span> c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span>
vfs<span style="color: #66cc66;">&#91;</span>names<span style="color: #66cc66;">&#40;</span>pred.g10.full<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>$pred.g10 <span style="color: #78aaac;">&lt;-</span> pred.g10.turnout.full
vfs<span style="color: #66cc66;">&#91;</span>names<span style="color: #66cc66;">&#40;</span>pred.g10.partial<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>$pred.g10 <span style="color: #78aaac;">&lt;-</span> pred.g10.turnout.partial</pre></div></div>

<p>According to <b><i>pred.g10.turnout.full</i></b> and <b><i>pred.g10.turnout.partial</i></b>, OH-01 will see 63% turnout among voters scored by the full model and 18% turnout among voters scored by the partial model. To check the plausibility of the 2010 projections, I plotted 2006 actual turnout against the 2010 projected turnout for every age group. As stated in the introduction, Malchow says the turnout rates for 2010 should be similar to the 2006 election, so I expect no large unexplained deviations in the chart. A separate chart is created for each participation model (full, partial):</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;"># which voters have a &gt; 50% chance of turning out in 2010</span>
turnout.g10 <span style="color: #78aaac;">&lt;-</span> vfs$pred.g10 <span style="color: #78aaac;">&gt;</span> .5	
<span style="color: #b22222; font-style: italic;"># which voters turned out in 2006</span>
turnout.g06 <span style="color: #78aaac;">&lt;-</span> vfs$turnout.g06 <span style="color: #78aaac;">==</span> <span style="color: #ff0000;">&quot;X&quot;</span>
<span style="color: #b22222; font-style: italic;"># build a summary of voters who turned out in 06 or 10 based on age</span>
to <span style="color: #78aaac;">&lt;-</span> rbind<span style="color: #66cc66;">&#40;</span> ddply<span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.full,<span style="color: #66cc66;">&#93;</span>,<span style="color: #ff0000;">&quot;age.2006&quot;</span>,<span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> data.frame<span style="color: #66cc66;">&#40;</span>age<span style="color: #78aaac;">=</span>x$age.2006<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span>,n<span style="color: #78aaac;">=</span>nrow<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span>,series<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;G06&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#91;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span>:<span style="color: #cc66cc;">4</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span>,
                ddply<span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.full2010,<span style="color: #66cc66;">&#93;</span>,<span style="color: #ff0000;">&quot;age.2010&quot;</span>,<span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> data.frame<span style="color: #66cc66;">&#40;</span>age<span style="color: #78aaac;">=</span>x$age.2010<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span>,n<span style="color: #78aaac;">=</span>nrow<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span>,series<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;G10&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#91;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span>:<span style="color: #cc66cc;">4</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>					
qplot<span style="color: #66cc66;">&#40;</span>x<span style="color: #78aaac;">=</span>age,y<span style="color: #78aaac;">=</span>n,data<span style="color: #78aaac;">=</span>to,fill<span style="color: #78aaac;">=</span>series,stat<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;identity&quot;</span>,geom<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;bar&quot;</span>,position<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;dodge&quot;</span>,main<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;OH-01 2006 Turnout vs 2010 Projected Turnout (full)&quot;</span>,xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Age&quot;</span>, ylab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Count&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span>  
  scale_fill_brewer<span style="color: #66cc66;">&#40;</span>pal<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Paired&quot;</span>,<span style="color: #ff0000;">&quot;Election&quot;</span><span style="color: #66cc66;">&#41;</span>
&nbsp;
to <span style="color: #78aaac;">&lt;-</span> rbind<span style="color: #66cc66;">&#40;</span> ddply<span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.partial,<span style="color: #66cc66;">&#93;</span>,<span style="color: #ff0000;">&quot;age.2006&quot;</span>,<span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> data.frame<span style="color: #66cc66;">&#40;</span>age<span style="color: #78aaac;">=</span>x$age.2006<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span>,n<span style="color: #78aaac;">=</span>nrow<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span>,series<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;G06&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#91;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span>:<span style="color: #cc66cc;">4</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span>,
		 ddply<span style="color: #66cc66;">&#40;</span>vfs<span style="color: #66cc66;">&#91;</span>ele.partial2010,<span style="color: #66cc66;">&#93;</span>,<span style="color: #ff0000;">&quot;age.2010&quot;</span>,<span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> data.frame<span style="color: #66cc66;">&#40;</span>age<span style="color: #78aaac;">=</span>x$age.2010<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#93;</span>,n<span style="color: #78aaac;">=</span>nrow<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span>,series<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;G10&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#91;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span>:<span style="color: #cc66cc;">4</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>			
qplot<span style="color: #66cc66;">&#40;</span>x<span style="color: #78aaac;">=</span>age,y<span style="color: #78aaac;">=</span>n,data<span style="color: #78aaac;">=</span>to,fill<span style="color: #78aaac;">=</span>series,stat<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;identity&quot;</span>,geom<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;bar&quot;</span>,position<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;dodge&quot;</span>,main<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;OH-01 2006 Turnout vs 2010 Projected Turnout (partial)&quot;</span>,xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Age&quot;</span>, 
  ylab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Count&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> scale_fill_brewer<span style="color: #66cc66;">&#40;</span>pal<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Paired&quot;</span>,<span style="color: #ff0000;">&quot;Election&quot;</span><span style="color: #66cc66;">&#41;</span></pre></div></div>

<table rows="1" cols="2">
<tr>
<td align="center"><div id="attachment_319" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01_2006v2010-full.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01_2006v2010-full-300x299.png" alt="Figure 3: OH-01 2006 Actual Turnout and 2010 Projected Turnout (full)" title="Figure 3: OH-01 2006 Actual Turnout and 2010 Projected Turnout (full)" width="300" height="299" class="size-medium wp-image-319" /></a><p class="wp-caption-text">Figure 3: OH-01 2006 Actual Turnout and 2010 Projected Turnout (full)</p></div>
</td>
<td align="center"><div id="attachment_320" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01_2006v2010-partial.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01_2006v2010-partial-300x299.png" alt="Figure 4: OH-01 2006 Actual Turnout and 2010 Projected Turnout (partial)" title="Figure 4: OH-01 2006 Actual Turnout and 2010 Projected Turnout (partial)" width="300" height="299" class="size-medium wp-image-320" /></a><p class="wp-caption-text">Figure 4: OH-01 2006 Actual Turnout and 2010 Projected Turnout (partial)</p></div>
</td>
</tr>
</table>
<p>Figure 3 suggests larger 2010 turnout for all groups as compared to 2006. The projected increase in the 22-29 age group seems unlikely, but can probably be explained by the higher turnout among younger voters in 2008. In addition to 2008 being a presidential election year, the Obama for America campaign focused on registering new voters and activating dormant voters, both of which increased turnout among younger voters. That higher 2008 turnout inflated the last 4 measure for new voters, which pushed up the projected 2010 turnout. Figure 4 exhibits a similar projected increase for the younger age groups, which is probably due to the same increase in 2008 turnout. Before using these models in an actual election, the projections would need to be scaled based on some other turnout estimate. Despite the inflated values, however, this is a strong system for turnout prediction and could be used by almost any congressional campaign. </p>
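<p>The post does not say how that scaling would be done, so the following is only one possible approach, sketched on invented numbers: shift every voter&#8217;s probability by the same amount on the logit scale until the average matches an outside turnout estimate. The vector <b>p</b> and the 45% <b>target</b> are hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># hypothetical per-voter probabilities and an outside turnout estimate
p &lt;- c(0.95, 0.85, 0.70, 0.65, 0.55, 0.40, 0.30, 0.20)
target &lt;- 0.45
# expected turnout is the mean probability, with no 0.5 cutoff involved
mean(p)
# difference between the shifted mean probability and the target
gap &lt;- function(shift) mean(plogis(qlogis(p) + shift)) - target
# solve for the logit-scale shift that closes the gap
shift &lt;- uniroot(gap, interval = c(-10, 10), tol = 1e-9)$root
# rescaled probabilities: same ranking of voters, new overall level
p.scaled &lt;- plogis(qlogis(p) + shift)
mean(p.scaled)</pre></div></div>

<p>Because the shift is applied on the logit scale, the rescaled values stay between 0 and 1 and the ordering of voters is unchanged, which matters if the scores are used to rank voters for contact.</p>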
<h2>Model Improvements</h2>
<p>In addition to scaling, the regression models would need to be improved in several ways before being put into production. The predictor variables are currently considered independently, which effectively discounts any interactive effects that may exist. Turnout for younger married females or unmarried Democrats may be better modeled using compound variables, for example. Also, the model makes no use of demographic or opinion survey information available to political campaigns. Finally, the projection isn&#8217;t limited to two regressions; a campaign could create regressions by county or school district, or based on marital status, or any other combination. </p>
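<p>As a rough sketch of what a compound variable looks like in a model formula, the example below simulates voters with an extra turnout bump for married women and compares a main-effects model with one that adds a gender by marital status interaction. The data frame <b>sim</b> and its effect sizes are invented for illustration and say nothing about OH-01.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># simulated voters; names and effect sizes are invented for illustration
set.seed(2)
n &lt;- 5000
sim &lt;- data.frame(
  gender  = factor(sample(c(&quot;F&quot;, &quot;M&quot;), n, replace = TRUE)),
  married = factor(sample(c(&quot;N&quot;, &quot;Y&quot;), n, replace = TRUE))
)
# build in an extra bump for married women that main effects cannot capture
eta &lt;- -0.5 + 0.3 * (sim$married == &quot;Y&quot;) + 0.8 * (sim$gender == &quot;F&quot; &amp; sim$married == &quot;Y&quot;)
sim$voted &lt;- rbinom(n, 1, plogis(eta))
# main effects only
main.lr &lt;- glm(voted ~ gender + married, data = sim, family = binomial)
# gender * married expands to gender + married + gender:married
inter.lr &lt;- glm(voted ~ gender * married, data = sim, family = binomial)
# likelihood-ratio test of whether the interaction term improves the fit
anova(main.lr, inter.lr, test = &quot;Chisq&quot;)</pre></div></div>

<p>The same <b>*</b> and <b>:</b> formula operators would apply to the voter file models, at the cost of more coefficients to estimate in each segment.</p>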
<h2>Other Visualization Examples</h2>
<p>While not specifically related to turnout, I produced several simple visualizations that explore the rest of the voter file. The full power of R can be applied using the same voter data from the turnout projections.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;">## precinct summary</span>
<span style="color: #b22222; font-style: italic;"># summarize turnout in 2008 &amp; registered democrats, by precinct</span>
pct <span style="color: #78aaac;">&lt;-</span> ddply<span style="color: #66cc66;">&#40;</span>vfs,<span style="color: #ff0000;">&quot;precinct.code&quot;</span>,<span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> data.frame<span style="color: #66cc66;">&#40;</span>turnout.g08<span style="color: #78aaac;">=</span>sum<span style="color: #66cc66;">&#40;</span>x$turnout.g08 <span style="color: #78aaac;">==</span> <span style="color: #ff0000;">&quot;X&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">/</span> nrow<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span>,dem.pct<span style="color: #78aaac;">=</span>sum<span style="color: #66cc66;">&#40;</span>x$party<span style="color: #78aaac;">==</span><span style="color: #ff0000;">&quot;D&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #78aaac;">/</span>nrow<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span>,nvoters<span style="color: #78aaac;">=</span>nrow<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># visualize</span>
qplot<span style="color: #66cc66;">&#40;</span>turnout.g08<span style="color: #78aaac;">*</span><span style="color: #cc66cc;">100</span>,dem.pct<span style="color: #78aaac;">*</span><span style="color: #cc66cc;">100</span>,data<span style="color: #78aaac;">=</span>pct,geom<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;point&quot;</span>,size<span style="color: #78aaac;">=</span>nvoters,alpha<span style="color: #78aaac;">=</span>I<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0.4</span><span style="color: #66cc66;">&#41;</span>,main<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;OH-01 Precinct Turnout/Registration Summary&quot;</span>,
  xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;2008 General Turnout (%)&quot;</span>,ylab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Registered Democrats (%)&quot;</span><span style="color: #66cc66;">&#41;</span>
&nbsp;
<span style="color: #b22222; font-style: italic;">## 2008 turnout by gender + age</span>
qplot<span style="color: #66cc66;">&#40;</span>age.2010,data<span style="color: #78aaac;">=</span>vfs<span style="color: #66cc66;">&#91;</span>which<span style="color: #66cc66;">&#40;</span>vfs$turnout.g08 <span style="color: #78aaac;">==</span> <span style="color: #ff0000;">&quot;X&quot;</span><span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>,geom<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;bar&quot;</span>,fill<span style="color: #78aaac;">=</span>gender,position<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;dodge&quot;</span>,main<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;OH-01 2008 General Turnout by Gender, Age&quot;</span>,xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Age&quot;</span>,   
  ylab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Count&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> scale_fill_brewer<span style="color: #66cc66;">&#40;</span>pal<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Set1&quot;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;">## 2008 newly registered voter counts by age</span>
qplot<span style="color: #66cc66;">&#40;</span>age.2010,data<span style="color: #78aaac;">=</span>new.08,main<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;OH-01 2008 Newly registered voters&quot;</span>,xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Age&quot;</span>,ylab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Count&quot;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;">## 2008 newly registered voter turnout by age</span>
qplot<span style="color: #66cc66;">&#40;</span>age.2010,data<span style="color: #78aaac;">=</span>new.08,fill<span style="color: #78aaac;">=</span>turnout.g08,position<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;dodge&quot;</span>,main<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;OH-01 2008 Turnout for newly registered voters&quot;</span>,xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Age&quot;</span>,ylab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Count&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  scale_fill_brewer<span style="color: #66cc66;">&#40;</span>pal<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Paired&quot;</span><span style="color: #66cc66;">&#41;</span></pre></div></div>

<table rows="2" cols="2">
<tr>
<td align="center"><div id="attachment_313" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01_precinct_summary.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01_precinct_summary-300x299.png" alt="Figure 5: OH-01 2010 Precinct Summary" title="Figure 5: OH-01 2010 Precinct Summary" width="300" height="299" class="size-medium wp-image-313" /></a><p class="wp-caption-text">Figure 5: OH-01 2010 Precinct Summary</p></div></td>
<td align="center"><div id="attachment_318" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2008-turnout-gender-age.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2008-turnout-gender-age-300x299.png" alt="Figure 6: OH-01 2008 General Turnout by Gender, Age" title="Figure 6: OH-01 2008 General Turnout by Gender, Age" width="300" height="299" class="size-medium wp-image-318" /></a><p class="wp-caption-text">Figure 6: OH-01 2008 General Turnout by Gender, Age</p></div></td>
</tr>
<tr>
<td align="center"><div id="attachment_317" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2008-newly-registered-voters-by-age.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2008-newly-registered-voters-by-age-300x299.png" alt="Figure 7: OH-01 2008 Newly Registered Voters by Age" title="Figure 7: OH-01 2008 Newly Registered Voters by Age" width="300" height="299" class="size-medium wp-image-317" /></a><p class="wp-caption-text">Figure 7: OH-01 2008 Newly Registered Voters by Age</p></div></td>
<td align="center"><div id="attachment_316" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2008-newly-registered-age-turnout.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/05/oh01-2008-newly-registered-age-turnout-300x299.png" alt="Figure 8: OH-01 2008 Newly Registered Voters by Age, Turnout" title="Figure 8: OH-01 2008 Newly Registered Voters by Age, Turnout" width="300" height="299" class="size-medium wp-image-316" /></a><p class="wp-caption-text">Figure 8: OH-01 2008 Newly Registered Voters by Age, Turnout</p></div></td>
</tr>
</table>
<p>Figure 5 shows the 2008 Democratic turnout percentage and 2008 general turnout percentage for each precinct, and each bubble is scaled to the registered voter population of the precinct it represents. Figure 6 shows 2008 general turnout by gender and age group. Figure 7 is the raw count of voters registered between the day after the 2006 general election and the day of the 2008 general election, broken down by age group. Figure 8 is a variation of Figure 7, showing 2008 turnout of voters registered after the 2006 election. None of these charts took more than 5 minutes to create from concept to output, and <b>ggplot</b> did almost all of the heavy lifting. I believe the ease with which these charts were built highlights the utility of having your data analysis tool also be your visualization tool. </p>
<h2>Summary</h2>
<p>In this short example I&#8217;ve analyzed, visualized, and modeled electoral data using R and a few add-on packages. These are standard techniques used by any congressional campaign, but they are usually performed by some combination of Excel, SPSS, or SQL. By using R, I avoided the compatibility issues usually encountered when transferring data between tools. R would also have allowed me to perform clustering, component analysis, or Bayesian inference on the same data from the same R interface. Taken together, these reasons make R a good addition to the political analysis toolbox for a campaign or campaign consultant. If you would like to discuss how advanced statistical analysis can help your Democratic campaign model turnout, increase fund-raising, or benchmark field operations, please don&#8217;t hesitate to contact <a href="mailto:jjh@offensivepolitics.net">me</a>.</p>
<p>Click <a href="http://offensivepolitics.net/data/voterfile.zip">here</a> (14MB) to download the R scripts and data associated with this post. The voter file data has been scrubbed to remove the VoterID, name, and address components. </p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=302</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New FECHell 0.1.9</title>
		<link>http://offensivepolitics.net/blog/?p=299</link>
		<comments>http://offensivepolitics.net/blog/?p=299#comments</comments>
		<pubDate>Mon, 22 Mar 2010 14:17:14 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Open-Source]]></category>
		<category><![CDATA[Campaign Finance]]></category>
		<category><![CDATA[fechell]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=299</guid>
		<description><![CDATA[Our FEC report file library FECHell has been updated to 0.1.9. The release includes a half dozen bug fixes and the following new features: Speed improvements &#8211; Schedule and field names are matched by compiled regular expressions instead of brute-force string matching, resulting in a ~25% speed increase for large files. DEF file fixes [...]]]></description>
			<content:encoded><![CDATA[<p>Our FEC report file library <a href="http://offensivepolitics.net/fechell/">FECHell</a> has been updated to 0.1.9. The release includes a half dozen bug fixes and the following new features: </p>
<ul>
<li><strong>Speed improvements</strong> &#8211; Schedule and field names are matched by compiled regular expressions instead of brute-force string matching, resulting in a ~25% speed increase for large files.</li>
<li><strong>DEF file fixes</strong> &#8211; The field definition files FECHell uses have been heavily updated to remove bad fields and standardize field names for most schedules</li>
<li><strong>FECForm library</strong> &#8211; FECHell now includes an optional library called FECForm. FECHell returns a hash indexed by field names. This is problematic because field names are case sensitive, version dependent, and sometimes difficult to understand. FECForm tries to alleviate some of these issues by providing a consistent interface for one form (F3) and several schedules (SA,SB,SC,SC1) using named parameters. This is especially helpful when parsing very old FEC files where name and address components were delimited by a ^ instead of being an individual field, COMMITTEE was called CMTTE, etc.<br />
Example:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;">fec_version, original_form_type, form_type, values = h.<span style="color:#9900CC;">header_lines</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;12345.fec&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#008000; font-style:italic;"># old way 5.2-6.4</span>
<span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Committee: #{values['COMMITTEE NAME']}&quot;</span>
<span style="color:#008000; font-style:italic;"># old way 3.0-5.1</span>
<span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;Committee: #{values['FILER NAME']}&quot;</span>
<span style="color:#008000; font-style:italic;"># new way</span>
f = FECForm.<span style="color:#9900CC;">schedule_for</span><span style="color:#006600; font-weight:bold;">&#40;</span>form_type, fec_version, values<span style="color:#006600; font-weight:bold;">&#41;</span>
<span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;For: #{f.committee_name}&quot;</span></pre></div></div>

<p> Other major forms (F3X,F1,F2,F6) are in progress, as are the outstanding schedules (SC2).</li>
<li><strong>Monolithic unit tests</strong> &#8211; FECHell / FECForm finally comes with some unit tests! An example file for every form and every schedule, for every supported version, is included in the distribution. Tests are included for every form and field supported by FECForm. </li>
</ul>
<p>Visit the <a href="http://offensivepolitics.net/fechell/">FECHell</a> page for more information, installation instructions, and more examples. </p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=299</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Re-mapping Massachusetts Special election results</title>
		<link>http://offensivepolitics.net/blog/?p=275</link>
		<comments>http://offensivepolitics.net/blog/?p=275#comments</comments>
		<pubDate>Wed, 27 Jan 2010 21:10:33 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=275</guid>
		<description><![CDATA[I had previously posted maps showing the difference in major party vote share between the 2008 Presidential election and the 2010 special Senate election in Massachusetts. Colleagues and readers of the Revolutions blog had some very insightful criticisms of these maps, in particular that the color scale was over-stating the swing in voter sentiment. I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>I had <a href="http://offensivepolitics.net/blog/?p=261">previously posted</a> maps showing the difference in major party vote share between the 2008 Presidential election and the 2010 special Senate election in Massachusetts. Colleagues and readers of the <a href="http://blog.revolution-computing.com/2010/01/mapping-the-massachusetts-election-upset-with-r.html">Revolutions blog</a> had some very insightful criticisms of these maps, in particular that the color scale was over-stating the swing in voter sentiment. I&#8217;ve decided to perform the process again, taking some of their advice into account, and hopefully producing more useful output in the process. </p>
<p>One concern was my use of arbitrary-sized bins for the range comparison. Revo blog commenter Mike Lawrence suggested 3 equal-sized bins with a neutral color for near-zero values. Given the variance in election returns between townships (sd=28 points in 2010, sd=20 in 2008) I&#8217;m going to choose 10 bins instead of 3, but otherwise this is an excellent suggestion. Initially we&#8217;ll be measuring the difference between the Republican and Democratic vote percentage in each township for the 2008 and 2010 elections. This tells us not only which party won, but by how much. </p>
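<p>As a rough illustration of the binning step described above, the sketch below computes the margin for three made-up townships and assigns each to one of the 10 bins. The vote counts are hypothetical, and base R's cut() stands in for cut2 from Hmisc so the snippet runs without extra packages; the two functions may label or bound bins differently.</p>

```r
# Hypothetical vote counts for three made-up townships (not real returns)
votes <- data.frame(
  place = c("A", "B", "C"),
  GOP   = c(600, 450, 200),
  DEM   = c(380, 500, 780),
  LIB   = c(20, 50, 20),
  stringsAsFactors = FALSE
)

# Margin as a share of all votes: positive favors the Republican,
# negative favors the Democrat
votes$Total  <- votes$GOP + votes$DEM + votes$LIB
votes$Offset <- (votes$GOP - votes$DEM) / votes$Total

# Eleven cut points define ten bins from -70% to +70%
cuts <- c(-.7, -.55, -.4, -.25, -.1, 0, .1, .25, .4, .55, .7)

# Base R cut() used here as a stand-in for Hmisc::cut2 (an assumption);
# right = FALSE gives left-closed bins [a, b)
votes$Bin <- cut(votes$Offset, breaks = cuts, right = FALSE, include.lowest = TRUE)

# The integer code of each bin can index a 10-color palette
bin_index <- as.integer(votes$Bin)
print(votes[, c("place", "Offset", "Bin")])
```

<p>Bin 1 is the most Democratic and bin 10 the most Republican, so a diverging palette ordered from blue to red lines up with the bin index.</p>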
<p>Here is the code to recreate the MA 2010 senate returns map with breaks of 15% and a different color palette showing neutral middle colors.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">  library<span style="color: #66cc66;">&#40;</span>maptools<span style="color: #66cc66;">&#41;</span>
  library<span style="color: #66cc66;">&#40;</span>sp<span style="color: #66cc66;">&#41;</span>
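  library<span style="color: #66cc66;">&#40;</span>Hmisc<span style="color: #66cc66;">&#41;</span>
  library<span style="color: #66cc66;">&#40;</span>RColorBrewer<span style="color: #66cc66;">&#41;</span>  <span style="color: #b22222; font-style: italic;"># provides brewer.pal</span>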
&nbsp;
  labels <span style="color: #78aaac;">=</span> c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;56-70% Dem&quot;</span>,<span style="color: #ff0000;">&quot;41-55% Dem&quot;</span>,<span style="color: #ff0000;">&quot;26-40% Dem&quot;</span>,<span style="color: #ff0000;">&quot;11-25% Dem&quot;</span>,<span style="color: #ff0000;">&quot;0-10% Dem&quot;</span>,<span style="color: #ff0000;">&quot;0-10% Rep&quot;</span>,<span style="color: #ff0000;">&quot;10-24% Rep&quot;</span>,<span style="color: #ff0000;">&quot;25-39% Rep&quot;</span>,<span style="color: #ff0000;">&quot;40-54% Rep&quot;</span>,<span style="color: #ff0000;">&quot;55-70% Rep&quot;</span><span style="color: #66cc66;">&#41;</span>
  cuts<span style="color: #78aaac;">=</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #78aaac;">-</span>.7,<span style="color: #78aaac;">-</span>.55,<span style="color: #78aaac;">-</span>.4,<span style="color: #78aaac;">-</span>.25,<span style="color: #78aaac;">-</span>.1,<span style="color: #cc66cc;">0</span>,.1,.25,.4,.55,.70<span style="color: #66cc66;">&#41;</span>
&nbsp;
  masen <span style="color: #78aaac;">&lt;-</span> read.csv<span style="color: #66cc66;">&#40;</span>'masen2010.csv'<span style="color: #66cc66;">&#41;</span>
  masen$Total <span style="color: #78aaac;">&lt;-</span> masen$GOP <span style="color: #78aaac;">+</span> masen$DEM <span style="color: #78aaac;">+</span> masen$LIB
  masen$Offset <span style="color: #78aaac;">&lt;-</span> <span style="color: #66cc66;">&#40;</span>masen$GOP <span style="color: #78aaac;">-</span> masen$DEM<span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">/</span> masen$Total
  shp <span style="color: #78aaac;">&lt;-</span> readShapeSpatial<span style="color: #66cc66;">&#40;</span>'tl_2009_25_cousub'<span style="color: #66cc66;">&#41;</span>
  shpA <span style="color: #78aaac;">&lt;-</span> shp<span style="color: #66cc66;">&#91;</span>match<span style="color: #66cc66;">&#40;</span>masen$place,shp$<span style="color:#228b22;">NA</span>ME<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>
  masen$Col1 <span style="color: #78aaac;">&lt;-</span>cut2<span style="color: #66cc66;">&#40;</span>masen$Offset,cuts<span style="color: #78aaac;">=</span>cuts<span style="color: #66cc66;">&#41;</span>
  colors <span style="color: #78aaac;">&lt;-</span> rev<span style="color: #66cc66;">&#40;</span>brewer.pal<span style="color: #66cc66;">&#40;</span>'RdYlBu',n<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">10</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>  
  plot<span style="color: #66cc66;">&#40;</span>shpA,col<span style="color: #78aaac;">=</span>colors<span style="color: #66cc66;">&#91;</span>as.integer<span style="color: #66cc66;">&#40;</span>masen$Col1<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span>,border<span style="color: #78aaac;">=</span>grey<span style="color: #66cc66;">&#40;</span>.85<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
  title<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;Major Party Winning Vote Percentage MA Sen 2010&quot;</span><span style="color: #66cc66;">&#41;</span>
  legend<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;bottomleft&quot;</span>, legend<span style="color: #78aaac;">=</span>labels,fill<span style="color: #78aaac;">=</span>colors<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>Which gives us the following output:<br />
<a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/a.jpg"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/a.jpg" alt="MA Winning vote percentage Special 2010" title="MA Winning vote percentage Special 2010" width="640" height="480" class="aligncenter size-full wp-image-288" /></a></p>
<p>I believe this map more accurately represents the results. As a companion to the 2010 major party results I&#8217;d like to recreate the 2008 presidential results, using the same scale and bins.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">gen2008 <span style="color: #78aaac;">&lt;-</span> read.csv<span style="color: #66cc66;">&#40;</span>'mapg2008.csv'<span style="color: #66cc66;">&#41;</span>
gen2008$Offset <span style="color: #78aaac;">&lt;-</span> <span style="color: #66cc66;">&#40;</span>gen2008$rep <span style="color: #78aaac;">-</span> gen2008$dem<span style="color: #66cc66;">&#41;</span><span style="color: #78aaac;">/</span>  gen2008$total
gen2008$Col1 <span style="color: #78aaac;">&lt;-</span>  cut2<span style="color: #66cc66;">&#40;</span>gen2008$Offset,cuts<span style="color: #78aaac;">=</span>cuts<span style="color: #66cc66;">&#41;</span>
shpA <span style="color: #78aaac;">&lt;-</span> shp<span style="color: #66cc66;">&#91;</span>match<span style="color: #66cc66;">&#40;</span>gen2008$place, shp$<span style="color:#228b22;">NA</span>ME<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>
plot<span style="color: #66cc66;">&#40;</span>shpA,col<span style="color: #78aaac;">=</span>colors<span style="color: #66cc66;">&#91;</span>as.integer<span style="color: #66cc66;">&#40;</span>gen2008$Col1<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span>,border<span style="color: #78aaac;">=</span>grey<span style="color: #66cc66;">&#40;</span>.85<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
legend<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;bottomleft&quot;</span>,legend<span style="color: #78aaac;">=</span>labels,fill<span style="color: #78aaac;">=</span>colors<span style="color: #66cc66;">&#41;</span>
title<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;Major party win percentage MA Presidential 2008&quot;</span><span style="color: #66cc66;">&#41;</span></pre></div></div>

<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/b.jpg"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/b.jpg" alt="Major party win percentage MA Presidential 2008" title="Major party win percentage MA Presidential 2008" width="640" height="480" class="aligncenter size-full wp-image-289" /></a></p>
<p>Now the two graphs can more or less be compared apples to apples, and we can draw some very interesting conclusions from that comparison. The nine townships that were heavily Democratic in 2008 actually stayed heavily Democratic in 2010, while almost all other townships showed an increase in Republican support. These townships didn&#8217;t necessarily flip from being won by a Democrat to being won by a Republican, but every township showed an increase. </p>
<p>The least successful map I had previously produced was attempting to illustrate the degree of increased Republican support. What I really wanted to show was the increase in Republican support between the 2008 and 2010 elections.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">masen <span style="color: #78aaac;">&lt;-</span> masen<span style="color: #66cc66;">&#91;</span>order<span style="color: #66cc66;">&#40;</span>masen$place<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>
magen <span style="color: #78aaac;">&lt;-</span> gen2008<span style="color: #66cc66;">&#91;</span>order<span style="color: #66cc66;">&#40;</span>gen2008$place<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>
masen$RepP <span style="color: #78aaac;">&lt;-</span> <span style="color: #66cc66;">&#40;</span>masen$GOP <span style="color: #78aaac;">/</span> masen$Total<span style="color: #66cc66;">&#41;</span><span style="color: #78aaac;">*</span><span style="color: #cc66cc;">100</span>
magen$RepP <span style="color: #78aaac;">&lt;-</span> <span style="color: #66cc66;">&#40;</span>magen$rep <span style="color: #78aaac;">/</span> magen$total<span style="color: #66cc66;">&#41;</span><span style="color: #78aaac;">*</span><span style="color: #cc66cc;">100</span>
gain <span style="color: #78aaac;">&lt;-</span> masen$RepP <span style="color: #78aaac;">-</span> magen$RepP
gr <span style="color: #78aaac;">&lt;-</span> cut2<span style="color: #66cc66;">&#40;</span>gain,cuts<span style="color: #78aaac;">=</span>seq<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0</span>,<span style="color: #cc66cc;">30</span>,by<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">5</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
colors <span style="color: #78aaac;">&lt;-</span> brewer.pal<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;Reds&quot;</span>,n<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">6</span><span style="color: #66cc66;">&#41;</span>
shpA <span style="color: #78aaac;">&lt;-</span> shp<span style="color: #66cc66;">&#91;</span>match<span style="color: #66cc66;">&#40;</span>magen$place,shp$<span style="color:#228b22;">NA</span>ME<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>
plot<span style="color: #66cc66;">&#40;</span>shpA,col<span style="color: #78aaac;">=</span>colors<span style="color: #66cc66;">&#91;</span>as.integer<span style="color: #66cc66;">&#40;</span>gr<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>
title<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;Increase in Republican vote percentage 2008 to 2010&quot;</span><span style="color: #66cc66;">&#41;</span>
legend<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;bottomleft&quot;</span>,legend<span style="color: #78aaac;">=</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;0-4%&quot;</span>,<span style="color: #ff0000;">&quot;5-9%&quot;</span>,<span style="color: #ff0000;">&quot;10-14%&quot;</span>,<span style="color: #ff0000;">&quot;15-19%&quot;</span>,<span style="color: #ff0000;">&quot;20-24%&quot;</span>,<span style="color: #ff0000;">&quot;25-30%&quot;</span><span style="color: #66cc66;">&#41;</span>,fill<span style="color: #78aaac;">=</span>colors<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/repinc0810.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/repinc0810.png" alt="Republican increase in vote percentage 2008 to 2010" title="Republican increase in vote percentage 2008 to 2010" width="640" height="480" class="aligncenter size-full wp-image-292" /></a></p>
<p>From this map we can see that from 2008 to 2010 the Republican vote percentage actually increased in every single township in MA. This is kind of shocking, especially when you consider that Democrats enjoy a 15 point registration advantage statewide (<a href="http://www.sec.state.ma.us/ele/elepdf/st_county_town_enroll_breakdown_08.pdf">source: warning PDF</a>). If I were a campaign manager for Richard Neal (2nd CD) or Jim McGovern (3rd CD) I&#8217;d take a long look at what the Coakley campaign did to cause so many Democratic voters to either stay home or cross party lines in the 2010 special election. </p>
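<p>The gain calculation in the code above subtracts the two columns row by row, which assumes both tables list the same townships in the same order after sorting. A minimal sketch of a more defensive variant, using hypothetical township names and percentages rather than the real returns, joins the two tables on the place name with base R's merge() before subtracting:</p>

```r
# Hypothetical Republican vote shares in percent (not real returns).
# The two tables deliberately list the townships in different orders.
sen2010 <- data.frame(place = c("Alpha", "Beta", "Gamma"),
                      RepP  = c(55, 40, 30),
                      stringsAsFactors = FALSE)
pres2008 <- data.frame(place = c("Gamma", "Alpha", "Beta"),
                       RepP  = c(22, 41, 35),
                       stringsAsFactors = FALSE)

# Join on the township name so each row pairs the right two elections;
# townships missing from either table are dropped by default
both <- merge(sen2010, pres2008, by = "place",
              suffixes = c(".2010", ".2008"))

# Change in Republican share, in percentage points
both$gain <- both$RepP.2010 - both$RepP.2008
print(both)
```

<p>Comparing nrow(both) with the row counts of the inputs is a quick way to spot townships whose names do not match between the two files.</p>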
<p>jjh</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=275</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mapping MA election results</title>
		<link>http://offensivepolitics.net/blog/?p=261</link>
		<comments>http://offensivepolitics.net/blog/?p=261#comments</comments>
		<pubDate>Mon, 25 Jan 2010 17:07:49 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[MASEN]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=261</guid>
		<description><![CDATA[The Swing State Project recently had some very interesting maps comparing last week&#8217;s election results from Massachusetts to 2008 presidential primary results. Their maps posted show some very interesting trends, but the maps themselves are lacking in information and the color schemes are pretty ugly. Using my own source data I recreated their election night [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.swingstateproject.com/">Swing State Project</a> recently had some <a href="http://www.swingstateproject.com/diary/6241/masen-map-of-special-election-results-by-town">very interesting maps</a> comparing last week&#8217;s election results from Massachusetts to 2008 presidential primary results. The maps they posted show some very interesting trends, but the maps themselves are lacking in information and the color schemes are pretty ugly. Using my own source data I recreated their election night maps, along with a few more. The geography was taken directly from the US Census so some of the waterlines are pretty strange compared to other result maps like the Boston Globe and Swing State Project.</p>
<p>First look at the results from the MA 2010 Senate special election. </p>
<div id="attachment_265" class="wp-caption aligncenter" style="width: 630px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/masen2010.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/masen2010.png" alt="MA Senate 2010 Results" title="MA Senate 2010 Results" width="620" height="400" class="size-full wp-image-265" /></a><p class="wp-caption-text">MA Senate 2010 Results</p></div>
<p>Now compare that to the results of the 2008 presidential primary in MA, blue gradients for Obama and red gradients for Clinton. </p>
<div id="attachment_263" class="wp-caption aligncenter" style="width: 630px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/madempp2008.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/madempp2008.png" alt="Democrat 2008 Presidential Primary results" title="Democrat 2008 Presidential Primary results" width="620" height="400" class="size-full wp-image-263" /></a><p class="wp-caption-text">Democrat 2008 Presidential Primary results</p></div>
<p>Using the same color scale and data cut-points we can see the same results that DavidNYC came up with, namely that winning townships for Clinton in 2008 tracked decently to winning townships for Scott Brown in the 2010 Senate race. </p>
<p>Obama lost the MA presidential primary in 2008, but he won in the general. Here is a map showing the vote margins from the 2008 presidential general: </p>
<div id="attachment_264" class="wp-caption aligncenter" style="width: 630px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/magen2008.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/magen2008.png" alt="MA 2008 Presidential results " title="MA 2008 Presidential results " width="620" height="400" class="size-full wp-image-264" /></a><p class="wp-caption-text">MA 2008 Presidential results </p></div>
<p>Compared to the MA 2010 Senate race last week these 2008 numbers are astounding. The Democratic vote margin in 2008 was 26 points (62% to 36%), while the margin in the 2010 Senate race was -5 points (47% to 52%), with 30% lower turnout in 2010 than in 2008. I decided to create another map showing the decline in Democratic vote support.</p>
<div id="attachment_262" class="wp-caption aligncenter" style="width: 630px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/madem2008-2010diff.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/01/madem2008-2010diff.png" alt="Democratic vote change 2010 to 2008" title="Democratic vote change 2010 to 2008" width="620" height="400" class="size-full wp-image-262" /></a><p class="wp-caption-text">Democratic vote change 2010 to 2008</p></div>
<p>Does this mean voters have turned against Democrats in MA? Maybe, but it is interesting that Scott Brown got more votes in the special election (1,168,107) than McCain did in the 2008 general (1,108,854), even though 900,000 fewer people voted in the special. So Brown was able to do two things: 1) activate Republican McCain voters, and 2) cause Democrats to cross party lines. I don&#8217;t think this spells certain doom for congressional Democrats in the midterms, but it does show that Democrats will stay home on election day, even in a historically Democratic state, for the wrong candidate. </p>
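<p>For readers who want to check the arithmetic, the figures quoted in this post can be plugged into R directly. The variable names are only for illustration:</p>

```r
# Statewide vote shares quoted in this post, in percent
dem_2008 <- 62; rep_2008 <- 36
dem_2010 <- 47; rep_2010 <- 52

# Democratic margin in each election, in percentage points
margin_2008 <- dem_2008 - rep_2008
margin_2010 <- dem_2010 - rep_2010

# Change in the Democratic margin between the two elections
swing <- margin_2010 - margin_2008

# Raw Republican vote totals quoted in this post
brown_2010  <- 1168107
mccain_2008 <- 1108854
extra_votes <- brown_2010 - mccain_2008

cat("2008 margin:", margin_2008, "points\n")
cat("2010 margin:", margin_2010, "points\n")
cat("Swing:", swing, "points\n")
cat("Brown votes over McCain:", extra_votes, "\n")
```

<p>The swing works out to 31 points against the Democrats, and Brown finished 59,253 votes ahead of McCain&#8217;s 2008 total.</p>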
<p>You can create the maps yourself or play with the data by downloading the source + data files <a href='http://offensivepolitics.net/blog/wp-content/uploads/2010/01/mamaps.zip'>here</a>. With a recent installation of R you can recreate the maps above by running the following command:</p>
<pre language="BASH">
R CMD BATCH draw.R
</pre>
<p>Thanks.</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=261</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Aggregate electoral targeting with R</title>
		<link>http://offensivepolitics.net/blog/?p=113</link>
		<comments>http://offensivepolitics.net/blog/?p=113#comments</comments>
		<pubDate>Thu, 22 Oct 2009 15:08:49 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[targeting]]></category>
		<category><![CDATA[Virginia House of Delegates]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=113</guid>
		<description><![CDATA[Aggregate electoral targeting is the process of determining the likelihood of a citizen choosing to vote (turnout), and which candidate that person is most likely to vote for (partisan bias) in a given race using historical turnout and partisan bias. The output from this targeting allows a campaign to project the likely number of voters, what percentages their candidate is likely to receive, and even the general geographic locations of supporters.]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>Electoral targeting is the process of quantifying the partisan bias of a single voter or subset of voters in a geographic region. Bias can be calculated using an individual&#8217;s demographic and voting behavior or by aggregating results from an entire election precinct. Targeting is traditionally performed by national committees (e.g., <a href="http://www.ncec.org/">National Committee for an Effective Congress</a>, <a href="http://www.nrcc.org/">National Republican Congressional Committee</a>), state political parties, interest groups (e.g., <a href="http://www.emilyslist.org/">EMILY&#8217;s List</a>, <a href="http://home.nra.org/">National Rifle Association</a>), or campaign consultants. Targeting data is consumed by campaign managers and analysts, and it is used along with polling data to build strategy, direct resources, and project electoral outcomes. </p>
<p>While aggregate electoral targeting can build a sophisticated picture of a district, the mathematics behind targeting are very simple. Targeting can be performed by anyone with previous electoral data, and calculations can be done using 3&#215;5 note cards, simple spreadsheets, or high-end software packages like SPSS. The targeting methods discussed in this post are taken from academic publications on electioneering: <a target="_blank" href="http://www.amazon.com/Campaign-Craft-Praeger-Political-Communication/dp/0275990044/">Campaign Craft</a> (Burton, Shea 2006) and <a target="_blank" href="http://www.amazon.com/Campaign-Manager-Running-Winning-Elections/dp/0813344514/">The Campaign Manager</a> (Shaw, 2004). </p>
<p>Although targeting data is usually inexpensive or free, a down-ballot campaign or a primary challenger might not have the connections or support of a PAC or party to obtain the data. In these cases, a campaign will probably purchase one of the books listed above to perform its own analysis. Even an established campaign may run its own analysis, possibly to test different turnout theories or to integrate additional data. This post is directed towards these groups.</p>
<p>Together, we will assume the role of campaign consultant and perform an aggregate electoral analysis on the 13th House of Delegates seat (HOD#13) in the Commonwealth of Virginia. In HOD#13, the 18-year Republican incumbent <a href="http://delegatebob.com/" target="_blank">Bob Marshall</a> is being challenged by Democrat <a target="_blank" href="http://www.johnbell2009.com/">John Bell</a>. This analysis will compute and visualize turnout, partisan bias, and a precinct ranking based on projected turnout and historical Democratic support. </p>
<p>The analysis of HOD#13 will be performed using R, an open-source computing platform. R is free, extensible, and interactive, making it an ideal platform for experimentation. The R package <b>aggpol</b> was created specifically for this tutorial, and it contains all the data and operations required to execute an aggregate electoral analysis. Readers can execute the provided R code to reproduce the analysis or simply follow along to learn how it was performed. Readers unfamiliar with R should read <a target="_blank" href="http://cran.r-project.org/doc/manuals/R-intro.html">Introduction to R</a>, which is available on the <a target="_blank" href="http://www.r-project.org/">R project homepage</a>.</p>
<p>The electoral and registration data used were compiled from the <a target="_blank" href="http://www.sbe.virginia.gov/cms/Election_Information/Election_Results/Index.html">Virginia State Board of Elections</a> using several custom written parsers and two different PDF-to-text engines. Please contact me for source data or more information at: <a href="mailto:jjh@offensivepolitics.net">jjh@offensivepolitics.net</a>. </p>
<h2>Prerequisites</h2>
<p>This section only applies to readers interested in recreating the analysis and graphics produced in this tutorial. To completely recreate this analysis you will need the following: </p>
<ol>
<li>The latest version of the R statistical computing environment. Binaries, source, and installation instructions can be downloaded from the <a href="http://www.r-project.org/">R homepage</a>. </li>
<li>Additional R packages. This analysis requires several packages that provide functionality on top of the base R system: <b>plyr</b>, <b>ggplot2</b>, and <b>RColorBrewer</b>. To install them, start your R environment and execute the following:

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">install.packages<span style="color: #66cc66;">&#40;</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;plyr&quot;</span>, <span style="color: #ff0000;">&quot;RColorBrewer&quot;</span>, <span style="color: #ff0000;">&quot;ggplot2&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

</li>
<li>The <b>aggpol</b> package for calculating aggregate political statistics. Download the latest version for <a href="http://offensivepolitics.net/R/aggpol-latest.tar.gz">Unix-style</a> systems or for <a href="http://offensivepolitics.net/R/aggpol-latest.zip">Windows</a> systems. Installation of local packages is detailed in the <a href="http://cran.r-project.org/doc/manuals/R-admin.html#Installing-packages">R manual</a> on package installation.
	</li>
</ol>
<h2>Getting Started</h2>
<p>Now that the prerequisites are installed we can get started with our data analysis. Start up your R environment and load the required libraries by typing in the following commands:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">library<span style="color: #66cc66;">&#40;</span>plyr<span style="color: #66cc66;">&#41;</span>
library<span style="color: #66cc66;">&#40;</span>aggpol<span style="color: #66cc66;">&#41;</span>
library<span style="color: #66cc66;">&#40;</span>ggplot2<span style="color: #66cc66;">&#41;</span>
library<span style="color: #66cc66;">&#40;</span>RColorBrewer<span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>We need to attach the VAHOD data set that comes with <b>aggpol</b>. This data set contains precinct-level electoral returns for state and federal elections in the Commonwealth of Virginia from 2001 to 2008. Since we are focusing on HOD#13, we&#8217;ll need to select just the records that have to do with that seat.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">data<span style="color: #66cc66;">&#40;</span>VAHOD<span style="color: #66cc66;">&#41;</span>
hd013 <span style="color: #78aaac;">&lt;-</span> vahod<span style="color: #66cc66;">&#91;</span>which<span style="color: #66cc66;">&#40;</span>vahod$seat <span style="color: #78aaac;">==</span> <span style="color: #ff0000;">&quot;HD-013&quot;</span><span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span></pre></td></tr></table></div>

<p>The data set contains precinct-level electoral results for the following races: U.S. President, U.S. Senate, U.S. House of Representatives, Virginia Governor, Senate of Virginia, and Virginia House of Delegates. This breadth of electoral returns allows us to build a very detailed profile of the partisan bias of a district. </p>
<p>We will first determine the historical partisanship in HOD#13. Since partisanship can fluctuate over the years and different seats have different turnout expectations, we&#8217;ll first need to see the major party support for every seat in each election for precincts in HOD#13. We can use the <b>historical.election.summary</b> function from the <b>aggpol</b> package to group the precinct results into district results, and then break them down by seat and year.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">esum <span style="color: #78aaac;">&lt;-</span> historical.election.summary<span style="color: #66cc66;">&#40;</span>hd013<span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p><b><i>esum</i></b> now contains: </p>
<table>
<tr>
<th>.</th>
<th>year</th>
<th>district_type</th>
<th>total.turnout</th>
<th>rep.turnout</th>
<th>rep.turnout.percent</th>
<th>dem.turnout</th>
<th>dem.turnout.percent</th>
<th>oth.turnout</th>
<th>oth.turnout.percent</th>
</tr>
<tr>
<td>1</td>
<td>2001</td>
<td>GV</td>
<td>5527</td>
<td>3266</td>
<td>0.5909</td>
<td>2207</td>
<td>0.3993</td>
<td>54</td>
<td>0.0097</td>
</tr>
<tr>
<td>2</td>
<td>2001</td>
<td>HD</td>
<td>5399</td>
<td>3475</td>
<td>0.6436</td>
<td>1924</td>
<td>0.3563</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>3</td>
<td>2001</td>
<td>LG</td>
<td>5432</td>
<td>3291</td>
<td>0.6058</td>
<td>2025</td>
<td>0.3727</td>
<td>116</td>
<td>0.0213</td>
</tr>
<tr>
<td>4</td>
<td>2003</td>
<td>HD</td>
<td>10299</td>
<td>10103</td>
<td>0.9809</td>
<td>110</td>
<td>0.0106</td>
<td>86</td>
<td>0.0083</td>
</tr>
<tr>
<td colspan="10">13 more lines&#8230;</td></tr>
</table>
<p>We now have major-party turnout for every election in our data set. To best visualize the results we&#8217;ll build a bar graph comparing major-party turnout in each seat over time. We first need to transpose the election summary object (<b><i>esum</i></b>) from a summary format to an observation format, one line per distinct year+district+party. The <a  target="_blank" href="http://had.co.nz/plyr/">plyr</a> package makes this task extremely simple.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">elx <span style="color: #78aaac;">&lt;-</span> ddply<span style="color: #66cc66;">&#40;</span>esum,c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;year&quot;</span>, <span style="color: #ff0000;">&quot;district_type&quot;</span><span style="color: #66cc66;">&#41;</span>, <span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> 
  rbind<span style="color: #66cc66;">&#40;</span>  data.frame<span style="color: #66cc66;">&#40;</span>party<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;REP&quot;</span>,turnout<span style="color: #78aaac;">=</span>x$rep.turnout.percent<span style="color: #66cc66;">&#41;</span>,
  data.frame<span style="color: #66cc66;">&#40;</span>party<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;DEM&quot;</span>,turnout<span style="color: #78aaac;">=</span>x$dem.turnout.percent<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>
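<p>For readers who want to see the reshaping step without <b>plyr</b> or <b>aggpol</b> installed, here is a minimal base-R sketch. It assumes only the first four rows of the <b><i>esum</i></b> table printed above; the names <b>esum_head</b> and <b>elx_head</b> are illustrative and not part of any package.</p>

```r
# First four rows of the esum table shown above (percent columns only).
esum_head <- data.frame(
  year = c(2001, 2001, 2001, 2003),
  district_type = c("GV", "HD", "LG", "HD"),
  rep.turnout.percent = c(0.5909, 0.6436, 0.6058, 0.9809),
  dem.turnout.percent = c(0.3993, 0.3563, 0.3727, 0.0106),
  stringsAsFactors = FALSE
)

# Stack one block of rows per party; the key columns repeat in each block.
keys <- esum_head[, c("year", "district_type")]
elx_head <- rbind(
  data.frame(keys, party = "REP", turnout = esum_head$rep.turnout.percent,
             stringsAsFactors = FALSE),
  data.frame(keys, party = "DEM", turnout = esum_head$dem.turnout.percent,
             stringsAsFactors = FALSE)
)

# Sort so the two party rows for each election sit next to each other.
elx_head <- elx_head[order(elx_head$year, elx_head$district_type), ]
rownames(elx_head) <- NULL
print(elx_head)
```

<p>The result has one row per year, district type, and party, which is the observation format that ggplot2 expects for a <b>fill</b> aesthetic.</p>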

<p>We will now use the powerful <a href="http://had.co.nz/ggplot2/" target="_blank">ggplot2</a> package to view the Republican and Democratic support for each election, in each seat, for our subset:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">ggplot<span style="color: #66cc66;">&#40;</span>elx,aes<span style="color: #66cc66;">&#40;</span>year,turnout,fill<span style="color: #78aaac;">=</span>factor<span style="color: #66cc66;">&#40;</span>party<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  geom_bar<span style="color: #66cc66;">&#40;</span>stat<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;identity&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  facet_wrap<span style="color: #66cc66;">&#40;</span>~district_type,scales<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;free_x&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  scale_fill_brewer<span style="color: #66cc66;">&#40;</span>palette<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Set1&quot;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>Result:<br />
<img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/turnout-by-year-by-seat.png" alt="HD#013 major party percentages" border=1/></p>
<p>This graphic gives us a decent understanding of district-level electoral trends. For U.S. federal elections (figs.: PVP, USH, USS), we can see a distinct drop in Republican support moving towards 2008; the results for U.S. House (USH) and U.S. Senate (USS), in particular, show a strong increase in Democratic support. This growth is consistent with statewide trends that resulted in the election of two Democratic Senators representing Virginia for the first time since 1970. General Democratic gains notwithstanding, the House of Delegates (fig.: HD) results aren&#8217;t as promising for a Democratic challenger. The incumbent Del. Marshall saw more than 60% support in three of the last four elections and saw no challenger at all in 2003. While the district may be trending more Democratic over time, the voters of HOD#13 are obviously big fans of Del. Marshall. </p>
<p>Now that we understand the historical partisanship of this district, we need to understand historical turnout, which will allow us to project the number of votes required to win. We will use the <b>historical.turnout.summary</b> function from the <b>aggpol</b> package to produce a summary of turnout for this district.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">historical.turnout.summary<span style="color: #66cc66;">&#40;</span>hd013, district.type<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;HD&quot;</span>, district.number<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;013&quot;</span>, years<span style="color: #78aaac;">=</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2001</span>,<span style="color: #cc66cc;">2003</span>,<span style="color: #cc66cc;">2005</span>,<span style="color: #cc66cc;">2007</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<table>
<tr>
<th>.</th>
<th>year</th>
<th>total.turnout</th>
<th>total.registration</th>
</tr>
<tr>
<td>1</td>
<td>2001</td>
<td>5399</td>
<td>13275</td>
</tr>
<tr>
<td>2</td>
<td>2003</td>
<td>10031</td>
<td>45769</td>
</tr>
<tr>
<td>3</td>
<td>2005</td>
<td>23592</td>
<td>62497</td>
</tr>
<tr>
<td>4</td>
<td>2007</td>
<td>26110</td>
<td>78028</td>
</tr>
</table>
<p>Looking at this table, one can see some data collection problems in the 2001 HD elections. In recent years, precincts belonged to only one House of Delegates seat, but in 2001, and somewhat less so in 2003, some precincts are split, some have duplicate names, and there is no information on how to allocate results from different races to precincts. The turnout numbers are slightly affected by these problems, but the <b>aggpol</b> package attempts to correct this by substituting alternate years or even races if possible. </p>
<p>The takeaway from the previous table is that turnout for the last four House of Delegates elections has hovered around 30% of registered voters (between roughly 22% and 41%). This makes some political sense, because Virginia holds state elections in odd-numbered years with no federal elections to drive up turnout. This leaves a lot of registered voters to be activated, but we need to delve down to the precinct level to find them.</p>
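<p>The turnout rate behind that reading is simply total turnout divided by total registration. As a quick check, the following sketch recomputes it from the four rows of the table above; the data frame name <b>turnout</b> and the <b>turnout.percent</b> column are illustrative and not produced by <b>aggpol</b>.</p>

```r
# The four rows of the historical turnout table shown above.
turnout <- data.frame(
  year = c(2001, 2003, 2005, 2007),
  total.turnout = c(5399, 10031, 23592, 26110),
  total.registration = c(13275, 45769, 62497, 78028)
)

# Turnout rate: share of registered voters who cast a ballot.
turnout$turnout.percent <- turnout$total.turnout / turnout$total.registration

print(round(turnout$turnout.percent, 2))
print(mean(turnout$turnout.percent))
```

<p>The yearly rates come out at roughly 41%, 22%, 38%, and 33%, for an average of about a third. Keep in mind that the 2001 and 2003 rows are the ones affected by the data collection problems described earlier.</p>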
<p>We use the <b>district.analyze</b> function of <b>aggpol</b> to aggregate all electoral results into a summary for each precinct.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">hd013s <span style="color: #78aaac;">&lt;-</span> district.analyze<span style="color: #66cc66;">&#40;</span>hd013<span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p><b><i>hd013s</i></b> is a data frame with columns calculated for every precinct: several values for each major party and other values for the precinct as a whole. Those statistics are:</p>
<ul>
<li><b>Aggregate base partisan vote</b> &#8211; The lowest non-zero turnout for a major party, in all electoral years. </li>
<li><b>Average Party Performance</b> &#8211; The average percentage of the vote a party receives in the closest 3 elections in recent years.</li>
<li><b>Swing vote</b> &#8211; The part of the electorate not included in the aggregate base partisan vote.</li>
<li><b>Soft-partisan vote</b> &#8211; The average worst a party has performed, minus the actual worst.</li>
<li><b>Toss-up</b> &#8211; The portion of the electorate not included in the aggregate base or soft-partisan vote.</li>
<li><b>Partisan base</b> &#8211; The combined aggregate-base and soft-partisan vote for each major party.</li>
<li><b>Partisan swing</b> &#8211; The combined major party swing vote.</li>
<li><b>Projected turnout</b> &#8211; The portion of the electorate that is projected to turn out given previous turnout and current registration data.</li>
</ul>
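<p>To make a few of these definitions concrete, here is a rough sketch for a single precinct. Everything in it is an assumption for illustration: the election results are invented, the variable names are hypothetical, &#8220;closest&#8221; is read as &#8220;smallest major-party margin&#8221;, and the code is one reading of the definitions above, not the actual <b>aggpol</b> implementation. It covers only the aggregate base, the swing vote, and the average party performance.</p>

```r
# Hypothetical two-party vote shares for one precinct across five elections.
# The 2003 race is uncontested, so the Democratic share is zero.
results <- data.frame(
  election = c("2001 GV", "2003 HD", "2005 GV", "2006 USS", "2007 HD"),
  dem = c(0.40, 0.00, 0.47, 0.51, 0.38),
  rep = c(0.59, 0.98, 0.52, 0.48, 0.61),
  stringsAsFactors = FALSE
)

# Aggregate base partisan vote: lowest non-zero share a party has received.
base_dem <- min(results$dem[results$dem > 0])
base_rep <- min(results$rep[results$rep > 0])

# Swing vote: whatever is left once both bases are set aside.
swing <- 1 - (base_dem + base_rep)

# Average party performance: mean share in the three closest elections,
# with closeness measured here by the absolute major-party margin.
margin <- abs(results$dem - results$rep)
closest <- order(margin)[1:3]
app_dem <- mean(results$dem[closest])
app_rep <- mean(results$rep[closest])

print(c(base_dem = base_dem, base_rep = base_rep, swing = swing))
print(c(app_dem = app_dem, app_rep = app_rep))
```

<p>Excluding zero shares keeps an uncontested race from dragging a party&#8217;s base down to nothing, which is presumably why the definition asks for the lowest <i>non-zero</i> turnout.</p>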
<p>These variables can be visualized with the following graphic, adapted &#8211;along with definitions above&#8211; from Campaign Craft (Burton, Shea). </p>
<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/variables.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/variables.png" alt="variables" title="variables" width="702" height="377" class="aligncenter size-full wp-image-161" /></a></p>
<p>The actual columns in the data frame returned from <b>district.analyze</b> are:</p>
<ul>
<li><b>proj.turnout.percent</b> &#8211; The projected turnout percentage for a hypothetical next election. </li>
<li><b>proj.turnout.count</b> &#8211;  The projected number of voters who will turn out for a hypothetical next election. </li>
<li><b>current.reg</b> &#8211; Current number of registered voters in a precinct. </li>
<li><b>partisan.base</b> &#8211; The combined aggregate-base and soft-partisan vote for both major parties (Partisan base). </li>
<li><b>partisan.swing</b> &#8211; All non-base voters (1.0 &#8211; partisan.base).</li>
<li><b>tossup</b> &#8211; The portion of the electorate not in the base or soft support of either major party.</li>
<li><b>app.rep</b> &#8211; The average party performance of a Republican candidate in this precinct.</li>
<li><b>base.rep</b> &#8211; The aggregate base partisan vote for a Republican candidate in this precinct.</li>
<li><b>soft.rep</b> &#8211; The soft partisan vote for a Republican candidate in this precinct.</li>
<li><b>app.dem</b> &#8211; The average party performance of a Democratic candidate in this precinct.</li>
<li><b>base.dem</b> &#8211; The aggregate base partisan vote for a Democratic candidate in this precinct.</li>
<li><b>soft.dem</b> &#8211; The soft partisan vote for a Democratic candidate in this precinct.</li>
<li><b>partisan.rep</b> &#8211; Combination of aggregate base and soft vote percentages for the Republican.</li>
<li><b>partisan.dem</b> &#8211; Combination of aggregate base and soft vote percentages for the Democrat.</li>
</ul>
<p>The most useful statistic above is the Average Party Performance (APP), which is the average share of the vote each major party received in the 3 closest recent elections. The APP describes support levels for a best-case scenario in a close election. We&#8217;ve already calculated the APP of each major party (<b><i>app.dem</i></b>, <b><i>app.rep</i></b>), but when a race doesn&#8217;t have a third-party candidate, what we&#8217;ll usually visualize is the share of the combined partisan performance that each party receives. We&#8217;ll add these variables, one for each major party, to the summary data frame generated previously.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">hd013s$dem.share <span style="color: #78aaac;">&lt;-</span> hd013s$app.dem<span style="color: #78aaac;">/</span><span style="color: #66cc66;">&#40;</span>hd013s$app.dem<span style="color: #78aaac;">+</span>hd013s$app.rep<span style="color: #66cc66;">&#41;</span>
hd013s$rep.share <span style="color: #78aaac;">&lt;-</span> hd013s$app.rep<span style="color: #78aaac;">/</span><span style="color: #66cc66;">&#40;</span>hd013s$app.dem<span style="color: #78aaac;">+</span>hd013s$app.rep<span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>
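<p>As a worked example of this renormalization, the sketch below plugs in the district-level 2001 Governor row from the <b><i>esum</i></b> table (Republican 0.5909, Democratic 0.3993, other 0.0097). Those figures are single-election district totals, so they only stand in for the per-precinct <b>app.rep</b> and <b>app.dem</b> columns here.</p>

```r
# District-level 2001 GV row from the esum table, used only as a stand-in
# for the per-precinct app.rep / app.dem values.
app.rep <- 0.5909
app.dem <- 0.3993
# The remaining 0.0097 went to other candidates and is dropped.

# Renormalise so the two major-party shares sum to 1.
dem.share <- app.dem / (app.dem + app.rep)
rep.share <- app.rep / (app.dem + app.rep)

print(c(dem.share = dem.share, rep.share = rep.share))
```

<p>The Democratic share moves from 39.9% of all votes to about 40.3% of the two-party vote. The adjustment is small when third-party support is small, but it guarantees that the two shares are complements, so a single 50% cut-line separates the parties.</p>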

<p>Now that we have the APP and partisan vote share for each party, we can visualize the precinct-level terrain for the Democratic challenger Mr. Bell. This visualization should show us the Democratic support for each precinct and give us an idea of which precincts could be competitive. We&#8217;ll produce this visualization using a density plot + 1d histogram, adapted from the seatsVotes plot in the <a target="_blank" href="http://cran.r-project.org/web/packages/pscl/index.html">pscl</a> package. We&#8217;ll also draw a cut-line at the 50% vote mark to help find competitive precincts.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">qplot<span style="color: #66cc66;">&#40;</span>dem.share, data<span style="color: #78aaac;">=</span>hd013s, geom<span style="color: #78aaac;">=</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;density&quot;</span>,<span style="color: #ff0000;">&quot;rug&quot;</span><span style="color: #66cc66;">&#41;</span>,
    xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Dem Vote Share&quot;</span>,
    main<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Democratic vote share, by precinct&quot;</span><span style="color: #66cc66;">&#41;</span>  <span style="color: #78aaac;">+</span> 
  geom_vline<span style="color: #66cc66;">&#40;</span>xintercept<span style="color: #78aaac;">=</span>.50<span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct.png" alt="dem-vote-share-by-precinct" title="dem-vote-share-by-precinct" width="671" height="670" class="aligncenter size-full wp-image-164" /></a></p>
<p>We can see a lot of precincts are between 48% and 53% Democratic, which means those precincts could potentially go for either candidate. We need to classify these results into something more solid. Let&#8217;s say precincts with less than 48% Democratic share are Safe Republican, 48-52% are Tossup, and greater than 52% are Safe Democrat. This is a simple representation but can be refined later. We&#8217;ll add a seat classification to our data frame using  the <b>cut</b> function:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">hd013s$cl <span style="color: #78aaac;">&lt;-</span> cut<span style="color: #66cc66;">&#40;</span>hd013s$dem.share, breaks<span style="color: #78aaac;">=</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">0</span>,.48,.52,<span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>, labels<span style="color: #78aaac;">=</span>c<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;Safe Rep&quot;</span>, <span style="color: #ff0000;">&quot;Tossup&quot;</span>, <span style="color: #ff0000;">&quot;Safe Dem&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p>Now we need to visualize how many precincts fall into which classification, using a histogram this time instead of a density curve.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">ggplot<span style="color: #66cc66;">&#40;</span>hd013s, aes<span style="color: #66cc66;">&#40;</span>x<span style="color: #78aaac;">=</span>dem.share<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  geom_histogram<span style="color: #66cc66;">&#40;</span>aes<span style="color: #66cc66;">&#40;</span>fill<span style="color: #78aaac;">=</span>cl<span style="color: #66cc66;">&#41;</span>,binwidth<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">0.01</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  scale_fill_brewer<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;Precinct Rating&quot;</span>, palette<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;RdYlBu&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  scale_x_continuous<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;Democratic Vote Share&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  scale_y_continuous<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;Frequency&quot;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct-hist.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct-hist.png" alt="dem-vote-share-by-precinct-hist" title="dem-vote-share-by-precinct-hist" width="671" height="670" class="aligncenter size-full wp-image-165" /></a></p>
<p>From the histogram we see that not only does a Republican candidate enjoy more &#8220;Safe&#8221; precincts, but even the majority of the tossup precincts have less than 50% Democratic share. While the precinct breakdown looks bad, a Democratic win in this district is theoretically possible if these tossup precincts are held. A Democratic candidate will face a tough challenge, so the next step will be identifying Democratic and Democrat-leaning precincts to target. </p>
<p>To make this target precinct list we&#8217;ll need a method to prioritize the precincts so that we can reach the most persuadable voters while spending the fewest resources. A popular method for identifying high-value precincts is to sort them by lowest projected turnout and highest Democratic vote share. Lower turnout means there are registered voters waiting to be convinced to show up, and high Democratic vote share means more of those voters will be Democrats. </p>
<p>Since we measured both of these values (turnout %, Democratic vote share), it is very easy to order our data by turnout (ascending) and Democratic average party performance (descending) using R.</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">hd013s<span style="color: #66cc66;">&#91;</span>order<span style="color: #66cc66;">&#40;</span>hd013s$proj.turnout.percent,<span style="color: #78aaac;">-</span>hd013s$app.dem<span style="color: #66cc66;">&#41;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>:<span style="color: #cc66cc;">2</span>,<span style="color: #cc66cc;">20</span>:<span style="color: #cc66cc;">21</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span></pre></td></tr></table></div>

<table>
<tr>
<th></th>
<th>precinct_name</th>
<th>proj.turnout.percent</th>
<th>dem.share</th>
<th>rep.share</th>
</tr>
<tr>
<td>25</td>
<td>153 &#8211; 409 &#8211; SUDLEY NORTH</td>
<td>0.1959</td>
<td>0.5105</td>
<td>0.4894</td>
</tr>
<tr>
<td>27</td>
<td>153 &#8211; 411 &#8211; MULLEN</td>
<td>0.2218</td>
<td>0.5026</td>
<td>0.4973</td>
</tr>
<tr>
<td>4</td>
<td>107 &#8211; 111 &#8211; BRIAR WOODS</td>
<td>0.2256</td>
<td>0.4837</td>
<td>0.5162</td>
</tr>
<tr>
<td>6</td>
<td>107 &#8211; 212 &#8211; CLAUDE MOORE PARK</td>
<td>0.2279</td>
<td>0.5285</td>
<td>0.4714</td>
</tr>
<tr>
<td>26</td>
<td>153 &#8211; 410 &#8211; MOUNTAIN VIEW</td>
<td>0.2319</td>
<td>0.4945</td>
<td>0.5054</td>
</tr>
<tr>
<td>16</td>
<td>153 &#8211; 110 &#8211; BUCKLAND MILLS</td>
<td>0.2448</td>
<td>0.4891</td>
<td>0.5108</td>
</tr>
<tr>
<td>13</td>
<td>153 &#8211; 106 &#8211; ELLIS</td>
<td>0.2475</td>
<td>0.5038</td>
<td>0.4961</td>
</tr>
<tr>
<td>5</td>
<td>107 &#8211; 112 &#8211; FREEDOM</td>
<td>0.2509</td>
<td>0.5028</td>
<td>0.4971</td>
</tr>
<tr>
<td>15</td>
<td>153 &#8211; 108 &#8211; VICTORY</td>
<td>0.2645</td>
<td>0.5005</td>
<td>0.4994</td>
</tr>
<tr>
<td>24</td>
<td>153 &#8211; 408 &#8211; GLENKIRK</td>
<td>0.2837</td>
<td>0.4998</td>
<td>0.5001</td>
</tr>
<tr>
<td>1</td>
<td>107 &#8211; 106 &#8211; EAGLE RIDGE</td>
<td>0.2856</td>
<td>0.4992</td>
<td>0.5007</td>
</tr>
<tr>
<td>18</td>
<td>153 &#8211; 112 &#8211; CEDAR POINT</td>
<td>0.2876</td>
<td>0.4855</td>
<td>0.5144</td>
</tr>
<tr>
<td>14</td>
<td>153 &#8211; 107 &#8211; MARSTELLER</td>
<td>0.3067</td>
<td>0.4775</td>
<td>0.5224</td>
</tr>
<tr>
<td>3</td>
<td>107 &#8211; 109 &#8211; HUTCHISON</td>
<td>0.3168</td>
<td>0.4857</td>
<td>0.5142</td>
</tr>
<tr>
<td>2</td>
<td>107 &#8211; 108 &#8211; MERCER</td>
<td>0.3281</td>
<td>0.5034</td>
<td>0.4965</td>
</tr>
<tr>
<td>17</td>
<td>153 &#8211; 111 &#8211; BRISTOW RUN</td>
<td>0.3324</td>
<td>0.4822</td>
<td>0.5177</td>
</tr>
<tr>
<td>23</td>
<td>153 &#8211; 406 &#8211; ALVEY</td>
<td>0.3460</td>
<td>0.4736</td>
<td>0.5263</td>
</tr>
<tr>
<td>21</td>
<td>153 &#8211; 402 &#8211; BATTLEFIELD</td>
<td>0.3546</td>
<td>0.4323</td>
<td>0.5676</td>
</tr>
<tr>
<td>10</td>
<td>153 &#8211; 102 &#8211; BENNETT</td>
<td>0.3896</td>
<td>0.4959</td>
<td>0.5040</td>
</tr>
<tr>
<td>19</td>
<td>153 &#8211; 209 &#8211; WOODBINE</td>
<td>0.4014</td>
<td>0.4651</td>
<td>0.5348</td>
</tr>
<tr>
<td>7</td>
<td>107 &#8211; 307 &#8211; MIDDLEBURG</td>
<td>0.4043</td>
<td>0.4953</td>
<td>0.5046</td>
</tr>
<tr>
<td>9</td>
<td>153 &#8211; 101 &#8211; BRENTSVILLE</td>
<td>0.4180</td>
<td>0.4904</td>
<td>0.5095</td>
</tr>
<tr>
<td>22</td>
<td>153 &#8211; 403 &#8211; BULL RUN</td>
<td>0.4226</td>
<td>0.4860</td>
<td>0.5139</td>
</tr>
<tr>
<td>20</td>
<td>153 &#8211; 401 &#8211; EVERGREEN</td>
<td>0.4283</td>
<td>0.5006</td>
<td>0.4993</td>
</tr>
<tr>
<td>12</td>
<td>153 &#8211; 104 &#8211; NOKESVILLE</td>
<td>0.4537</td>
<td>0.4960</td>
<td>0.5039</td>
</tr>
<tr>
<td>11</td>
<td>153 &#8211; 103 &#8211; BUCKHALL</td>
<td>0.4636</td>
<td>0.4773</td>
<td>0.5226</td>
</tr>
<tr>
<td>8</td>
<td>107 &#8211; 309 &#8211; ALDIE</td>
<td>0.4687</td>
<td>0.4881</td>
<td>0.5118</td>
</tr>
</table>
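<p>It is worth being clear about what a two-key <b>order</b> call does. The sketch below uses four invented precincts (the data frame <b>toy</b> and its values are hypothetical) in which two precincts tie on projected turnout, so that the effect of the second key is visible.</p>

```r
# Hypothetical precincts; A and B tie on projected turnout.
toy <- data.frame(
  precinct = c("A", "B", "C", "D"),
  proj.turnout.percent = c(0.25, 0.25, 0.20, 0.40),
  app.dem = c(0.47, 0.52, 0.49, 0.55),
  stringsAsFactors = FALSE
)

# Ascending turnout first; negating app.dem makes the tie-break descending.
ranked <- toy[order(toy$proj.turnout.percent, -toy$app.dem), ]
print(ranked$precinct)  # "C" "B" "A" "D"
```

<p>Because <b>order</b> sorts lexicographically, the second key only matters when the first key ties. Every projected turnout value in the table above is distinct, so that listing is effectively sorted by turnout alone; Democratic performance would need to be folded into a combined score if both measures are meant to influence the ranking.</p>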
<p>This sorted list is our critical intelligence for finding persuadable voters, but we need a better way to visualize the output. Since we have two continuous variables (turnout %, Democratic vote share) we can use a scatter plot with the Democratic vote share on the Y axis and turnout % on the X axis. We&#8217;ll also color each precinct with the seat classification we defined earlier (Safe Republican, Tossup, Safe Democrat):</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
</pre></td><td class="code"><pre class="r" style="font-family:monospace;">ggplot<span style="color: #66cc66;">&#40;</span>aes<span style="color: #66cc66;">&#40;</span>x<span style="color: #78aaac;">=</span>proj.turnout.percent, y<span style="color: #78aaac;">=</span>dem.share<span style="color: #66cc66;">&#41;</span>, data<span style="color: #78aaac;">=</span>hd013s<span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  geom_point<span style="color: #66cc66;">&#40;</span>aes<span style="color: #66cc66;">&#40;</span>colour<span style="color: #78aaac;">=</span>cl<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  labs<span style="color: #66cc66;">&#40;</span>x<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Projected Turnout %&quot;</span>, y<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Democratic Vote Share %&quot;</span>,colour<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Seat Type&quot;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct-scatter-color.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct-scatter-color.png" alt="dem-vote-share-by-precinct-scatter-color" title="dem-vote-share-by-precinct-scatter-color" width="671" height="670" class="aligncenter size-full wp-image-166" /></a></p>
<p>This chart echoes what we&#8217;ve seen previously: the Democratic challenger faces an uphill battle, but there is room for a win. We see a single &#8220;Safe Democrat&#8221; precinct with very low turnout, and five &#8220;Safe Republican&#8221; precincts that span the full range of turnout. Given the high number of &#8220;Tossup&#8221; precincts, and the fact that they too run the gamut as far as turnout is concerned, we&#8217;ll need to incorporate additional information into our prioritization. If we also rank precincts by current voter registration, we can focus on precincts where we stand to gain the most ground. </p>
<p>Before we continue, we need to make sure there is enough difference in precinct-to-precinct registration to have an impact. Let&#8217;s look at some statistics for the current registration in this district.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="r" style="font-family:monospace;">mean<span style="color: #66cc66;">&#40;</span>hd013s$current.reg<span style="color: #66cc66;">&#41;</span>
sd<span style="color: #66cc66;">&#40;</span>hd013s$current.reg<span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<table>
<tr>
<th>statistic</th>
<th>current.reg</th>
</tr>
<tr>
<td>mean</td>
<td>2970.111</td>
</tr>
<tr>
<td>sd</td>
<td>1014.072</td>
</tr>
</table>
<p>There are on average 2,970 registered voters in each precinct, but the standard deviation is 1,014 voters. A standard deviation that high tells us we need to take registration into account if we want to focus on the precincts with 4,000 people rather than those with 1,000. A histogram of current registration will help us clarify this finding:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="r" style="font-family:monospace;">qplot<span style="color: #66cc66;">&#40;</span>current.reg, data<span style="color: #78aaac;">=</span>hd013s, geom<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;histogram&quot;</span>,binwidth<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">500</span>,xlab<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Current Registration&quot;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  scale_y_continuous<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;Frequency&quot;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/current-registration-hist.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/current-registration-hist.png" alt="Current registration histogram" title="Current registration histogram" width="671" height="670" class="size-full wp-image-169" /></a></p>
<p>The histogram confirms what the standard deviation suggested: we see some very small precincts and some large precincts, but the majority are somewhere in the 2000-4000 range. The difference looks to be large enough to include current registration in our ranking.</p>
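<p>One way to put a number on &#8220;large enough&#8221; is the coefficient of variation, the standard deviation divided by the mean. The sketch below is not part of the original analysis; it simply reuses the two summary statistics reported above, and the variable names are made up for illustration.</p>

```r
# Summary statistics reported above for hd013s$current.reg
reg.mean <- 2970.111
reg.sd   <- 1014.072

# Coefficient of variation: spread relative to the typical precinct size
reg.cv <- reg.sd / reg.mean
round(reg.cv, 2)
```

<p>A spread of roughly a third of the mean supports treating precinct size as a ranking factor rather than assuming all precincts are about equal.</p>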
<p>We need to look at the Democratic Vote Share vs Turnout % scatter plot again, but with the points scaled to the current precinct registration.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="r" style="font-family:monospace;">qplot<span style="color: #66cc66;">&#40;</span>proj.turnout.percent,dem.share,size<span style="color: #78aaac;">=</span>current.reg, data<span style="color: #78aaac;">=</span>hd013s,colour<span style="color: #78aaac;">=</span>cl<span style="color: #66cc66;">&#41;</span><span style="color: #78aaac;">+</span> 
  labs<span style="color: #66cc66;">&#40;</span>x<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Projected Turnout %&quot;</span>, y<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Democratic Vote Share %&quot;</span>,colour<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Seat Type&quot;</span>,size<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Current Registration&quot;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<div id="attachment_170" class="wp-caption aligncenter" style="width: 681px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct-scatter-color-size.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct-scatter-color-size.png" alt="Democratic vote share by Turnout %" title="Democratic vote share by Turnout %" width="671" height="670" class="size-full wp-image-170" /></a><p class="wp-caption-text">Democratic vote share by Turnout %</p></div>
<p>This plot is almost complete and ready to be analyzed. The last job is to label the points with their precinct names. Our current <b>precinct_name</b> variable is actually a unique identifier made up of a FIPS county code, a precinct code, and a name, and it is too long for a point label. We&#8217;ll shrink it down to just the name and then recreate the scatter plot with the label:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="r" style="font-family:monospace;"><span style="color: #b22222; font-style: italic;"># replace the fips code and precinct number w/ an empty string</span>
hd013s$precinct.label <span style="color: #78aaac;">&lt;-</span> sub<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;^[0-9]+ - [0-9]+ - &quot;</span>,'',as.character<span style="color: #66cc66;">&#40;</span>hd013s$precinct_name<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># plot the previous graph again but this time use precinct.label as the label</span>
ggplot<span style="color: #66cc66;">&#40;</span>hd013s, aes<span style="color: #66cc66;">&#40;</span>x<span style="color: #78aaac;">=</span>proj.turnout.percent, y<span style="color: #78aaac;">=</span>dem.share,label<span style="color: #78aaac;">=</span>precinct.label<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  geom_point<span style="color: #66cc66;">&#40;</span>aes<span style="color: #66cc66;">&#40;</span>colour<span style="color: #78aaac;">=</span>cl,size<span style="color: #78aaac;">=</span>current.reg<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  geom_text<span style="color: #66cc66;">&#40;</span>size<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">2.5</span>,vjust<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">1.5</span>,angle<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">25</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> 
  labs<span style="color: #66cc66;">&#40;</span>x<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Projected Turnout %&quot;</span>, y<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Democratic Vote Share %&quot;</span>,colour<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Seat Type&quot;</span>,size<span style="color: #78aaac;">=</span><span style="color: #ff0000;">&quot;Current Registration&quot;</span><span style="color: #66cc66;">&#41;</span></pre></td></tr></table></div>

<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct-scatter-color-size-label.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/dem-vote-share-by-precinct-scatter-color-size-label.png" alt="dem-vote-share-by-precinct-scatter-color-size-label" title="dem-vote-share-by-precinct-scatter-color-size-label" width="671" height="670" class="aligncenter size-full wp-image-171" /></a></p>
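<p>As a quick sanity check on the label clean-up, the same <b>sub</b> call can be run on a few identifiers. This sketch assumes the raw <b>precinct_name</b> values use plain hyphens as separators, as the pattern implies (the tables on this page render them as dashes); the <b>ids</b> vector is typed in by hand for illustration.</p>

```r
# Sample identifiers in the "<FIPS county> - <precinct code> - <name>" form
ids <- c("153 - 409 - SUDLEY NORTH",
         "107 - 111 - BRIAR WOODS",
         "153 - 106 - ELLIS")

# Drop the two leading numeric fields, keeping only the precinct name
labels <- sub("^[0-9]+ - [0-9]+ - ", "", ids)
labels
```

<p>Because the pattern is anchored at the start of the string, a name that itself contains digits or hyphens would be left untouched after the two numeric prefixes are removed.</p>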
<p>From the chart we can see that a Democrat in HD#013 will want to focus contact efforts on the precincts in the upper-left corner of the plot (low turnout, high Democratic vote share) and will want to target larger precincts before smaller ones. Integrating the current registration into our previous sort command leaves us with the following sort order:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="r" style="font-family:monospace;">hd013s<span style="color: #66cc66;">&#91;</span>order<span style="color: #66cc66;">&#40;</span>hd013s$proj.turnout.percent,<span style="color: #78aaac;">-</span>hd013s$app.dem,hd013s$current.reg<span style="color: #66cc66;">&#41;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>:<span style="color: #cc66cc;">2</span>,<span style="color: #cc66cc;">4</span>,<span style="color: #cc66cc;">20</span>:<span style="color: #cc66cc;">22</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span></pre></td></tr></table></div>

<table>
<tr>
<th></th>
<th>precinct_name</th>
<th>proj.turnout.percent</th>
<th>current.reg</th>
<th>dem.share</th>
<th>rep.share</th>
<th>cl</th>
</tr>
<tr>
<td>25</td>
<td>153 &#8211; 409 &#8211; SUDLEY NORTH</td>
<td>0.1959</td>
<td>2497</td>
<td>0.5105</td>
<td>0.4894</td>
<td>Tossup</td>
</tr>
<tr>
<td>27</td>
<td>153 &#8211; 411 &#8211; MULLEN</td>
<td>0.2218</td>
<td>3555</td>
<td>0.5026</td>
<td>0.4973</td>
<td>Tossup</td>
</tr>
<tr>
<td>4</td>
<td>107 &#8211; 111 &#8211; BRIAR WOODS</td>
<td>0.2256</td>
<td>2288</td>
<td>0.4837</td>
<td>0.5162</td>
<td>Tossup</td>
</tr>
<tr>
<td>6</td>
<td>107 &#8211; 212 &#8211; CLAUDE MOORE PARK</td>
<td>0.2279</td>
<td>3115</td>
<td>0.5285</td>
<td>0.4714</td>
<td>Safe Dem</td>
</tr>
<tr>
<td>26</td>
<td>153 &#8211; 410 &#8211; MOUNTAIN VIEW</td>
<td>0.2319</td>
<td>3749</td>
<td>0.4945</td>
<td>0.5054</td>
<td>Tossup</td>
</tr>
<tr>
<td>16</td>
<td>153 &#8211; 110 &#8211; BUCKLAND MILLS</td>
<td>0.2448</td>
<td>3646</td>
<td>0.4891</td>
<td>0.5108</td>
<td>Tossup</td>
</tr>
<tr>
<td>13</td>
<td>153 &#8211; 106 &#8211; ELLIS</td>
<td>0.2475</td>
<td>1303</td>
<td>0.5038</td>
<td>0.4961</td>
<td>Tossup</td>
</tr>
<tr>
<td>5</td>
<td>107 &#8211; 112 &#8211; FREEDOM</td>
<td>0.2509</td>
<td>3929</td>
<td>0.5028</td>
<td>0.4971</td>
<td>Tossup</td>
</tr>
<tr>
<td>15</td>
<td>153 &#8211; 108 &#8211; VICTORY</td>
<td>0.2645</td>
<td>4874</td>
<td>0.5005</td>
<td>0.4994</td>
<td>Tossup</td>
</tr>
<tr>
<td>24</td>
<td>153 &#8211; 408 &#8211; GLENKIRK</td>
<td>0.2837</td>
<td>2175</td>
<td>0.4998</td>
<td>0.5001</td>
<td>Tossup</td>
</tr>
<tr>
<td>1</td>
<td>107 &#8211; 106 &#8211; EAGLE RIDGE</td>
<td>0.2856</td>
<td>2531</td>
<td>0.4992</td>
<td>0.5007</td>
<td>Tossup</td>
</tr>
<tr>
<td>18</td>
<td>153 &#8211; 112 &#8211; CEDAR POINT</td>
<td>0.2876</td>
<td>3497</td>
<td>0.4855</td>
<td>0.5144</td>
<td>Tossup</td>
</tr>
<tr>
<td>14</td>
<td>153 &#8211; 107 &#8211; MARSTELLER</td>
<td>0.3067</td>
<td>3669</td>
<td>0.4775</td>
<td>0.5224</td>
<td>Safe Rep</td>
</tr>
<tr>
<td>3</td>
<td>107 &#8211; 109 &#8211; HUTCHISON</td>
<td>0.3168</td>
<td>3722</td>
<td>0.4857</td>
<td>0.5142</td>
<td>Tossup</td>
</tr>
<tr>
<td>2</td>
<td>107 &#8211; 108 &#8211; MERCER</td>
<td>0.3281</td>
<td>3229</td>
<td>0.5034</td>
<td>0.4965</td>
<td>Tossup</td>
</tr>
<tr>
<td>17</td>
<td>153 &#8211; 111 &#8211; BRISTOW RUN</td>
<td>0.3324</td>
<td>3031</td>
<td>0.4822</td>
<td>0.5177</td>
<td>Tossup</td>
</tr>
<tr>
<td>23</td>
<td>153 &#8211; 406 &#8211; ALVEY</td>
<td>0.3460</td>
<td>4403</td>
<td>0.4736</td>
<td>0.5263</td>
<td>Safe Rep</td>
</tr>
<tr>
<td>21</td>
<td>153 &#8211; 402 &#8211; BATTLEFIELD</td>
<td>0.3546</td>
<td>3851</td>
<td>0.4323</td>
<td>0.5676</td>
<td>Safe Rep</td>
</tr>
<tr>
<td>10</td>
<td>153 &#8211; 102 &#8211; BENNETT</td>
<td>0.3896</td>
<td>4440</td>
<td>0.4959</td>
<td>0.5040</td>
<td>Tossup</td>
</tr>
<tr>
<td>19</td>
<td>153 &#8211; 209 &#8211; WOODBINE</td>
<td>0.4014</td>
<td>2406</td>
<td>0.4651</td>
<td>0.5348</td>
<td>Safe Rep</td>
</tr>
<tr>
<td>7</td>
<td>107 &#8211; 307 &#8211; MIDDLEBURG</td>
<td>0.4043</td>
<td>1239</td>
<td>0.4953</td>
<td>0.5046</td>
<td>Tossup</td>
</tr>
<tr>
<td>9</td>
<td>153 &#8211; 101 &#8211; BRENTSVILLE</td>
<td>0.4180</td>
<td>1708</td>
<td>0.4904</td>
<td>0.5095</td>
<td>Tossup</td>
</tr>
<tr>
<td>22</td>
<td>153 &#8211; 403 &#8211; BULL RUN</td>
<td>0.4226</td>
<td>3111</td>
<td>0.4860</td>
<td>0.5139</td>
<td>Tossup</td>
</tr>
<tr>
<td>20</td>
<td>153 &#8211; 401 &#8211; EVERGREEN</td>
<td>0.4283</td>
<td>2535</td>
<td>0.5006</td>
<td>0.4993</td>
<td>Tossup</td>
</tr>
<tr>
<td>12</td>
<td>153 &#8211; 104 &#8211; NOKESVILLE</td>
<td>0.4537</td>
<td>2501</td>
<td>0.4960</td>
<td>0.5039</td>
<td>Tossup</td>
</tr>
<tr>
<td>11</td>
<td>153 &#8211; 103 &#8211; BUCKHALL</td>
<td>0.4636</td>
<td>2287</td>
<td>0.4773</td>
<td>0.5226</td>
<td>Safe Rep</td>
</tr>
<tr>
<td>8</td>
<td>107 &#8211; 309 &#8211; ALDIE</td>
<td>0.4687</td>
<td>902</td>
<td>0.4881</td>
<td>0.5118</td>
<td>Tossup</td>
</tr>
</table>
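<p>The multi-key sort is easier to see on a toy example. The data frame below is entirely hypothetical and is built so that several precincts tie on turnout. Note one deliberate variation: the command above passes <b>current.reg</b> unnegated, which breaks ties in favor of smaller precincts, while this sketch negates it so that larger precincts come first. With no ties on turnout and APP in HD#013, the choice would not change the table above.</p>

```r
# Hypothetical precincts with ties on turnout and Democratic performance
toy <- data.frame(
  precinct    = c("A", "B", "C", "D"),
  turnout     = c(0.30, 0.20, 0.20, 0.20),
  app.dem     = c(0.52, 0.49, 0.51, 0.51),
  current.reg = c(3000, 2500, 1200, 4100)
)

# Ascending turnout, then descending Democratic performance,
# then descending registration so larger precincts win ties
ranked <- toy[order(toy$turnout, -toy$app.dem, -toy$current.reg), ]
ranked$precinct
```

<p>Each later key only matters when every earlier key is tied, so the order of the arguments to <b>order</b> is itself a statement of priorities.</p>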
<p>Now that we have our ranking, we can figure out how much each precinct might offer. Let&#8217;s first see the number of votes required to win the seat and the number of votes we&#8217;re projected to receive given the calculated APP, previous turnout, and current registration. The <b>district.summary</b> function will provide us with all this information:</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="r" style="font-family:monospace;">district.summary<span style="color: #66cc66;">&#40;</span>hd013s<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#91;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>,<span style="color: #cc66cc;">2</span>,<span style="color: #cc66cc;">9</span>,<span style="color: #cc66cc;">10</span>,<span style="color: #cc66cc;">11</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span></pre></td></tr></table></div>

<table>
<tr>
<th></th>
<th>current.reg</th>
<th>proj.turnout.count</th>
<th>votes.to.win</th>
<th>proj.turnout.rep</th>
<th>proj.turnout.dem</th>
</tr>
<tr>
<td>1</td>
<td>80193</td>
<td>25401</td>
<td>12701.5</td>
<td>12499</td>
<td>12074</td>
</tr>
</table>
<p>We can see that the projected turnout (<b><i>proj.turnout.count</i></b>) is 25,401, so the number of votes needed to win this district is only 12,702. Using the Democratic APP, we can project Democratic turnout at 12,074, so we need to find 628 votes to win. How do we find these votes? </p>
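<p>The arithmetic behind the 628-vote gap can be reproduced in a few lines. This sketch assumes <b>votes.to.win</b> is computed as half the projected turnout plus one, which matches the 12701.5 in the summary table; the internals of <b>district.summary</b> are not shown here, so treat that formula as an inference rather than the function&#8217;s actual code.</p>

```r
# Figures taken from the district summary table
proj.turnout.count <- 25401
proj.turnout.dem   <- 12074

# Assumed definition: 50% of projected turnout plus one vote
votes.to.win <- proj.turnout.count / 2 + 1

# Round up to whole votes, then subtract the projected Democratic vote
gap <- ceiling(votes.to.win) - proj.turnout.dem
gap
```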
<p>Let&#8217;s go back to our sorted precinct list, take the top third, and call them our <b><i>target.precincts</i></b>.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="r" style="font-family:monospace;">sorted.precincts <span style="color: #78aaac;">&lt;-</span> hd013s<span style="color: #66cc66;">&#91;</span>order<span style="color: #66cc66;">&#40;</span>hd013s$proj.turnout.percent,<span style="color: #78aaac;">-</span>hd013s$app.dem,hd013s$current.reg<span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span>
target.precincts <span style="color: #78aaac;">&lt;-</span> sorted.precincts<span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1</span>:<span style="color: #66cc66;">&#40;</span>nrow<span style="color: #66cc66;">&#40;</span>sorted.precincts<span style="color: #66cc66;">&#41;</span><span style="color: #78aaac;">/</span><span style="color: #cc66cc;">3</span><span style="color: #66cc66;">&#41;</span>,<span style="color: #66cc66;">&#93;</span></pre></td></tr></table></div>

<p>We&#8217;ve got our target list, and we know we need 628 votes from them to bring our total to 50% + 1. Adding a small buffer to that number, we&#8217;ll take 640 target votes and allocate them across our target precincts, proportional to the number of registered voters in the precinct. Hopefully, this will set more realistic goals for larger and smaller precincts.</p>

<div class="wp_syntax"><table><tr><td class="code"><pre class="r" style="font-family:monospace;">target.precincts$inc <span style="color: #78aaac;">&lt;-</span> as.integer<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">640</span> <span style="color: #78aaac;">*</span> target.precincts$current.reg<span style="color: #78aaac;">/</span>sum<span style="color: #66cc66;">&#40;</span>target.precincts$current.reg<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
&nbsp;
target.precincts<span style="color: #66cc66;">&#91;</span>,c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span>,<span style="color: #cc66cc;">3</span>,<span style="color: #cc66cc;">17</span>,<span style="color: #cc66cc;">23</span>,<span style="color: #cc66cc;">18</span>,<span style="color: #cc66cc;">20</span>:<span style="color: #cc66cc;">22</span>,<span style="color: #cc66cc;">24</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#93;</span></pre></td></tr></table></div>

<table>
<tr>
<th>precinct.label</th>
<th>proj.turnout.percent</th>
<th>proj.turnout.count</th>
<th>proj.turnout.dem</th>
<th>proj.turnout.rep</th>
<th>dem.share</th>
<th>rep.share</th>
<th>cl</th>
<th>inc</th>
</tr>
<tr>
<td>SUDLEY NORTH</td>
<td>0.1959</td>
<td>489</td>
<td>248</td>
<td>238</td>
<td>0.5105</td>
<td>0.4894</td>
<td>Tossup</td>
<td>55</td>
</tr>
<tr>
<td>MULLEN</td>
<td>0.2218</td>
<td>788</td>
<td>391</td>
<td>387</td>
<td>0.5026</td>
<td>0.4973</td>
<td>Tossup</td>
<td>78</td>
</tr>
<tr>
<td>BRIAR WOODS</td>
<td>0.2256</td>
<td>516</td>
<td>243</td>
<td>259</td>
<td>0.4837</td>
<td>0.5162</td>
<td>Tossup</td>
<td>50</td>
</tr>
<tr>
<td>CLAUDE MOORE PARK</td>
<td>0.2279</td>
<td>709</td>
<td>366</td>
<td>326</td>
<td>0.5285</td>
<td>0.4714</td>
<td>Safe Dem</td>
<td>68</td>
</tr>
<tr>
<td>MOUNTAIN VIEW</td>
<td>0.2319</td>
<td>869</td>
<td>427</td>
<td>437</td>
<td>0.4945</td>
<td>0.5054</td>
<td>Tossup</td>
<td>82</td>
</tr>
<tr>
<td>BUCKLAND MILLS</td>
<td>0.2448</td>
<td>892</td>
<td>431</td>
<td>450</td>
<td>0.4891</td>
<td>0.5108</td>
<td>Tossup</td>
<td>80</td>
</tr>
<tr>
<td>ELLIS</td>
<td>0.2475</td>
<td>322</td>
<td>160</td>
<td>158</td>
<td>0.5038</td>
<td>0.4961</td>
<td>Tossup</td>
<td>28</td>
</tr>
<tr>
<td>FREEDOM</td>
<td>0.2509</td>
<td>986</td>
<td>492</td>
<td>487</td>
<td>0.5028</td>
<td>0.4971</td>
<td>Tossup</td>
<td>86</td>
</tr>
<tr>
<td>VICTORY</td>
<td>0.2645</td>
<td>1289</td>
<td>638</td>
<td>637</td>
<td>0.5005</td>
<td>0.4994</td>
<td>Tossup</td>
<td>107</td>
</tr>
</table>
<p>The final column in the result is the target increase for that precinct (column: &#8216;inc&#8217;). With this information in hand the campaign field operations can devise a contact strategy to bring these voters to the polls on election day. </p>
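<p>The proportional allocation can be checked by hand using the registration figures for the nine target precincts from the ranking table. The vector below is copied from that table in target order; the variable names are only for illustration. One detail worth knowing is that <b>as.integer</b> truncates rather than rounds, so the per-precinct goals add up to slightly less than the 640 we set out to allocate.</p>

```r
# current.reg for the nine target precincts, in ranked order
current.reg  <- c(2497, 3555, 2288, 3115, 3749, 3646, 1303, 3929, 4874)
target.votes <- 640

# Share of the target proportional to registration, truncated to whole votes
inc <- as.integer(target.votes * current.reg / sum(current.reg))
inc

# Total actually allocated after truncation
sum(inc)
```

<p>The truncated goals total 634 votes, which still clears the 628 needed. A campaign that wants the full buffer could hand the leftover votes to the largest precincts.</p>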
<h2>Conclusion</h2>
<p>Playing the role of campaign consultant, we have analyzed previous electoral outcomes in the 13th seat of the House of Delegates in Virginia. We have shown how a Democratic candidate can leverage increasing Democratic support and low turnout to make this race competitive. We have also created a precinct targeting methodology that provides a high-level blueprint for resource planning. The analysis we performed is very standard, but using R makes our methodology unique. A down-ballot or primary-challenger campaign taking advantage of this methodology will spend less money and can experiment more with its targeting, potentially leading to a win.</p>
<p>Are you a Democrat running for the Virginia House of Delegates who would like to see the same data for your race? Or, are you a Democratic congressional candidate preparing for the 2010 cycle? Contact me at <a href="mailto:jjh@offensivepolitics.net">jjh@offensivepolitics.net</a> for robust targeting data or other analysis.</p>
<p>Follow Offensive Politics <a target="_blank" href="http://twitter.com/offpol/">on twitter</a></p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=113</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>&#8220;I&#8217;m a Republican because&#8230;&#8221;, visualized with R</title>
		<link>http://offensivepolitics.net/blog/?p=212</link>
		<comments>http://offensivepolitics.net/blog/?p=212#comments</comments>
		<pubDate>Thu, 15 Oct 2009 17:21:39 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[GOP.com]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=212</guid>
		<description><![CDATA[Visualizing user-generated statements from GOP.com to the theme of "I'm a Republican because...", using R.]]></description>
			<content:encoded><![CDATA[<p>The GOP recently relaunched its <a href="http://www.gop.com/" target="_blank">main web site</a> with a new design and numerous interactive and social features like Facebook integration, blogs, etc. Of particular interest is the <a href="http://www.gop.com/index.php/learn/republican_faces/" target="_blank">GOP Faces</a> section, which asks users to submit a photo and answer the question &#8220;Why are you a Republican?&#8221; Not being a Republican, I was curious to see if there were any common themes among the submissions that would lead to insights about being a Republican and GOP.com user. Not excited about actually reading all 180 reasons, I instead used <a href="http://www.r-project.org/" target="_blank">R</a> to download, transform, analyze and visualize the data for me. </p>
<p>I used several packages (<a href="http://cran.r-project.org/web/packages/XML/index.html" target="_blank">XML</a> and <a href="http://cran.r-project.org/web/packages/plyr/index.html" target="_blank">plyr</a>) to fetch and extract reasons, and then <a href="http://cran.r-project.org/web/packages/tm/index.html" target="_blank">tm</a> to filter stop words and identify commonly used terms. Finally, I used <a href="http://cran.r-project.org/web/packages/ggplot2/index.html" target="_blank">ggplot2</a>, the invaluable <a href="http://www.amazon.com/ggplot2-Elegant-Graphics-Data-Analysis/dp/0387981403/" target="_blank">ggplot2 book</a>, and <a href="http://www.mail-archive.com/r-help@r-project.org/msg58173.html" target="_blank">a helpful post</a> from the <a href="https://stat.ethz.ch/mailman/listinfo/r-help" target="_blank">R-help</a> mailing list to perform the visualization. </p>
<p>R code</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">library<span style="color: #66cc66;">&#40;</span>XML<span style="color: #66cc66;">&#41;</span>
library<span style="color: #66cc66;">&#40;</span>plyr<span style="color: #66cc66;">&#41;</span>
library<span style="color: #66cc66;">&#40;</span>ggplot2<span style="color: #66cc66;">&#41;</span>
library<span style="color: #66cc66;">&#40;</span>tm<span style="color: #66cc66;">&#41;</span>
&nbsp;
<span style="color: #b22222; font-style: italic;"># fetch &amp; parse the HTML</span>
doc <span style="color: #78aaac;">&lt;-</span> htmlParse<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;http://gop.com/index.php/learn/republican_faces/&quot;</span>,isURL <span style="color: #78aaac;">=</span> <span style="color:#228b22;">TRUE</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># pull the matching A elements of CSS class tipz</span>
nodes <span style="color: #78aaac;">&lt;-</span> getNodeSet<span style="color: #66cc66;">&#40;</span>doc, <span style="color: #ff0000;">&quot;//a[@class='tipz']&quot;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># extract the 'title' attribute </span>
titles <span style="color: #78aaac;">&lt;-</span> sapply<span style="color: #66cc66;">&#40;</span>nodes, <span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> xmlAttrs<span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#91;</span><span style="color: #66cc66;">&#91;</span><span style="color: #ff0000;">&quot;title&quot;</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># clean up the title attribute </span>
titles <span style="color: #78aaac;">&lt;-</span> sub<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;^[^:]+::&quot;</span>,<span style="color: #ff0000;">&quot;&quot;</span>,titles<span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># create the corpus and doc term matrix</span>
co <span style="color: #78aaac;">&lt;-</span> Corpus<span style="color: #66cc66;">&#40;</span>VectorSource<span style="color: #66cc66;">&#40;</span>titles<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
tdm <span style="color: #78aaac;">&lt;-</span> DocumentTermMatrix<span style="color: #66cc66;">&#40;</span>co, control<span style="color: #78aaac;">=</span>list<span style="color: #66cc66;">&#40;</span>tolower<span style="color: #78aaac;">=</span><span style="color:#228b22;">TRUE</span>, removeNumbers<span style="color: #78aaac;">=</span><span style="color:#228b22;">TRUE</span>, stopwords<span style="color: #78aaac;">=</span><span style="color:#228b22;">TRUE</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #b22222; font-style: italic;"># extract the tags at each level</span>
levels <span style="color: #78aaac;">&lt;-</span> c<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>,<span style="color: #cc66cc;">2</span>,<span style="color: #cc66cc;">3</span>,<span style="color: #cc66cc;">4</span><span style="color: #66cc66;">&#41;</span>
df <span style="color: #78aaac;">&lt;-</span> ldply<span style="color: #66cc66;">&#40;</span>levels, <span style="color: #a020f0;">function</span><span style="color: #66cc66;">&#40;</span>x<span style="color: #66cc66;">&#41;</span> data.frame<span style="color: #66cc66;">&#40;</span>freq<span style="color: #78aaac;">=</span>x,term<span style="color: #78aaac;">=</span>findFreqTerms<span style="color: #66cc66;">&#40;</span>tdm,x,x<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> 
<span style="color: #b22222; font-style: italic;">#assign random non-repeating coordinates to the terms</span>
df$x <span style="color: #78aaac;">&lt;-</span> sample<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">1</span>:nrow<span style="color: #66cc66;">&#40;</span>df<span style="color: #66cc66;">&#41;</span>,nrow<span style="color: #66cc66;">&#40;</span>df<span style="color: #66cc66;">&#41;</span>, replace<span style="color: #78aaac;">=</span>F<span style="color: #66cc66;">&#41;</span>
df$y <span style="color: #78aaac;">&lt;-</span> df$freq <span style="color: #78aaac;">+</span> rnorm<span style="color: #66cc66;">&#40;</span>nrow<span style="color: #66cc66;">&#40;</span>df<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>
&nbsp;
<span style="color: #b22222; font-style: italic;"># clear standard graph options (thanks mike lawrence on r-help)</span>
clear <span style="color: #78aaac;">&lt;-</span> theme<span style="color: #66cc66;">&#40;</span>
         legend.position <span style="color: #78aaac;">=</span> 'none'
         , panel.grid.minor <span style="color: #78aaac;">=</span> element_blank<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
         , panel.grid.major <span style="color: #78aaac;">=</span> element_blank<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
         , panel.background <span style="color: #78aaac;">=</span> element_blank<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
         , axis.line <span style="color: #78aaac;">=</span> element_blank<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
         , axis.text.x <span style="color: #78aaac;">=</span> element_blank<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
         , axis.text.y <span style="color: #78aaac;">=</span> element_blank<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
         , axis.ticks <span style="color: #78aaac;">=</span> element_blank<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
         , axis.title.x <span style="color: #78aaac;">=</span> element_blank<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
         , axis.title.y <span style="color: #78aaac;">=</span> element_blank<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>
 <span style="color: #66cc66;">&#41;</span>
&nbsp;
p <span style="color: #78aaac;">&lt;-</span> ggplot<span style="color: #66cc66;">&#40;</span>df,aes<span style="color: #66cc66;">&#40;</span>x<span style="color: #78aaac;">=</span>x,y<span style="color: #78aaac;">=</span>y,colour<span style="color: #78aaac;">=</span>freq,label<span style="color: #78aaac;">=</span>term,size<span style="color: #78aaac;">=</span>freq<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> geom_text<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span> <span style="color: #78aaac;">+</span> coord_polar<span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span><span style="color: #78aaac;">+</span> clear 
ggsave<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;because.png&quot;</span>,p,dpi<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">72</span>,scale<span style="color: #78aaac;">=</span><span style="color: #cc66cc;">1.3</span><span style="color: #66cc66;">&#41;</span>
ggsave<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;because.pdf&quot;</span>, p<span style="color: #66cc66;">&#41;</span></pre></div></div>

<p>And the output: </p>
<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/because.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/10/because.png" alt="I'm a Republican because..." title="I'm a Republican because..." width="655" height="654" class="aligncenter size-full wp-image-213" /></a></p>
<p>Click <a href='http://offensivepolitics.net/blog/wp-content/uploads/2009/10/because.pdf'>for a page-sized PDF</a>, or the raw <a href='http://offensivepolitics.net/blog/wp-content/uploads/2009/10/terms.csv'>terms and frequency counts</a>.</p>
<p>The most common term is &#8216;freedom&#8217;, followed by &#8216;equal&#8217; and &#8216;pro&#8217;. After those come &#8216;personal&#8217;, &#8216;government&#8217;, &#8216;people&#8217;, &#8216;school&#8217;, &#8216;family&#8217;, and &#8216;believe&#8217;. A more robust analysis could use term extraction (pro family, pro life, anti government) or stemming, and then feed the results into a better visualization. That would take more than the 10 minutes I&#8217;ve spent so far, so I&#8217;m leaving that as an exercise for somebody else.</p>
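<p>For anyone who wants to see what the counting step boils down to without installing tm, here is a rough base R sketch. The statements and the three-word stop list are invented for illustration and are not taken from GOP.com; tm&#8217;s real stop word list is far longer, and this is not how tm works internally.</p>

```r
# Hypothetical statements standing in for the scraped 'title' attributes
statements <- c("I believe in freedom and personal responsibility",
                "Freedom and family",
                "I believe in limited government")

# A tiny illustrative stop word list
stop.words <- c("i", "in", "and")

# Lowercase, split on anything that is not a letter, drop empties and stop words
words <- unlist(strsplit(tolower(statements), "[^a-z]+"))
words <- words[nzchar(words) & !(words %in% stop.words)]

# Count each remaining term, most frequent first
term.counts <- sort(table(words), decreasing = TRUE)
term.counts
```

<p>Stemming or phrase extraction would slot in between the split and the count, which is where a term like &#8216;pro&#8217; could be rejoined with the word that follows it.</p>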
<p>As it is I have the most common answer as to why GOP.com visitors are Republicans: <b>freedom</b>. I think that&#8217;s probably why anybody belongs to any political party, but without a corpus from other parties I suppose we&#8217;ll never know. </p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=212</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>open-source campaign finance analysis with R and MySQL</title>
		<link>http://offensivepolitics.net/blog/?p=52</link>
		<comments>http://offensivepolitics.net/blog/?p=52#comments</comments>
		<pubDate>Thu, 18 Jun 2009 21:12:33 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Open-Source]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Campaign Finance]]></category>
		<category><![CDATA[FEC]]></category>
		<category><![CDATA[fechell]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=52</guid>
		<description><![CDATA[Introduction In Part 1 of this tutorial we introduced the fechell library by extracting all itemized contributions from individuals made to the Obama For America campaign in 2007 and 2008. In Part 2 of the tutorial we will summarize that data set by importing it into a MySQL database and aggregating contributions by week and [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>In <a href="http://offensivepolitics.net/blog/?p=26">Part 1</a> of this tutorial we introduced the <strong><a href="http://offensivepolitics.net/fechell">fechell</a></strong> library by extracting all itemized contributions from individuals made to the <em>Obama For America</em> campaign in 2007 and 2008. In Part 2 of the tutorial we will summarize that data set by importing it into a MySQL database and aggregating contributions by week and zip code. Next we&#8217;ll visualize the contribution data amounts on a map, week by week. Below is a sample image: </p>
<p><img src="http://offensivepolitics.net/blog/wp-content/uploads/2009/06/o4a_2008_30-300x225.png" alt="Obama for America itemized fundraising, 2008-30." title="Obama for America itemized fundraising, 2008-30." width="300" height="225" class="aligncenter size-medium wp-image-91" /></p>
<p>This visualization contains two separate measures: the top portion is a map of the continental US, marked with a dot at the geographic center of every zip code that had individuals who made contributions to <em>Obama for America</em>. The dot is colored according to how much money in total was raised from that zip code since the 2nd week of 2007, with colors ranging from dark blue to bright red to indicate the amount. The second measure is a vertical bar plot showing how much money was raised, per week, from all zip codes combined. The current week is highlighted and annotated with the amount raised for that week. Taken together, these measures can show us where the first monetary support for the <em>Obama for America</em> campaign came from and how it progressed geographically and in volume. We will use the free <a href="http://www.r-project.org/">R</a> statistics and visualization package to produce weekly images like the one shown above. Looking at still images week by week is informative but not very exciting. After all the images are created we&#8217;ll use <a href="http://www.mplayerhq.hu/design7/news.html">MEncoder</a> to string together the images into a movie to demonstrate the growth of individual financial support pledged to <em>Obama for America</em>.</p>
<h2>Installing the database</h2>
<p>To follow along with this exercise you will need to <a href="http://dev.mysql.com/downloads/mysql/5.1.html#downloads">download</a> MySQL 5.1 for your platform. English instructions for installation are available in the <a href="http://dev.mysql.com/doc/refman/5.1/en/installing.html">MySQL 5.1 Reference Manual</a>. After installation you will also need to create a database user and grant rights to create databases, create indexes and perform <b>LOAD DATA LOCAL</b> functions. All of these administration tasks are covered in the MySQL 5.1 Reference Manual <a href="http://dev.mysql.com/doc/refman/5.1/en/user-account-management.html">Section 5.5 (MySQL User Account Management)</a>. Keep the user name and password of the account you created handy as we&#8217;ll use it in the next few steps to access the database. </p>
<h2>Creating the database</h2>
<p>We will be using two different tables to represent the itemized individual contributions: the <b>transactions</b> table will hold all 2.8 million transaction values, and the <b>transactions_summary_weekly</b> table will hold the sum of all contributions for every combination of year, week, and 5 digit zip code that exists in the data. </p>
<p>To populate these tables, download the transaction table creation script <b>create-transactions.sql</b>, available in the <a href="#resources">Resources</a> sub-section below, and save it to a directory. Next download the output of the first part of the post (<b>obama-data-F3P-2007-2008.csv</b>) into that same directory; it is available as a ZIP archive in the <a href="#resources">Resources</a> sub-section below. If this CSV file isn&#8217;t in the same directory as the <b>create-transactions.sql</b> script you&#8217;ll need to change line #4 to reference the exact location. </p>
<p><b>create-transactions.sql</b> looks like this:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
5
</pre></td><td class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">DATABASE</span> fechell;
<span style="color: #993333; font-weight: bold;">USE</span> fechell;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> transactions <span style="color: #66cc66;">&#40;</span>contribution_date date<span style="color: #66cc66;">,</span>contribution_amount int<span style="color: #66cc66;">,</span> zipcode char<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">5</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span> engine <span style="color: #ff0000;">'MyISAM'</span>;
<span style="color: #993333; font-weight: bold;">LOAD</span> <span style="color: #993333; font-weight: bold;">DATA</span> <span style="color: #993333; font-weight: bold;">LOCAL</span> <span style="color: #993333; font-weight: bold;">INFILE</span> <span style="color: #ff0000;">'obama-data-F3P-2007-2008.csv'</span> <span style="color: #993333; font-weight: bold;">INTO</span> <span style="color: #993333; font-weight: bold;">TABLE</span> transactions <span style="color: #993333; font-weight: bold;">FIELDS</span> terminated <span style="color: #993333; font-weight: bold;">BY</span> <span style="color: #ff0000;">','</span> <span style="color: #993333; font-weight: bold;">LINES</span> terminated <span style="color: #993333; font-weight: bold;">BY</span> <span style="color: #ff0000;">'<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>'</span> <span style="color: #993333; font-weight: bold;">IGNORE</span> <span style="color: #cc66cc;">1</span> <span style="color: #993333; font-weight: bold;">LINES</span> <span style="color: #66cc66;">&#40;</span>contribution_date<span style="color: #66cc66;">,</span>contribution_amount<span style="color: #66cc66;">,</span>zipcode<span style="color: #66cc66;">&#41;</span>;
<span style="color: #993333; font-weight: bold;">ALTER</span> <span style="color: #993333; font-weight: bold;">TABLE</span> transactions <span style="color: #993333; font-weight: bold;">ADD</span> <span style="color: #993333; font-weight: bold;">INDEX</span>  ix_zipcode <span style="color: #66cc66;">&#40;</span>zipcode<span style="color: #66cc66;">&#41;</span>;</pre></td></tr></table></div>

<p>We are simply creating a new database (&#8216;fechell&#8217;), and creating a new MyISAM table called &#8216;transactions&#8217; on that database. We chose MyISAM since large data imports are dramatically faster with MyISAM than the default InnoDB, at least in the default setup we&#8217;re using.  You can execute this script by running the following command in the directory where you saved <b>create-transactions.sql</b> and <b>obama-data-F3P-2007-2008.csv</b>. Replace YOURDBUSERNAME with the user you created during the MySQL installation.</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">mysql <span style="color: #660033;">-uYOURDBUSERNAME</span> <span style="color: #000000; font-weight: bold;">&lt;</span> create-transactions.sql</pre></div></div>

<p>Since we are loading in 2.8 million records and then adding an index to the zip code field this command might take quite a while to execute depending on your processor speed. After it is finished you should have a large database of all individual itemized transactions in the <b>fechell.transactions</b> table. </p>
<p>Next we will create and populate the <b>transactions_summary_weekly</b> table, which will contain aggregated itemized individual fund raising totals by zip code by week and year.<br />
The script <b>create-transactions_summary_weekly.sql</b> available in the <a href="#resources">Resources</a> sub-section below looks like this:</p>

<div class="wp_syntax"><table><tr><td class="line_numbers"><pre>1
2
3
4
</pre></td><td class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">USE</span> fechell;
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span>  transactions_summary_weekly <span style="color: #66cc66;">&#40;</span>zipcode char<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">5</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>contribution_year int<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">10</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span>contribution_week int<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">10</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">,</span> total int<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">10</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">UNSIGNED</span> <span style="color: #993333; font-weight: bold;">NOT</span> <span style="color: #993333; font-weight: bold;">NULL</span><span style="color: #66cc66;">&#41;</span> engine <span style="color: #ff0000;">'MyISAM'</span>;
<span style="color: #993333; font-weight: bold;">INSERT</span> <span style="color: #993333; font-weight: bold;">INTO</span> transactions_summary_weekly<span style="color: #66cc66;">&#40;</span>zipcode<span style="color: #66cc66;">,</span>contribution_year<span style="color: #66cc66;">,</span>contribution_week<span style="color: #66cc66;">,</span>total<span style="color: #66cc66;">&#41;</span> 
<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> zipcode<span style="color: #66cc66;">,</span>year<span style="color: #66cc66;">&#40;</span>contribution_date<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>week<span style="color: #66cc66;">&#40;</span>contribution_date<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>sum<span style="color: #66cc66;">&#40;</span>contribution_amount<span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">FROM</span> transactions <span style="color: #993333; font-weight: bold;">GROUP</span> <span style="color: #993333; font-weight: bold;">BY</span> zipcode<span style="color: #66cc66;">,</span>year<span style="color: #66cc66;">&#40;</span>contribution_date<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span>week<span style="color: #66cc66;">&#40;</span>contribution_date<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;</pre></td></tr></table></div>

<p>You can run the script with the following command:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">mysql <span style="color: #660033;">-uYOURDBUSERNAME</span> <span style="color: #000000; font-weight: bold;">&lt;</span> create-transactions_summary_weekly.sql</pre></div></div>

<p>After the script is finished executing you should have two rather large tables populated with the information we&#8217;ll need to visualize the data. </p>
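<p>If you want to sanity-check the weekly summary on a handful of rows before trusting the full table, the same grouping can be sketched in Ruby. This is only an illustration under assumptions: the sample rows are made up, <b>summarize_weekly</b> is a hypothetical helper, and it assumes that Ruby&#8217;s <b>%U</b> week number (weeks starting on Sunday, 00 to 53) lines up with the default mode of MySQL&#8217;s <b>week()</b> function.</p>

```ruby
require 'date'

# Hypothetical sample rows in the same column order as the CSV import:
# contribution_date, contribution_amount, zipcode. All values are made up.
SAMPLE_ROWS = [
  ['2008-02-04', 250, '60614'],
  ['2008-02-05', 100, '60614'],
  ['2008-02-12', 500, '60614'],
  ['2008-02-05', 75,  '10027']
].freeze

# Sum contribution amounts per [zipcode, year, week], mirroring the
# GROUP BY zipcode, year(contribution_date), week(contribution_date) query.
def summarize_weekly(rows)
  totals = Hash.new(0)
  rows.each do |date_string, amount, zipcode|
    date = Date.parse(date_string)
    # %U counts weeks starting on Sunday (00-53); assumed to match
    # MySQL's default week() mode.
    week = date.strftime('%U').to_i
    totals[[zipcode, date.year, week]] += amount
  end
  totals
end

summarize_weekly(SAMPLE_ROWS).sort.each do |(zipcode, year, week), total|
  puts "#{zipcode} #{year}/#{week}: #{total}"
end
```

<p>Comparing a few of these totals against rows selected from <b>transactions_summary_weekly</b> is a cheap way to confirm the import and the aggregation agree.</p>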
<h2>Installing R</h2>
<p>Before we can create the visualizations we&#8217;ll need to make sure the <a href="http://www.r-project.org/">R</a> toolkit and several add-on packages are installed correctly. If it isn&#8217;t installed already, you can download an R installer from <a href="http://cran.r-project.org/mirrors.html">the R mirrors list</a>. The <a href="http://cran.r-project.org/faqs.html">R Frequently Asked Questions</a> page contains instructions for Linux, Unix, Mac, and Windows platforms. </p>
<h2>Installing R Packages</h2>
<p> After R is installed you will need to install several add-on packages that may or may not be included with your distribution: <b>sp</b>, <b>maps</b>, <b>maptools</b>, and <b>RMySQL</b>. All are available via CRAN, and platform-dependent instructions for installing add-on packages can be found on the <a href="http://wiki.r-project.org/rwiki/doku.php">R Wiki</a> under <a href="http://wiki.r-project.org/rwiki/doku.php?id=getting-started:installation:packages">How do I install a package?</a>. </p>
<h2>Creating the visualization part 1 &#8211; images</h2>
<p>Before you can create the visualization you will need to download the R script to draw the frames, and a support file containing latitude and longitude pairs for every zip code in the US. Both are available in the <a href="#resources">Resources</a> sub-section below as <b>draw.R</b> and <b>zips.zip</b>. Unzip the <b>zips.zip</b> file and move the newly created <b>zips.csv</b> file into the same directory as <b>draw.R</b>. You will need to make one change to the <b>draw.R</b> script before you can run it: line 37 creates the connection to the database we created and populated earlier. You&#8217;ll need to change the host, user name, and password arguments to match your database setup. </p>
<p>The draw.R file is pretty simple, so we won&#8217;t walk through the code line by line. Most of the heavy lifting is done by the fantastic <b>sp</b> and <b>RMySQL</b> packages, with a little bit of help from a lat/long database of zip codes. </p>
<p>Once you&#8217;ve edited <b>draw.R</b> to match your configuration you can create the images with the following command line, assuming the R bin directory has been correctly added to your PATH:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">R CMD BATCH draw.R</pre></div></div>

<p>Assuming the packages were installed correctly, the <b>zips.csv</b> file was in the same directory, and the database configuration was modified you should now see 96 PNG files in your directory. The files are named <b>o4a_<i>year</i>_<i>week</i>.png</b>. </p>
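<p>One thing worth checking before stitching the frames together: if the week number in those file names is not zero-padded (we don&#8217;t state that either way here, so treat it as an assumption to verify against your own output), any tool that reads the files in alphabetical order will place week 10 ahead of week 2. A minimal Ruby sketch of ordering the frames by their numeric year and week instead; the file names and the helpers <b>frame_key</b> and <b>chronological</b> are hypothetical:</p>

```ruby
# Hypothetical frame names following the o4a_year_week.png pattern.
FRAME_NAMES = %w[o4a_2008_10.png o4a_2007_9.png o4a_2008_2.png o4a_2007_10.png].freeze

# Pull the numeric year and week out of a frame name so that frames can be
# ordered chronologically instead of alphabetically.
def frame_key(name)
  match = name.match(/\Ao4a_(\d+)_(\d+)\.png\z/)
  raise ArgumentError, "unexpected frame name: #{name}" unless match

  [match[1].to_i, match[2].to_i]
end

# Sort frame names by [year, week] as integers.
def chronological(names)
  names.sort_by { |name| frame_key(name) }
end

puts chronological(FRAME_NAMES)
```

<p>If your frames turn out to be out of order, renaming them with a zero-padded week number in this sorted order is one way to make alphabetical and chronological order agree.</p>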
<h2>Creating the visualization part 2 &#8211; video</h2>
<p>To create an FLV movie from the weekly images you will need to install <a href="http://www.mplayerhq.hu/">MPlayer/MEncoder</a> if it isn&#8217;t installed on your system already. You can find download and install instructions on the <a href="http://www.mplayerhq.hu/design7/dload.html">MPlayer download page</a>. Once the installation is complete you can create an FLV movie with the following command line (some options taken from a page on <a href="http://www.jeremychapman.info/cms/mencoder-avi-to-flv-conversion">http://jeremychapman.info/</a>):</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">mencoder.exe mf:<span style="color: #000000; font-weight: bold;">//*</span>.png <span style="color: #660033;">-mf</span> <span style="color: #007800;"><span style="color: #c20cb9; font-weight: bold;">w</span></span>=<span style="color: #000000;">1024</span>:<span style="color: #007800;">h</span>=<span style="color: #000000;">768</span>:<span style="color: #007800;"><span style="color: #7a0874; font-weight: bold;">type</span></span>=png <span style="color: #660033;">-ovc</span> lavc <span style="color: #660033;">-lavcopts</span> <span style="color: #007800;">vcodec</span>=flv:<span style="color: #007800;">mbd</span>=<span style="color: #000000;">2</span>:mv0:v4mv:cbp:<span style="color: #007800;">last_pred</span>=<span style="color: #000000;">3</span>:trell:<span style="color: #007800;">keyint</span>=<span style="color: #000000;">50</span>:<span style="color: #007800;">vbitrate</span>=<span style="color: #000000;">300</span> <span style="color: #660033;">-o</span> output.flv <span style="color: #660033;">-fps</span> <span style="color: #000000;">2</span> <span style="color: #660033;">-ofps</span> <span style="color: #000000;">5</span> <span style="color: #660033;">-of</span> lavf</pre></div></div>

<p>After the command is complete you will have an FLV file called <b>output.flv</b>. The final file is embedded below, and available in the <a href="#resources">Resources</a> section: </p>
[See post to watch Flash video]
<h2>Problems with this visualization</h2>
<p>While this visualization serves our purpose of tracking the geographic spread and growing amount of individual contributions to the <em>Obama for America</em> campaign, it is not perfect. The entire process, from extraction to visualization, was created in about half a day, so there is obviously more work that could be done. Also, the specific process used in these tutorials was meant to demonstrate several strategies you can use to perform your own analysis, not necessarily to build the best visualization of individual contributions.</p>
<p>One problem with the current visualization is the representation of an entire zip code by a dot at its geographic center. This is problematic because zip codes represent regions that are vastly different in size, so a single dot per zip code can skew the conclusions a reader draws from the map. Based on our maps alone, we could conclude <em>Obama for America</em> enjoyed very little support in places like Montana and Colorado since they have very few dots within their borders. In truth the campaign received more than $11M of support from Colorado and just over $1.2M in itemized donations from Montana, but this is very difficult to compare to Philadelphia or New York City, where the map is a smear of dots. A better way to display information by zip code could involve drawing the <a href="http://www.census.gov/geo/www/cob/z52000.html">ZIP boundaries</a> and filling them with a color, instead of just coloring a single point. This would work fine for places like Montana, Idaho, and Colorado where a zip code might refer to a very large area, but would fail in more populous areas. This is especially true in New York City, where a single zip code could represent an area as small as a single square mile and would be almost impossible to view on a reasonably sized graphic. Merging zip code areas and averaging their fundraising totals could make this procedure more useful for national analysis or local analysis of heavily populated metro areas. </p>
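<p>As a rough idea of what such a merge could look like, here is a small Ruby sketch that groups zip codes by their first three digits and averages the totals in each group. Grouping by prefix is our own assumption for illustration (a real merge would more likely be based on geography or population), the helper <b>average_by_prefix</b> is hypothetical, and the dollar amounts are made up:</p>

```ruby
# Hypothetical totals per 5-digit zip code; the amounts are made up.
ZIP_TOTALS = {
  '10025' => 120_000,
  '10027' => 80_000,
  '59715' => 9_000,
  '59718' => 3_000
}.freeze

# Merge zip codes that share a leading prefix and average their totals.
def average_by_prefix(zip_totals, prefix_length = 3)
  zip_totals
    .group_by { |zipcode, _total| zipcode[0, prefix_length] }
    .map { |prefix, pairs| [prefix, pairs.sum { |_zip, total| total } / pairs.size.to_f] }
    .to_h
end

average_by_prefix(ZIP_TOTALS).each do |prefix, average|
  puts format('%s: %.2f', prefix, average)
end
```

<p>Each merged area would then be drawn once, using the averaged amount for its color, rather than once per zip code.</p>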
<p>Additionally, the output format of the visualization could be improved by making it interactive. Combining our weekly summary data with a tool like the <a href="http://www.google.com/ig/directory?url=www.google.com/ig/modules/motionchart.xml">Google Motion Chart Gadget</a> along with an <a href="http://www.google.com/ig/directory?url=www.google.com/ig/modules/time-series-line.xml">Animated Timeline</a> would provide a much greater user experience.</p>
<p>Finally, the national analysis of <em>Obama for America</em> is interesting by itself but would be much more useful if it included data from other candidates. Being able to overlay zip code summaries from primary challengers like <em>Hillary Clinton for President</em>, and general election opponent <em>McCain-Palin 2008 Inc</em> would allow a much greater depth of analysis to be performed. Including demographic information, primary dates, and election returns on the same visualization would be the best.</p>
<p>I intend to address these topics, as well as an analysis of congressional races, in a future tutorial. </p>
<h2>What the visualization tells us</h2>
<p>Despite the flaws listed above, after viewing the final output of our visualization we can draw several conclusions about the fundraising success of the <em>Obama for America</em> campaign. First, the campaign was truly national and had financial support from across the country by the end of the campaign. But the campaign didn&#8217;t start out national &#8211; during the critical first two months of the campaign <em>Obama for America</em> drew financial support from several metro areas including Chicago, New York City, and DC/Northern Virginia. After several months of heavy development of the grassroots network the campaign started seeing donations coming in from across the US. </p>
<p>Using the weekly summary data we can try to attribute large spikes in total receipts to campaign events. Looking closely we see the campaign saw a large increase of contributions in the last week of March, which potentially could be attributed to popular support of candidate Obama&#8217;s <a href="http://en.wikipedia.org/wiki/A_More_Perfect_Union_(speech)">More Perfect Union speech</a> or to the large new donor push at the beginning of that month. We can also look at weeks 2008/4 and 2008/5 (beginning of February 2008) and see week-to-week receipts jump by $5M to around $11M for two full weeks. This spike coincides with the <a href="http://en.wikipedia.org/wiki/Super_Tuesday#Democrats">Super Tuesday</a> Democratic primaries where candidate Obama nearly split the day with then-front runner Hillary Clinton. </p>
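<p>Spotting those jumps does not have to be done by eye. Below is a small Ruby sketch that flags every week whose total rose by at least a given amount over the week before it. The weekly totals are illustrative numbers, not values from the real data set, the helper <b>spikes</b> is hypothetical, and the sketch assumes there are no missing weeks in the series:</p>

```ruby
# Hypothetical weekly totals in dollars, keyed by [year, week].
# These are illustrative values, not figures from the real data set.
WEEKLY_TOTALS = {
  [2008, 2] => 5_500_000,
  [2008, 3] => 6_000_000,
  [2008, 4] => 11_000_000,
  [2008, 5] => 11_200_000,
  [2008, 6] => 7_000_000
}.freeze

# Return the [year, week] keys whose total rose by at least min_jump
# compared with the previous week in sorted order.
def spikes(weekly_totals, min_jump)
  pairs = weekly_totals.sort.each_cons(2)
  jumps = pairs.select do |(_prev_week, prev_total), (_week, total)|
    total - prev_total >= min_jump
  end
  jumps.map { |_previous, (week, _total)| week }
end

p spikes(WEEKLY_TOTALS, 5_000_000)
```

<p>The flagged weeks are only a starting point; deciding which campaign event, if any, explains a given spike still takes a look at the calendar.</p>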
<h2>Conclusion</h2>
<p>Using free disclosure data and a few open-source tools like <a href="http://www.ruby-lang.org/">Ruby</a>, <a href="http://offensivepolitics.net/fechell">fechell</a>, <a href="http://www.r-project.org/">R</a>, and <a href="http://www.mysql.com">MySQL</a>, just about anybody can perform their own professional analysis on federal campaign finance data. </p>
<h2><a name="resources">Resources</a></h2>
<p><a href="http://offensivepolitics.net/assets/create-transactions.sql">Database creation script</a><br />
<a href="http://offensivepolitics.net/assets/create-transactions_summary_weekly.sql">Database population script</a><br />
<a href="http://offensivepolitics.net/assets/draw.R">R script to draw images draw.R</a><br />
<a href="http://offensivepolitics.net/assets/zips.zip">Zip codes</a><br />
<a href="http://offensivepolitics.net/assets/output.flv">FLV file</a><br />
<a href="http://offensivepolitics.net/assets/obama-F3P-SA-2007-2008.zip">CSV extract</a> from <a href="http://offensivepolitics.net/blog/?p=26">Part 1</a></p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=52</wfw:commentRss>
		<slash:comments>1</slash:comments>
<enclosure url="http://offensivepolitics.net/assets/output.flv" length="902055" type="video/x-flv" />
		</item>
		<item>
		<title>open-source campaign finance analysis with ruby and fechell</title>
		<link>http://offensivepolitics.net/blog/?p=26</link>
		<comments>http://offensivepolitics.net/blog/?p=26#comments</comments>
		<pubDate>Thu, 18 Jun 2009 21:10:47 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Campaign Finance]]></category>
		<category><![CDATA[Open-Source]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[FEC]]></category>
		<category><![CDATA[fechell]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=26</guid>
		<description><![CDATA[Introduction fechell is a ruby library used to extract data from electronically filed Federal Election Commission reports, saving you from the hell of parsing them yourself. These reports are filed by all political action committees, presidential and House candidates; at this time Senate rules don&#8217;t require electronic disclosure. Depending on the form type and filer, [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p><strong>fechell</strong> is a Ruby library used to extract data from electronically filed Federal Election Commission reports, saving you from the hell of parsing them yourself. These reports are filed by all political action committees, presidential and House candidates; at this time Senate rules don&#8217;t require electronic disclosure. Depending on the form type and filer, a report may contain summaries of receipts and disbursements, a 48-hour notice of independent spending, a statement of candidacy, and other finance or organizational notifications. In the 9 years of available data, the format of each of these reports has changed several times, making it complex to uniformly parse data across multiple years and versions. The fechell library, using the <a href="http://watchdog.net/data/crawl/fec/electronic/headers">field definition files</a> provided by <a href="http://watchdog.net/">Watchdog.net</a>, allows simple programmatic access to the full library of FEC data. To demonstrate the features of the library we will use fechell to extract all itemized donations made to the <em>Obama for America</em> campaign in 2007 and 2008 by individuals. We will then analyze these data by importing them into a database and visualizing contribution amounts aggregated by zip code over time. Plotting this data on a map will show us clusters of financial support from individuals, as well as how those clusters grow and change over time. </p>
<h2>Code books &amp; formats for electronic FEC reports</h2>
<p>The terminology detailing electronically-filed FEC reports is sometimes confusing. A report is a plain text file, usually with an extension of &#8216;fec.&#8217; Every report contains a header line detailing the FEC format version and the software used to file the report, with values separated by a version-dependent separator character. After the header, each line corresponds to either a <strong>form</strong> (ex: F1,F3P,F24), or a <strong>schedule</strong> attached to the form (Schedule A, Schedule I). The first column of every line tells the reader what type of form or schedule is contained on that line. Every form and schedule is detailed in a workbook named <b>FEC_Format_v6.4.xls</b> available from the FEC as part of their free <a href="http://www.fec.gov/elecfil/vendors.shtml">Vendor Tools</a> package. You will want to download this package and have the <b>FEC_Format_v6.4.xls</b>  (format workbook) available when processing reports with fechell. The format workbook contains one worksheet for each form or schedule, and each worksheet gives a column-by-column description of each value contained in the report. The description contains a standard name, as well as sample data and implementation notes. The fechell library uses this standard name to return values from a form/schedule to the calling program. The library will determine the correct separator character and field definition to use at run-time so all these details are hidden from the user.</p>
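<p>To make the layout a little more concrete, here is a rough Ruby sketch of what reading the first column of each line involves. It does not use fechell and is not how fechell works internally; it only illustrates the idea. The sample lines are hypothetical and heavily shortened, the helpers <b>separator_for</b> and <b>record_types</b> are made up for this example, and the assumption that a report uses either the ASCII 28 character or a comma as its separator should be checked against the format documentation for your report version:</p>

```ruby
FS = "\x1c" # ASCII 28, assumed here to be the separator in newer format versions

# Hypothetical, heavily shortened report lines; real lines carry many more columns.
SAMPLE_REPORT = [
  ['HDR', 'FEC', '6.4', 'ExampleSoftware'].join(FS),
  ['F3P', 'C00431445'].join(FS),
  ['SA17A', 'C00431445'].join(FS)
].freeze

# Guess the separator from the header line: ASCII 28 if present, else a comma.
def separator_for(header_line)
  header_line.include?(FS) ? FS : ','
end

# Return the first column (the form or schedule type) of every non-header line.
def record_types(lines)
  separator = separator_for(lines.first)
  lines.drop(1).map { |line| line.split(separator, -1).first }
end

p record_types(SAMPLE_REPORT)
```

<p>A naive split like this ignores quoting and other details of the real formats, which is exactly the kind of work the library takes off your hands.</p>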
<p><strong>The use and sale of data from these reports for commercial purposes, especially identifying information of individuals giving to campaigns, is regulated by the FEC. See <a href="http://fec.gov/pubrec/publicrecordsoffice.shtml#using">http://fec.gov/pubrec/publicrecordsoffice.shtml#using</a>.</strong></p>
<h2>Installation</h2>
<p>The fechell library is <a href="http://github.com/offensivepolitics/fechell">hosted on github</a> and is available as a gem.</p>
<p>You will need to add GitHub to your gems source if you haven&#8217;t already done so.</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> gem sources <span style="color: #660033;">-a</span> http:<span style="color: #000000; font-weight: bold;">//</span>gems.github.com<span style="color: #000000; font-weight: bold;">/</span></pre></div></div>

<p>Now install the fechell gem.</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sudo</span> gem <span style="color: #c20cb9; font-weight: bold;">install</span> offensivepolitics-fechell</pre></div></div>

<h2>Download FEC reports</h2>
<p>Using the <a href="http://fec.gov/finance/disclosure/efile_search.shtml">FEC electronic filing search tool </a>we can find raw reports as uploaded by candidates and committees.</p>
<p>In this case we&#8217;re looking for committee id <em>C00431445</em>, as this is the primary campaign committee for <em>Obama For America</em>. Contributions are itemized on form F3P, &#8220;Summary of Receipts and Disbursements by an authorized committee (pres / vice pres).&#8221; Using those query parameters you will see close to a hundred filings but we&#8217;re only interested in 15 of them.<br />
The combined size of all <em>Obama For America</em> F3P filings is about 473 megabytes, and you can download the filings individually using the following URLs:</p>
<p><a href="http://query.nictusa.com/dcdev/posted/359390.fec">359390.fec- period 01/01/2007-03/31/2007, filed 08/22/2008 &#8211; APR QUARTERLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/359395.fec">359395.fec- period 04/01/2007-06/30/2007, filed 08/22/2008 &#8211; JUL QUARTERLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/359397.fec">359397.fec- period 07/01/2007-09/30/2007, filed 08/22/2008 &#8211; OCT QUARTERLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/360400.fec">360400.fec- period 10/01/2007-12/31/2007, filed 08/30/2008 &#8211; YEAR-END </a><br />
<a href="http://query.nictusa.com/dcdev/posted/360372.fec">360372.fec- period 01/01/2008-01/31/2008, filed 08/29/2008 &#8211; FEB MONTHLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/360401.fec">360401.fec- period 02/01/2008-02/29/2008, filed 08/30/2008 &#8211; MAR MONTHLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/362085.fec">362085.fec- period 03/01/2008-03/31/2008, filed 09/15/2008 &#8211; APR MONTHLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/358077.fec">358077.fec- period 04/01/2008-04/30/2008, filed 08/18/2008 &#8211; MAY MONTHLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/358076.fec">358076.fec- period 05/01/2008-05/31/2008, filed 08/18/2008 &#8211; JUN MONTHLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/406266.fec">406266.fec- period 06/01/2008-06/30/2008, filed 03/03/2009 &#8211; JUL MONTHLY</a><br />
<a href="http://query.nictusa.com/dcdev/posted/405793.fec">405793.fec- period 07/01/2008-07/31/2008, filed 02/27/2009 &#8211; AUG MONTHLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/405795.fec">405795.fec- period 08/01/2008-08/31/2008, filed 02/27/2009 &#8211; SEP MONTHLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/405796.fec">405796.fec- period 09/01/2008-09/30/2008, filed 02/27/2009 &#8211; OCT MONTHLY </a><br />
<a href="http://query.nictusa.com/dcdev/posted/405794.fec">405794.fec- period 10/01/2008-10/15/2008, filed 02/27/2009 &#8211; PRE-GENERAL </a><br />
<a href="http://query.nictusa.com/dcdev/posted/406271.fec">406271.fec- period 10/16/2008-11/24/2008, filed 03/03/2009 &#8211; POST-GENERAL </a></p>
<p>Save all of these files into a single directory. The code in the rest of this post assumes they are in <em>./obama-fec/</em>.</p>
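<p>If you would rather script the downloads than click through each link, something along these lines should work. It is only a sketch: it assumes the URLs above are still being served, it reads each filing fully into memory, and <em>download_filings</em> is a hypothetical helper of our own, not part of fechell.</p>

```ruby
require 'net/http'
require 'uri'
require 'fileutils'

# Base URL taken from the filing links listed above.
FEC_BASE_URL = 'http://query.nictusa.com/dcdev/posted'

# The first few filing ids from the list above; add the rest as needed.
FILING_IDS = %w[359390 359395 359397]

# Build the download URL for one filing id.
def filing_url(id)
  "#{FEC_BASE_URL}/#{id}.fec"
end

# Download each filing into dir, skipping files that are already there.
# Hypothetical helper, not part of fechell.
def download_filings(ids, dir = './obama-fec')
  FileUtils.mkdir_p(dir)
  ids.each do |id|
    path = File.join(dir, "#{id}.fec")
    next if File.exist?(path)
    body = Net::HTTP.get(URI.parse(filing_url(id)))
    File.open(path, 'wb') { |f| f.write(body) }
  end
end

# Usage (performs network requests):
#   download_filings(FILING_IDS)
```

<p>The download itself is left as a commented usage line so that loading the script does not hit the network by accident.</p>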
<h2>F3P fields</h2>
<p>fechell will give you access to any field in a report, but you need to know specifically which fields to look for. Using the format worksheet (FEC_Format_6.4.xls), we can easily find the correct field names and their possible values. In the previous step we downloaded 15 different filings, all of type F3P, &#8220;Summary of receipts and disbursements by an authorized committee (pres / vice pres).&#8221; An F3P filing contains the normal FEC header line and a summary line detailing fundraising for the current period, followed by any number of schedules. Using the Specification Requirements document (FEC_Format_6.4.pdf) that comes with the &#8216;FEC Vendor Tools&#8217; download we can quickly determine which schedules belong to which forms. In Appendix C, page 22 of the Specification Requirements, we see that form F3P can contain Schedules A, B, C, C1, C2 and D. Using the format workbook as a reference for each schedule, we see that Schedule A (&#8216;Sch A&#8217;) contains an itemized receipt from an individual or a committee. In our analysis, we&#8217;ll look for Schedule A lines and ignore everything else.</p>
<p>We want to visualize contribution amounts from individuals by zip code and contribution date, so we&#8217;ll extract 3 fields from each Schedule A: contribution amount, contribution date, and zip code of contributor. Schedule A can contain contributions from several types of entities, each identified by the &#8220;ENTITY TYPE&#8221; field. Looking through the &#8220;Sch A&#8221; worksheet of the format workbook we see we want column #6 (&#8220;ENTITY TYPE&#8221;), column #19 (&#8220;CONTRIBUTOR ZIP&#8221;), column #20 (&#8220;CONTRIBUTION DATE&#8221;), and column #21 (&#8220;CONTRIBUTION AMOUNT&#8221;). The names of these columns are all we&#8217;ll need to start extracting data.</p>
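<p>To make the column numbers concrete, here is a small sketch that picks those four fields out of a single row. It assumes the row is already an array of field values in worksheet column order, and the sample row is made up. fechell does this name lookup for you, so this is purely an illustration of what the worksheet columns mean.</p>

```ruby
# 1-based column numbers from the "Sch A" worksheet, as listed above.
SCHEDULE_A_COLUMNS = {
  'ENTITY TYPE'         => 6,
  'CONTRIBUTOR ZIP'     => 19,
  'CONTRIBUTION DATE'   => 20,
  'CONTRIBUTION AMOUNT' => 21
}

# Pick the named fields out of one row (an array of field values,
# assumed to be in worksheet column order).
def pick_fields(row, columns = SCHEDULE_A_COLUMNS)
  picked = {}
  columns.each { |name, number| picked[name] = row[number - 1] }
  picked
end

# Made-up row: 21 empty fields with only the four of interest filled in.
row = Array.new(21, '')
row[5]  = 'IND'
row[18] = '606151234'
row[19] = '20080115'
row[20] = '250.00'

fields = pick_fields(row)
puts "#{fields['ENTITY TYPE']} #{fields['CONTRIBUTOR ZIP']} #{fields['CONTRIBUTION DATE']} #{fields['CONTRIBUTION AMOUNT']}"
```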
<h2>Extract the data</h2>
<p>Now we&#8217;re ready to load the FEC data files and extract individual contribution data.<br />
First we need to load and initialize the fechell library:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'rubygems'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'fechell'</span>
&nbsp;
h = FECHell.<span style="color:#9900CC;">new</span></pre></div></div>

<p>Next we&#8217;ll use fechell to load each file we downloaded. We pass a filename to the <strong>process</strong> function, and fechell calls our block once for every line in the file. Each <strong>line</strong> is a two-element array: the schedule type of that line, followed by a hash of the line&#8217;s values keyed by their names from the FEC worksheet for the corresponding schedule.</p>
<p>Note: You&#8217;ll need to change the Dir["./obama-fec/*.fec"] path to the directory with your downloaded FEC reports from step 2.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'rubygems'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'fechell'</span>
&nbsp;
h = FECHell.<span style="color:#9900CC;">new</span>
<span style="color:#CC00FF; font-weight:bold;">Dir</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;./obama-fec/*.fec&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>filename<span style="color:#006600; font-weight:bold;">|</span>
  h.<span style="color:#9900CC;">process</span><span style="color:#006600; font-weight:bold;">&#40;</span>filename<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>line<span style="color:#006600; font-weight:bold;">|</span>
    schedule = line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    values = line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>
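<p>A quick way to get a feel for what is inside a filing is to tally the lines by schedule type. The sketch below does that over a made-up list of schedule/values pairs standing in for what the <strong>process</strong> block receives, so it runs without fechell installed. The &#8216;SB&#8217; code for Schedule B is an assumption by analogy with &#8216;SA&#8217;.</p>

```ruby
# Stand-in for the [schedule, values] pairs handed to the process block.
# These sample lines are made up for illustration.
sample_lines = [
  ['SA', { 'ENTITY TYPE' => 'IND' }],
  ['SA', { 'ENTITY TYPE' => 'PAC' }],
  ['SB', {}]   # 'SB' is an assumed code for Schedule B
]

# Tally how many lines of each schedule type were seen.
def count_by_schedule(lines)
  counts = Hash.new(0)
  lines.each do |schedule, _values|
    counts[schedule] += 1
  end
  counts
end

counts = count_by_schedule(sample_lines)
counts.sort.each { |schedule, n| puts "#{schedule}: #{n}" }
```

<p>With real data you would increment the counter inside the <strong>process</strong> block instead of looping over an array.</p>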

<p>We want to process only Schedule A (SA) lines, and only contributions from the IND (individual) entity type. We first check the <strong>schedule</strong> value returned by fechell to make sure we&#8217;re processing a &#8220;Schedule A&#8221; (SA) line. If we aren&#8217;t, we move on to the next line. To check which type of entity made the contribution, we look up the &#8216;ENTITY TYPE&#8217; key, which we identified earlier, in the <strong>values</strong> hash. Referring again to the FEC worksheet we see that the possible values for column #6 (&#8220;ENTITY TYPE&#8221;) are &#8220;[CAN|CCM|COM|IND|ORG|PAC|PTY]&#8221;. In this case we&#8217;re only interested in contributions from individuals (IND), so we&#8217;ll skip the line if that value doesn&#8217;t match.</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'rubygems'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'fechell'</span>
&nbsp;
h = FECHell.<span style="color:#9900CC;">new</span>
<span style="color:#CC00FF; font-weight:bold;">Dir</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;./obama-fec/*.fec&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>filename<span style="color:#006600; font-weight:bold;">|</span>
  h.<span style="color:#9900CC;">process</span><span style="color:#006600; font-weight:bold;">&#40;</span>filename<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>line<span style="color:#006600; font-weight:bold;">|</span>
    schedule = line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    values = line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span>
&nbsp;
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> schedule != <span style="color:#996600;">'SA'</span>
&nbsp;
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> values<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;ENTITY TYPE&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span> != <span style="color:#996600;">&quot;IND&quot;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>
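<p>The two <em>next if</em> checks can also be read as a single yes/no question about each line. Here is a minimal sketch of that idea as a standalone method, exercised on made-up sample lines so it runs without fechell. The method name is our own invention.</p>

```ruby
# True only for Schedule A lines whose contributor is an individual.
# Hypothetical helper, not part of fechell.
def individual_receipt?(schedule, values)
  schedule == 'SA' && values['ENTITY TYPE'] == 'IND'
end

# Made-up lines in the [schedule, values] shape used above.
samples = [
  ['SA', { 'ENTITY TYPE' => 'IND' }],
  ['SA', { 'ENTITY TYPE' => 'PAC' }],
  ['SB', { 'ENTITY TYPE' => 'IND' }]
]

kept = samples.select { |schedule, values| individual_receipt?(schedule, values) }
puts "kept #{kept.length} of #{samples.length} lines"
```

<p>Inside the <strong>process</strong> block this would become <em>next unless individual_receipt?(schedule, values)</em>.</p>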

<p>Now we know we&#8217;ve got a Schedule A line for an individual so we can extract the actual data we care about. Again, the field names used below were taken from the FEC worksheet for Schedule A: &#8220;CONTRIBUTION AMOUNT&#8221;, &#8220;CONTRIBUTOR ZIP&#8221;, and &#8220;CONTRIBUTION DATE.&#8221;</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'rubygems'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'fechell'</span>
&nbsp;
h = FECHell.<span style="color:#9900CC;">new</span>
<span style="color:#CC00FF; font-weight:bold;">Dir</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;./obama-fec/*.fec&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>filename<span style="color:#006600; font-weight:bold;">|</span>
  h.<span style="color:#9900CC;">process</span><span style="color:#006600; font-weight:bold;">&#40;</span>filename<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>line<span style="color:#006600; font-weight:bold;">|</span>
    schedule = line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    values = line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span>
&nbsp;
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> schedule != <span style="color:#996600;">'SA'</span>
&nbsp;
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> values<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;ENTITY TYPE&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span> != <span style="color:#996600;">&quot;IND&quot;</span>
&nbsp;
    amount = values<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;CONTRIBUTION AMOUNT&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">to_f</span>
    fullzip = values<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;CONTRIBUTOR ZIP&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    date_str = values<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;CONTRIBUTION DATE&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span> 
&nbsp;
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> fullzip.<span style="color:#0000FF; font-weight:bold;">nil</span>? == <span style="color:#0000FF; font-weight:bold;">true</span>
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> date_str.<span style="color:#0000FF; font-weight:bold;">nil</span>? == <span style="color:#0000FF; font-weight:bold;">true</span>
&nbsp;
    zip5 = fullzip<span style="color:#006600; font-weight:bold;">&#91;</span>0..4<span style="color:#006600; font-weight:bold;">&#93;</span>
&nbsp;
    <span style="color:#CC0066; font-weight:bold;">puts</span> <span style="color:#996600;">&quot;On #{date_str} we received #{amount} from zipcode #{zip5}&quot;</span>
  <span style="color:#9966CC; font-weight:bold;">end</span>
<span style="color:#9966CC; font-weight:bold;">end</span></pre></div></div>
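<p>Taking the first five characters works when the ZIP field is clean, but hand-entered values can contain spaces or hyphens or be too short to be a ZIP at all. The sketch below is one possible stricter normalizer. It is an assumption about what &#8220;clean&#8221; should mean here, not something fechell provides.</p>

```ruby
# Return the 5-digit ZIP from a raw field value, or nil if there
# are not at least five digits to work with.
# Hypothetical helper, not part of fechell.
def zip5(raw)
  return nil if raw.nil?
  digits = raw.to_s.gsub(/\D/, '')   # drop hyphens, spaces and other non-digits
  return nil if digits.length < 5
  digits[0, 5]
end

['606151234', '60615-1234', ' 02139 ', '123', nil].each do |raw|
  puts "#{raw.inspect} -> #{zip5(raw).inspect}"
end
```

<p>Note that a ZIP that lost its leading zero somewhere upstream (for example &#8220;2139&#8221;) is rejected rather than guessed at.</p>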

<p>Printing out the individual contributions is a great way to verify the library is working, but it doesn&#8217;t do much for us as far as analysis is concerned. Instead of just printing information we&#8217;ll create a CSV file of all the individual contributions, suitable for import into a database. We also do some small cleanup and data verification and add some simple error handling, since FEC data is notoriously poorly formatted or incomplete: lines with a missing ZIP code or date are skipped, and dates are rewritten as YYYY-MM-DD. This leaves us with the final version of our individual contribution extraction code:</p>

<div class="wp_syntax"><div class="code"><pre class="ruby" style="font-family:monospace;"><span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'rubygems'</span>
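<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'date'</span>
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'fastercsv'</span>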
<span style="color:#CC0066; font-weight:bold;">require</span> <span style="color:#996600;">'fechell'</span>
&nbsp;
csv = FasterCSV.<span style="color:#CC0066; font-weight:bold;">open</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">&quot;obama-fec-SA-2007-2008.csv&quot;</span>,<span style="color:#996600;">&quot;w&quot;</span><span style="color:#006600; font-weight:bold;">&#41;</span>
csv <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> <span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;contribution_date&quot;</span>,<span style="color:#996600;">&quot;amount&quot;</span>,<span style="color:#996600;">&quot;contributor_zip&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>
&nbsp;
h = FECHell.<span style="color:#9900CC;">new</span>
<span style="color:#CC00FF; font-weight:bold;">Dir</span><span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;./obama-fec/*.fec&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">each</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>filename<span style="color:#006600; font-weight:bold;">|</span>
  h.<span style="color:#9900CC;">process</span><span style="color:#006600; font-weight:bold;">&#40;</span>filename<span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">do</span> <span style="color:#006600; font-weight:bold;">|</span>line<span style="color:#006600; font-weight:bold;">|</span>
    schedule = line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">0</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    values = line<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#006666;">1</span><span style="color:#006600; font-weight:bold;">&#93;</span>
&nbsp;
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> schedule != <span style="color:#996600;">'SA'</span>
&nbsp;
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> values<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;ENTITY TYPE&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span> != <span style="color:#996600;">&quot;IND&quot;</span>
&nbsp;
    amount = values<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;CONTRIBUTION AMOUNT&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>.<span style="color:#9900CC;">to_f</span>
    fullzip = values<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;CONTRIBUTOR ZIP&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span>
    date_str = values<span style="color:#006600; font-weight:bold;">&#91;</span><span style="color:#996600;">&quot;CONTRIBUTION DATE&quot;</span><span style="color:#006600; font-weight:bold;">&#93;</span> 
&nbsp;
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> fullzip.<span style="color:#0000FF; font-weight:bold;">nil</span>? == <span style="color:#0000FF; font-weight:bold;">true</span> <span style="color:#006600; font-weight:bold;">||</span> fullzip == <span style="color:#996600;">''</span>
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> date_str.<span style="color:#0000FF; font-weight:bold;">nil</span>? == <span style="color:#0000FF; font-weight:bold;">true</span> <span style="color:#006600; font-weight:bold;">||</span> date_str == <span style="color:#996600;">''</span>
&nbsp;
    <span style="color:#008000; font-style:italic;"># Date.strptime raises ArgumentError on a malformed date, so rescue to nil and skip the line</span>
    date_val = <span style="color:#CC00FF; font-weight:bold;">Date</span>.<span style="color:#9900CC;">strptime</span><span style="color:#006600; font-weight:bold;">&#40;</span>date_str,<span style="color:#996600;">'%Y%m%d'</span><span style="color:#006600; font-weight:bold;">&#41;</span> <span style="color:#9966CC; font-weight:bold;">rescue</span> <span style="color:#0000FF; font-weight:bold;">nil</span>
    <span style="color:#9966CC; font-weight:bold;">next</span> <span style="color:#9966CC; font-weight:bold;">if</span> date_val.<span style="color:#0000FF; font-weight:bold;">nil</span>?
&nbsp;
    date_formatted = date_val.<span style="color:#9900CC;">strftime</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#996600;">'%Y-%m-%d'</span><span style="color:#006600; font-weight:bold;">&#41;</span>
&nbsp;
    zip5 = fullzip<span style="color:#006600; font-weight:bold;">&#91;</span>0..4<span style="color:#006600; font-weight:bold;">&#93;</span>
&nbsp;
    csv <span style="color:#006600; font-weight:bold;">&lt;&lt;</span> <span style="color:#006600; font-weight:bold;">&#91;</span>date_formatted,amount,zip5<span style="color:#006600; font-weight:bold;">&#93;</span>
&nbsp;
  <span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
<span style="color:#9966CC; font-weight:bold;">end</span>
&nbsp;
csv.<span style="color:#9900CC;">close</span></pre></div></div>
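<p>Before loading the file into a database it can be worth a quick sanity check, such as totalling the amounts per ZIP code. The sketch below does that with Ruby&#8217;s standard <em>csv</em> library (which replaced FasterCSV as of Ruby 1.9) over a few made-up rows in the same three-column layout. The method name is our own.</p>

```ruby
require 'csv'

# Sum contribution amounts per ZIP code from CSV text that uses the
# header row written by the script above.
def totals_by_zip(csv_text)
  totals = Hash.new(0.0)
  CSV.parse(csv_text, headers: true) do |row|
    totals[row['contributor_zip']] += row['amount'].to_f
  end
  totals
end

# Made-up sample rows in the same layout as the output file.
sample = <<CSV_TEXT
contribution_date,amount,contributor_zip
2008-01-15,250.0,60615
2008-02-01,50.0,60615
2008-02-03,100.0,02139
CSV_TEXT

totals_by_zip(sample).sort.each { |zip, total| puts "#{zip}: #{total}" }
```

<p>For the real output file you would stream it with <em>CSV.foreach</em> rather than reading it all into a string, since it is tens of megabytes.</p>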

<p>Running this code on the <em>Obama for America</em> FEC files produces around 2.8 million individual itemized contributions over two years. The output file is about 67 megabytes and is ready to be imported into a database for further analysis, which is exactly what we do in Part 2.</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/?feed=rss2&amp;p=26</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
