<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Offensive Politics</title>
	<atom:link href="http://offensivepolitics.net/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://offensivepolitics.net/blog</link>
	<description>Electoral and financial data hackery</description>
	<lastBuildDate>Wed, 04 Jan 2012 17:27:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Mapping the Iowa GOP 2012 Caucus Results</title>
		<link>http://offensivepolitics.net/blog/2012/01/mapping-the-iowa-gop-2012-caucus-results/</link>
		<comments>http://offensivepolitics.net/blog/2012/01/mapping-the-iowa-gop-2012-caucus-results/#comments</comments>
		<pubDate>Wed, 04 Jan 2012 17:27:44 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[2012]]></category>
		<category><![CDATA[Elections]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=709</guid>
		<description><![CDATA[Introduction On Tuesday January 3rd 2012 the Iowa Republican party held its presidential caucuses, with Mitt Romney beating Rick Santorum by 8 votes as of noon on Jan 4th. This was an exciting race with multiple lead changes and entrance polling showing many late undecideds and large gaps in candidate support by age and income. [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>On Tuesday January 3rd 2012 the Iowa Republican party held its presidential caucuses, with Mitt Romney beating Rick Santorum by 8 votes as of noon on Jan 4th. This was an exciting race with <a href="http://www.huffingtonpost.com/elections/state/IA/?chart=12IAPresRepPR&#038;chart_mode=new">multiple lead changes</a> and <a href="http://www.cnn.com/election/2012/primaries/epolls/ia">entrance polling</a> showing many late undecideds and large gaps in candidate support by age and income. In a nice twist for those of us who like playing with election data (and who doesn&#8217;t?), the IA GOP has published their election results using <a href="http://www.google.com/fusiontables/Home/">Google Fusion Tables</a>. This allows us to skip the usually messy data scraping and piecing together that is par for the course. </p>
<p>In this post I&#8217;ll create heat maps to illustrate candidate support geographically, as well as differential heat maps to compare results between candidates. Everything is done in <a href="http://www.r-project.org/">R</a> and all the code and data are available from <a href="http://github.com/offensivepolitics/iagop-caucus-2012">the github repository for this post</a>. Pull requests for new analysis or fixes are greatly appreciated. </p>
<h2>Data Wrangling</h2>
<p>Electoral data is usually published as some combination of PDFs, HTML files, CSV files, and ESRI shapefiles. There is usually a fair amount of work getting names and IDs to match up between formats, but not this time. Thankfully the IA GOP published their caucus results as a <a href="http://www.google.com/fusiontables/DataSource?dsrcid=2475414">Google Fusion Table</a>. This allows easy online viewing, and export to other formats. You can ignore this section if you just want to hack on the provided result files. Otherwise, to produce the data files for this post I did the following: </p>
<ol>
<li>Exported a CSV copy of the IAGOP election results with <b>File->Export</b> in the Fusion Table.</li>
<li>Exported a KML file with Iowa county outlines:
<ol>
<li>View the Iowa election map with <b>Visualize->Map</b></li>
<li>Export the KML file by clicking the &#8220;Export to KML&#8221; link on the map page</li>
</ol>
</li>
<li>Loaded the KML file into <a href="http://qgis.org/">Quantum GIS</a> and exported the layer as a Shapefile.</li>
</ol>
<p>That looks like a lot of work, and it was sort of complex, but it is miles easier than taking shape and result data from different sources, or performing all those steps in code.</p>
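<p>For readers who want to see what the name matching amounts to once the files are exported, here is a minimal sketch. It uses two tiny made-up data frames in place of the real CSV and the shapefile attribute table, so the column names, ids, and vote counts are assumptions for illustration and are not taken from the IA GOP data.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Toy stand-ins for the exported files (hypothetical columns and values)
results &lt;- data.frame(
  county   = c("Adair", "Adams", "Allamakee"),
  romney   = c(100, 50, 200),
  santorum = c(150, 40, 180),
  total    = c(400, 160, 600)
)
shapes &lt;- data.frame(
  county = c("Allamakee", "Adair", "Adams"),
  id     = c("2", "0", "1")  # polygon ids, as a shapefile reader might assign them
)

# Convert raw counts to percentages so counties of different sizes compare
results$romney.pct   &lt;- 100 * results$romney / results$total
results$santorum.pct &lt;- 100 * results$santorum / results$total

# Join on the shared county name; merge() drops unmatched rows by default,
# so compare row counts to catch naming mismatches between the two sources
joined &lt;- merge(shapes, results, by = "county")
stopifnot(nrow(joined) == nrow(results))
joined[, c("county", "id", "romney.pct", "santorum.pct")]</pre></div></div>

<p>When both files come from the same Fusion Table the row-count check should pass without any name cleanup, which is the convenience described above.</p>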
<h2>R code</h2>
<p>The R code makes use of several awesome packages: <a href="http://cran.r-project.org/web/packages/maptools/index.html">maptools</a>, <a href="http://cran.r-project.org/web/packages/ggplot2/index.html">ggplot2</a>, <a href="http://cran.r-project.org/web/packages/RColorBrewer/index.html">RColorBrewer</a>, and <a href="http://cran.r-project.org/web/packages/gpclib/index.html">gpclib</a>. I took code for preparing and plotting shape files with ggplot2 from the <a href="https://github.com/hadley/ggplot2/wiki/plotting-polygon-shapefiles">ggplot2 wiki</a>. I lifted some ggplot2 theme code from Osmo Salomaa on the <a href="http://groups.google.com/group/ggplot2/browse_thread/thread/72403c6997b79c3b?pli=1">ggplot2 mailing list</a>. This work would have been substantially more difficult without the packages and links listed above; thank you to their authors for making their work available for free. </p>
<h2>Support Heat maps</h2>
<p>First we&#8217;ll create heat maps of the vote percentage each candidate received by county in the Iowa GOP Caucuses in 2012. These maps show us strong and weak geographic areas for each candidate. The scales are identical for all candidates so comparing maps should be quite easy. The maps appear in alphabetical order; click through for a much larger version: </p>
<h4>Bachmann</h4>
<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/bachmann.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/bachmann-300x300.png" alt="" title="Bachmann County Results IA GOP 2012 Caucus" width="300" height="300" class="aligncenter size-medium wp-image-713" /></a></p>
<h4>Gingrich</h4>
<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/gingrich.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/gingrich-300x300.png" alt="" title="Gingrich County Results IA GOP 2012 Caucus" width="300" height="300" class="aligncenter size-medium wp-image-714" /></a></p>
<h4>Paul</h4>
<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/paul.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/paul-300x300.png" alt="" title="Paul County Results IA GOP 2012 Caucus" width="300" height="300" class="aligncenter size-medium wp-image-715" /></a></p>
<h4>Perry</h4>
<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/perry.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/perry-300x300.png" alt="" title="Perry County Results IA GOP 2012 Caucus" width="300" height="300" class="aligncenter size-medium wp-image-716" /></a></p>
<h4>Romney</h4>
<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney-300x300.png" alt="" title="Romney County Results IA GOP 2012 Caucus" width="300" height="300" class="aligncenter size-medium wp-image-717" /></a></p>
<h4>Santorum</h4>
<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/santorum.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/santorum-300x300.png" alt="" title="Santorum County Results IA GOP 2012 Caucus" width="300" height="300" class="aligncenter size-medium wp-image-718" /></a></p>
<h4>Take away</h4>
<p>What do these maps tell us? Bachmann, who finished last, performed poorly across the entire state. Perry, who finished 5th, had some strong counties in the southwest but overall performed poorly as well. Gingrich, the front runner until two weeks ago and the 4th place finisher, had no really strong counties anywhere. Ron Paul finished 3rd and enjoyed several counties with large support in the northeast of Iowa, and was a contender almost everywhere. Rick Santorum had one huge win and several other large county wins in the west, while Romney had more success in the eastern side of Iowa. Romney also had more counties where he won 30-40% of the vote than Santorum did. </p>
<h2>Relative Heat maps</h2>
<p>The raw support heat maps above are OK, but they aren&#8217;t terrific for comparing two candidates&#8217; returns. Next we&#8217;ll create relative heat maps that show the difference in support by county for each of the top 3 candidates. We&#8217;ll also create a histogram of the same values since mapping the differences can sometimes distort the overall distribution of values.</p>
<h4>Romney vs Santorum</h4>
<p>Positive values show a greater Romney percentage than Santorum, and negative values show the opposite. For example: if Romney received 25% of the vote in a county and Santorum received 50%, the Romney &#8211; Santorum value would be -25.<br />
<a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney-santorum.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney-santorum-300x300.png" alt="" title="Romney vote percentage minus Santorum vote percentage IA GOP Caucus 2012" width="300" height="300" class="aligncenter size-medium wp-image-721" /></a><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney-santorum-histogram.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney-santorum-histogram-300x300.png" alt="" title="Romney vote percentage minus Santorum vote percentage IA GOP Caucus 2012" width="300" height="300" class="aligncenter size-medium wp-image-720" /></a></p>
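<p>The difference calculation itself is a single subtraction per county. The sketch below uses made-up county percentages (the first row is the 25% vs 50% example from the text), so the data frame and its column names are assumptions for illustration and do not reproduce the code in the repository.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Hypothetical county percentages; row "A" is the example from the text
pct &lt;- data.frame(
  county   = c("A", "B", "C"),
  romney   = c(25, 30, 22),
  santorum = c(50, 24, 22)
)

# Positive = Romney ahead, negative = Santorum ahead, zero = tie
pct$diff &lt;- pct$romney - pct$santorum
pct$diff
# -25   6   0

# Count counties per 10 point bin, which is what the histogram displays
table(cut(pct$diff, breaks = seq(-30, 30, by = 10), right = FALSE))</pre></div></div>

<p>The map colors each county by <code>diff</code>, while the binned counts show how many counties fall at each margin regardless of where they are.</p>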
<h4>Analysis: </h4>
<p>The relative heat map between the first and second place candidates in the IA GOP 2012 caucus shows us a few interesting items. Santorum cleaned up in the upper northwest of Iowa, did less well in the southwest, and had mixed results everywhere else. The histogram gives us a clearer picture of the overall breakdown of the results, showing us few large percentage wins for either candidate and a spike of smaller percentage wins for Santorum. The number of Santorum wins was probably offset by the size of the counties that Romney won, which isn&#8217;t shown on either of these graphs. </p>
<h4>Romney vs Paul</h4>
<p> <a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney-paul.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney-paul-300x300.png" alt="" title="Romney vote percentage minus Paul vote percentage IA GOP Caucus 2012" width="300" height="300" class="aligncenter size-medium wp-image-727" /></a><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney-paul-histogram.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/romney-paul-histogram-300x300.png" alt="" title="Romney vote percentage minus Paul vote percentage IA GOP Caucus 2012" width="300" height="300" class="aligncenter size-medium wp-image-723" /></a></p>
<h4>Analysis:</h4>
<p>The first and third place candidates don&#8217;t have a dramatically different map or distribution than the first and second. The map shows a few Paul strong points, but the vast majority of the counties were very close races. The histogram backs up this reading, and shows larger win percentages for Romney in more counties. </p>
<h4>Santorum vs Paul</h4>
<p><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/santorum-paul.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/santorum-paul-300x300.png" alt="" title="Santorum vote percentage minus Paul vote percentage IA GOP Caucus 2012" width="300" height="300" class="aligncenter size-medium wp-image-732" /></a><a href="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/santorum-paul-histogram.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2012/01/santorum-paul-histogram-300x300.png" alt="" title="Santorum vote percentage minus Paul vote percentage IA GOP Caucus 2012" width="300" height="300" class="aligncenter size-medium wp-image-731" /></a></p>
<h4>Analysis:</h4>
<p>Aside from Santorum&#8217;s large win in the northwest, it looks like Paul actually had more winning counties than Santorum. Both candidates did well, and were competitive throughout the state. </p>
<h2>Wrapup</h2>
<p>The heat maps and differential maps give us an unbiased view into how the Iowa Republican 2012 caucuses turned out. Each of the top 3 candidates was a contender in almost every county, and had one or two things gone differently, any of them could have won the caucus. Please feel free to download and hack the code for this article from <a href="https://github.com/offensivepolitics/iagop-caucus-2012">my github page</a>. Thank you for reading, and please direct any questions or comments to me using the comment form below or via email at: jjh@offensivepolitics.net. </p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2012/01/mapping-the-iowa-gop-2012-caucus-results/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Exploring Your Voter File with R</title>
		<link>http://offensivepolitics.net/blog/2011/12/exploring-your-voter-file-with-r/</link>
		<comments>http://offensivepolitics.net/blog/2011/12/exploring-your-voter-file-with-r/#comments</comments>
		<pubDate>Mon, 12 Dec 2011 13:39:33 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Elections]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=671</guid>
		<description><![CDATA[We perform a voter file analysis for Wake County, NC using R.]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>In this post I&#8217;ll perform an analysis of the voter file for Wake County, NC. I&#8217;ll follow the same processes as a political consultant, but I&#8217;ll be using R instead of traditional tools. The data and code for this post are all available on my <a href="https://github.com/offensivepolitics/wake-county-voter-file-analysis" >GitHub repository</a>, so a reader should be able to follow along and expand on this analysis. </p>
<h2>What is a voter file?</h2>
<p>A voter file is a list of every registered voter for some political area, like a county or a congressional district. The voter file is purchased from state parties, 3rd party vendors, PACs, or municipal offices. The file usually contains the name, address, and voting history of every registered voter. Campaigns can purchase extra data (&#8220;appends&#8221;) like phone numbers, email addresses, demographic information, or organization membership information. All these data are used to sort, group, and prioritize contact of a voter.</p>
<h2>Who uses a voter file?</h2>
<p>The voter file is used by candidates for political office or issue campaigns to understand their target constituents. A good analysis of the voter file will reveal strategic demographic and turnout information that may otherwise be hidden. Armed with this information a campaign will assemble a voter contact plan and build turnout projections, both of which are essential campaign processes. A candidate without a strong understanding of demographic and turnout information for his election will most likely waste campaign resources talking to the wrong people.</p>
<h2>How would this normally be done? </h2>
<ul>
<li>A small campaign will probably use MS Excel or Access to perform a precinct analysis and build lists and counts.</li>
<li>For larger campaigns, the CRM systems (<a href="http://www.ngpvan.com/">NGPVAN</a>, <a href="http://www.aristotle.com/">Aristotle</a>) all have cross-tabs, lists, and counting, but are primarily used for contact and fundraising compliance.</li>
<li>Very sophisticated campaigns will use something like the Q-Tool from <a href="http://catalist.us/">Catalist</a>. This is a voter analysis tool providing data-mining and modeling capabilities, along with the standard counting. Q-Tool is extremely impressive.</li>
</ul>
<h2>Why use R?</h2>
<p>From the <a href="http://www.r-project.org" title="R-Project Home Page">R-project home page</a>: R is a free software environment for statistical computing and graphics. The R programming language allows a wide range of analysis and visualization not available in traditional political tools. Data manipulation is easier on the messy and disjoint data we deal with in political analysis. The visualization tools (like ggplot2 and lattice) let us easily generate graphics suited to our exact data. Finally, R has excellent support for basic political statistics like clustering and regression analysis, to say nothing of more advanced statistical tools like multilevel modeling and simulation. </p>
<h2>The Voter File</h2>
<p>We will be exploring the voter file for Wake County, NC. Besides being free and updated regularly, this voter file has several unique features including voter gender, party registration, and absentee voting information. This level of data is almost unheard of in free lists, so our analysis will be very realistic. Wake County has both a strong absentee and vote by mail program, as well as early voting and same day registration. Wake County also allows voters not affiliated with a major party to vote in primary elections, but for only one primary per election season. The Wake County Board of Elections is very far above the average with elections administration and data disbursement in all areas save one. Why would they distribute the voter file as a self-extracting zip archive? </p>
<h2>The toolkit</h2>
<p>I&#8217;ll be using the Wake County voter file downloaded from <a href="http://msweb03.co.wake.nc.us/bordelec/Waves/WavesDownload.asp">WakeGOV.com</a> on Nov 21, 2011. I&#8217;ll be using version 2.14 of <a href="http://www.r-project.org/">R</a>, along with the following packages: plyr, ggplot2, gmodels, and RColorBrewer. All code and a slightly trimmed-down voter file can be downloaded from my <a href="https://github.com/offensivepolitics/wake-county-voter-file-analysis" >GitHub repository</a>.</p>
<h2>Cross tabs</h2>
<p>The initial analysis will focus on understanding the demographic makeup and voting history of registered voters in Wake County. We&#8217;ll perform simple counts (cross-tabs) on different segments of the file to see how those voters break down.</p>
<p><strong>Voting Status</strong></p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">CrossTable<span style="color: #080;">&#40;</span>vf$status,prop.<span style="">c</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,prop.<span style="">chisq</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,<span style="color: #0000FF; font-weight: bold;">format</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;SPSS&quot;</span>,max.<span style="">width</span><span style="color: #080;">=</span><span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span></pre></div></div>

<pre>
Total Observations in Table:  595741 

          |   Active  | Inactive  |
          |-----------|-----------|
          |   533011  |    62730  |
          |   89.470% |   10.530% |
          |-----------|-----------|
</pre>
<p>Our first cross-tab tells us that 10.5% of the voters on our list are inactive. Wake County considers a voter inactive if mail has been returned from their address. Determining voter status ahead of time is usually an expensive or impossible task but Wake County has helpfully done this for us. Without this information a campaign would waste money sending mail or door knocking at the wrong address.</p>
<p><strong>Party Affiliation</strong></p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">CrossTable<span style="color: #080;">&#40;</span>vf$party,prop.<span style="">c</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,prop.<span style="">chisq</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,<span style="color: #0000FF; font-weight: bold;">format</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;SPSS&quot;</span>,max.<span style="">width</span><span style="color: #080;">=</span><span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span></pre></div></div>

<pre>
          |      DEM  |      LIB  |      REP  |      UNA  |
          |-----------|-----------|-----------|-----------|
          |   246641  |     1577  |   178676  |   168847  |
          |   41.401% |    0.265% |   29.992% |   28.342% |
          |-----------|-----------|-----------|-----------|
</pre>
<p>The party breakdown in Wake County shows an 11 point Democratic registration advantage (41.4% to 30.0%), which is a solid lead. The 28% of voters who aren&#8217;t affiliated with a party could make for some very close elections as each major party aggressively courts the unaffiliated.
</p>
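<p>The percentages in a one-way cross-tab are just each count divided by the total, so they can be checked in base R without gmodels. The sketch below retypes the counts from the table above; the vector name <code>party</code> is only for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Counts copied from the party affiliation cross-tab above
party &lt;- c(DEM = 246641, LIB = 1577, REP = 178676, UNA = 168847)

# Share of all registered voters, in percent
share &lt;- 100 * prop.table(party)
round(share, 3)

# Democratic minus Republican share, in percentage points
round(share[["DEM"]] - share[["REP"]], 1)</pre></div></div>

<p>The rounded shares match the percent row that CrossTable printed, and the last line gives the registration gap between the two major parties.</p>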
<p><strong>Gender</strong></p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">CrossTable<span style="color: #080;">&#40;</span>vf$gender,prop.<span style="">c</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,prop.<span style="">chisq</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,<span style="color: #0000FF; font-weight: bold;">format</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;SPSS&quot;</span>,max.<span style="">width</span><span style="color: #080;">=</span><span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span></pre></div></div>

<pre>
          | Female    | Male      |  Unknown  |
          |-----------|-----------|-----------|
          |   316958  |   273696  |     5087  |
          |   53.204% |   45.942% |    0.854% |
          |-----------|-----------|-----------|
</pre>
<p>The gender cross-tab tells us a majority of Wake County voters are female, and by a healthy 7 point margin (there are about 16% more registered women than men). This information will inform every level of communication by the campaign.</p>
<p><strong>Age Group</strong></p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">CrossTable<span style="color: #080;">&#40;</span>vf$age.<span style="">group</span>,prop.<span style="">c</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,prop.<span style="">chisq</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,<span style="color: #0000FF; font-weight: bold;">format</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;SPSS&quot;</span>,max.<span style="">width</span><span style="color: #080;">=</span><span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span></pre></div></div>

<pre>

Total Observations in Table:  595741 

          |   [17,30) |   [30,40) |   [40,50) |   [50,60) |   [60,70) |
          |-----------|-----------|-----------|-----------|-----------|
          |   114391  |   122686  |   132329  |   108995  |    67758  |
          |   19.201% |   20.594% |   22.213% |   18.296% |   11.374% |
          |-----------|-----------|-----------|-----------|-----------|
          |-----------|-----------|-----------|-----------|-----------|
          |   [70,80) |   [80,90) |  [90,100) | [100,110) | [110,120) |
          |-----------|-----------|-----------|-----------|-----------|
          |    30337  |    15469  |     3599  |      172  |        5  |
          |    5.092% |    2.597% |    0.604% |    0.029% |    0.001% |
          |-----------|-----------|-----------|-----------|-----------|
</pre>
<p>The Wake County file has age information, which we've binned into roughly 10-year sized buckets. As much as gender, the age of registered voters will play a role in determining the policy goals and communication methods used by a campaign. For example: a precinct with many older voters is a good candidate for an afternoon or early evening canvass since many of your targets will be home. But a precinct on a college campus or otherwise full of younger voters may be better contacted through alternative forms like social media or email.</p>
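<p>One way to produce buckets with labels like the ones in the table is base R's <code>cut()</code>. The break points below are inferred from the labels in the output above and the ages are made up, so treat this as an assumption about the binning and not the exact code used to build <code>age.group</code>.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Hypothetical voter ages
ages &lt;- c(17, 29, 30, 45, 69, 70, 101)

# First bucket starts at 17, then 10-year steps from 30 up to 120
breaks &lt;- c(17, seq(30, 120, by = 10))

# right = FALSE closes each interval on the left, e.g. [30,40) holds 30 to 39
age.group &lt;- cut(ages, breaks = breaks, right = FALSE)
table(age.group)</pre></div></div>

<p>With <code>right = FALSE</code> a 30 year old lands in [30,40) and not in [17,30), which matches the bracket notation in the cross-tab.</p>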
<p><strong>Party affiliation, by Gender</strong></p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">CrossTable<span style="color: #080;">&#40;</span>vf$gender,vf$party,prop.<span style="">c</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,prop.<span style="">chisq</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">F</span>,<span style="color: #0000FF; font-weight: bold;">format</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;SPSS&quot;</span>,max.<span style="">width</span><span style="color: #080;">=</span><span style="color: #ff0000;">10</span><span style="color: #080;">&#41;</span></pre></div></div>

<pre>
|-------------------------|
|                   Count |
|             Row Percent |
|           Total Percent |
|-------------------------|

             | vf$party
   vf$gender |      DEM  |      LIB  |      REP  |      UNA  | Row Total |
-------------|-----------|-----------|-----------|-----------|-----------|
           F |   146512  |      627  |    88295  |    81524  |   316958  |
             |   46.224% |    0.198% |   27.857% |   25.721% |   53.204% |
             |   24.593% |    0.105% |   14.821% |   13.684% |           |
-------------|-----------|-----------|-----------|-----------|-----------|
           M |    98497  |      932  |    89645  |    84622  |   273696  |
             |   35.988% |    0.341% |   32.753% |   30.918% |   45.942% |
             |   16.534% |    0.156% |   15.048% |   14.204% |           |
-------------|-----------|-----------|-----------|-----------|-----------|
           U |     1632  |       18  |      736  |     2701  |     5087  |
             |   32.082% |    0.354% |   14.468% |   53.096% |    0.854% |
             |    0.274% |    0.003% |    0.124% |    0.453% |           |
-------------|-----------|-----------|-----------|-----------|-----------|
Column Total |   246641  |     1577  |   178676  |   168847  |   595741  |
-------------|-----------|-----------|-----------|-----------|-----------|
</pre>
<p>The first two-way cross-tab is somewhat intimidating at first but is easy enough to read with a key. Gender is along the left side, and party affiliation is along the top. The three numbers in each box represent the raw count of voters, the percentage of voters in that row, and the percentage of voters overall. The first box tells us that there are 146,512 registered female Democrats, that Democrats make up 46% of female voters, and that female Democrats are 24.6% of the total electorate. </p>
<p>This crosstab is full of useful information that would be invaluable for campaign planning in Wake County: The Democratic registration advantage is 18 points for women, but only 3 points for men. Men are more likely to be unaffiliated than women; indeed, the unaffiliated voters almost make up a 3rd party by themselves. The Wake County Board of Elections says unaffiliated voters may still vote in a partisan primary, but only one party's primary per election. This quirk will make the unaffiliated voters huge targets of opportunity during primary elections. </p>
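<p>The row percentages and the registration gap by gender can be recomputed from the raw counts in the table. The sketch below retypes those counts into a matrix (the object names are only for illustration) and uses <code>prop.table()</code> with <code>margin = 1</code> to get percentages within each gender.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Counts copied from the gender by party cross-tab above
counts &lt;- matrix(
  c(146512, 627, 88295, 81524,
     98497, 932, 89645, 84622,
      1632,  18,   736,  2701),
  nrow = 3, byrow = TRUE,
  dimnames = list(gender = c("F", "M", "U"),
                  party  = c("DEM", "LIB", "REP", "UNA"))
)

# Row percent: share of each party within a gender
row.pct &lt;- 100 * prop.table(counts, margin = 1)
round(row.pct, 3)

# Democratic minus Republican row percent, by gender
round(row.pct[, "DEM"] - row.pct[, "REP"], 1)
#    F    M    U
# 18.4  3.2 17.6</pre></div></div>

<p>Using <code>margin = 2</code> instead would give column percentages, answering a different question: what share of each party's registrants are women or men.</p>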
<h2>Graphics</h2>
<p>As we've shown, cross-tabs are very powerful tools but can quickly cause information overload. This is where data visualizations (a fancy word for charts) come in. We'll use the powerful ggplot2 library to do some quick visualizations on the Wake County voter file. There are many other R packages for data visualization, but the ggplot2 library works very well for our purposes.<br />
<strong>Voter Registration, by Age Group</strong></p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">qplot<span style="color: #080;">&#40;</span>age.<span style="">group</span>,<span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>vf,type<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;histogram&quot;</span>,main<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Wake County Registered Voters, by Age Group &quot;</span><span style="color: #080;">&#41;</span></pre></div></div>

<p><div id="attachment_688" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/voters_age_group_nov2011.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/voters_age_group_nov2011-300x268.png" alt="" title="Wake County Registration By Age Group" width="300" height="268" class="size-medium wp-image-688" /></a><p class="wp-caption-text">Wake County Registration By Age Group<br />Click To Enlarge</p></div><br />
This histogram represents the same data as the Age Group cross-tab from above, but it&#8217;s much easier to compare age groups and understand the overall distribution of voters in Wake County. </p>
<p><strong>2010 Turnout, by Age Group</strong></p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"> qplot<span style="color: #080;">&#40;</span>age.<span style="">group</span>,<span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>vf<span style="color: #080;">&#91;</span>vf$regdate <span style="color: #080;">&lt;=</span> <span style="color: #ff0000;">&quot;2010-11-04&quot;</span>,<span style="color: #080;">&#93;</span>,type<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;histogram&quot;</span>,fill<span style="color: #080;">=</span>g2010.<span style="">v</span>,position<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;dodge&quot;</span>,main<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Wake County 2010 Turnout, by Age Group&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span> scale_fill_brewer<span style="color: #080;">&#40;</span>name<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Voted 2010&quot;</span>, pal<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Set1&quot;</span><span style="color: #080;">&#41;</span></pre></div></div>

<p><div id="attachment_690" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/2010_turnout_age_group_nov2011.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/2010_turnout_age_group_nov2011-300x213.png" alt="" title="Wake County 2010 General Turnout by Age Group" width="300" height="213" class="size-medium wp-image-690" /></a><p class="wp-caption-text">Wake County 2010 General Turnout by Age Group<br />Click To Enlarge</p></div><br />
In the previous plot we saw what looked like fairly even registration across age groups, with 40-49 having the highest numbers. But this chart shows that 2010 turnout diverged sharply between the age groups. Turnout for the 17-29 age group was dismal, and 30-39 was only slightly better. For voters older than forty and younger than eighty, turnout was always greater than 50%. A candidate who needs to win the youth vote will have their work cut out for them in Wake County, if 2010 is any indication of the norm. </p>
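<p>The chart shows counts of voters and non-voters, so a turnout rate per age group makes the 50% comparison explicit. Here is a minimal sketch on a made-up six-row voter file; the column names mirror the ones used in the plot code (<code>age.group</code>, <code>regdate</code>, <code>g2010.v</code>), but the rows and the assumption that <code>g2010.v</code> is a logical flag are illustrative only.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Toy voter file (hypothetical rows)
vf.toy &lt;- data.frame(
  age.group = c("[17,30)", "[17,30)", "[17,30)", "[17,30)", "[40,50)", "[40,50)"),
  regdate   = as.Date(c("2008-09-01", "2010-10-01", "2011-03-15",
                        "2009-06-20", "2004-05-20", "2009-01-10")),
  g2010.v   = c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Keep only voters registered in time for the 2010 general election,
# using the same cutoff date as the plot above
eligible &lt;- vf.toy[vf.toy$regdate &lt;= as.Date("2010-11-04"), ]

# The mean of a logical vector is the share of TRUE values, i.e. the turnout rate
turnout &lt;- 100 * tapply(eligible$g2010.v, eligible$age.group, mean)
round(turnout, 1)</pre></div></div>

<p>Filtering on registration date first matters: someone who registered in 2011 could not have voted in 2010 and would otherwise drag down the rate for their age group.</p>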
<p><strong>Turnout in 2010 vs 2008, by Gender</strong></p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">gender.<span style="">turnout</span> <span style="color: #080;">&lt;-</span> ddply<span style="color: #080;">&#40;</span>vf,<span style="color: #ff0000;">&quot;gender&quot;</span>,<span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>total<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span>,turnout<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>x$g2010.<span style="">v</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>x$g2008.<span style="">v</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>,election<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;2010&quot;</span>,<span style="color: #ff0000;">&quot;2008&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span>
qplot<span style="color: #080;">&#40;</span>gender,turnout <span style="color: #080;">/</span> total, <span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>gender.<span style="">turnout</span>, geom<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;histogram&quot;</span>, stat<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;identity&quot;</span>,fill<span style="color: #080;">=</span>election,position<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;dodge&quot;</span>, main<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Turnout in 2010 vs 2008 Wake County, by Gender&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span> scale_fill_brewer<span style="color: #080;">&#40;</span>name<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Election cycle&quot;</span>,pal<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Set1&quot;</span><span style="color: #080;">&#41;</span></pre></div></div>

<pre>
  gender  total turnout election
1      F 316958  146812     2010
2      F 316958  227910     2008
3      M 273696  127999     2010
4      M 273696  188888     2008
5      U   5087    1470     2010
6      U   5087    2157     2008
</pre>
<div id="attachment_691" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/2010_2008_turnout_by_gender.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/2010_2008_turnout_by_gender-300x213.png" alt="" title="Wake County Turnout 2008 and 2010 by Gender" width="300" height="213" class="size-medium wp-image-691" /></a><p class="wp-caption-text">Wake County Turnout 2008 and 2010 by Gender<br />Click To Enlarge</p></div>
<p>Previously we saw that women were much more likely to be Democrats than Republicans, but that told us nothing about their turnout propensity. This chart shows turnout percentages by gender for the 2008 and 2010 general elections. Turnout was much higher for both men and women in 2008 than in 2010. Women turned out at a higher rate than men in 2008 (about 72% versus 69%), but the two were close to par in 2010. While both genders turned out at roughly 46-47% in 2010, almost 19,000 more women than men voted because of the registration disparity; in 2008 the gap was almost 40,000. </p>
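<p>The rates and raw vote gaps behind that comparison can be recomputed directly from the printed table. The sketch below re-enters the four F and M rows by hand (the data frame name <strong>gt</strong> and the helper names are only for illustration) and derives the turnout rate per row and the female-minus-male vote gap per election:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Rows copied from the gender.turnout table above (F and M only)
gt &lt;- data.frame(
  gender   = c("F", "F", "M", "M"),
  total    = c(316958, 316958, 273696, 273696),
  turnout  = c(146812, 227910, 127999, 188888),
  election = c("2010", "2008", "2010", "2008")
)
# Turnout rate as a percentage of registered voters
gt$rate &lt;- round(100 * gt$turnout / gt$total, 1)
# Female-minus-male vote gap; both subsets are in the same election order
f &lt;- gt[gt$gender == "F", ]
m &lt;- gt[gt$gender == "M", ]
gap &lt;- setNames(f$turnout - m$turnout, as.character(f$election))
print(gt)
print(gap)</pre></div></div>

<p>Run as written, the gap comes out to 18,813 votes for 2010 and 39,022 votes for 2008.</p>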
<p><strong>Change in turnout by precinct from 2008 to 2010</strong><br />
Until now we've been comparing very simple counts that could easily have been done with cross-tabs. The next chart plots turnout percentage by precinct in 2008 against turnout percentage in 2010. With 200 precincts we would be hard pressed to visualize these data without some sort of chart.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">precinct.<span style="">turnout</span> <span style="color: #080;">&lt;-</span> ddply<span style="color: #080;">&#40;</span>vf, <span style="color: #ff0000;">&quot;precinct&quot;</span>, summarize, turnout2010<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>g2010.<span style="">v</span><span style="color: #080;">&#41;</span> <span style="color: #080;">/</span> <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>g2010.<span style="">v</span><span style="color: #080;">&#41;</span>, turnout2008<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>g2008.<span style="">v</span><span style="color: #080;">&#41;</span> <span style="color: #080;">/</span> <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>g2010.<span style="">v</span><span style="color: #080;">&#41;</span>,reg2010<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>g2008.<span style="">v</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#41;</span>
&nbsp;
qplot<span style="color: #080;">&#40;</span>turnout2010, turnout2008, <span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>precinct.<span style="">turnout</span>,xlim<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span>.1,<span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>,ylim<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span>.1,<span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>,main<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Turnout percentage 2008 to 2010, by Precinct&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span> geom_abline<span style="color: #080;">&#40;</span>intercept<span style="color: #080;">=</span><span style="color: #ff0000;">0</span>,slope<span style="color: #080;">=</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span></pre></div></div>

<div id="attachment_692" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/turnout_2010_2008_precinct.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/turnout_2010_2008_precinct-300x300.png" alt="" title="Wake County Turnout Percentage 2008 and 2010 by Precinct" width="300" height="300" class="size-medium wp-image-692" /></a><p class="wp-caption-text">Wake County Turnout Percentage 2008 and 2010 by Precinct<br />Click To Enlarge</p></div>
<p>This chart shows turnout percentage in 2008 along the vertical axis, turnout percentage for 2010 along the horizontal axis, and a line representing equal turnout in both. Points above the line had higher turnout in 2008 than in 2010, while points below the line had lower turnout in 2008. None of the points are below the line, meaning every precinct turned out at a lower rate in 2010 than in 2008. We knew this, but we didn't know if the decrease was uniform across precincts. Now we know the decrease wasn't uniform, and there is clearly a relationship between turnout in 2008 and turnout in 2010.<br />
Given this type of relationship we will fit a simple linear regression against the data and see if we can quantify change further:
</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># fit the turnout with a simple linear model</span>
<span style="color: #080;">&gt;</span> <span style="color: #0000FF; font-weight: bold;">lm</span><span style="color: #080;">&#40;</span>turnout2008~turnout2010,<span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>precinct.<span style="">turnout</span><span style="color: #080;">&#41;</span>
&nbsp;
Call<span style="color: #080;">:</span>
<span style="color: #0000FF; font-weight: bold;">lm</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">formula</span> <span style="color: #080;">=</span> turnout2008 ~ turnout2010, <span style="color: #0000FF; font-weight: bold;">data</span> <span style="color: #080;">=</span> precinct.<span style="">turnout</span><span style="color: #080;">&#41;</span>
&nbsp;
Coefficients<span style="color: #080;">:</span>
<span style="color: #080;">&#40;</span>Intercept<span style="color: #080;">&#41;</span>  turnout2010  
     <span style="color: #ff0000;">0.4124</span>       <span style="color: #ff0000;">0.6216</span>  
<span style="color: #228B22;"># summary gives us an R-squared of 0.84 </span>
&nbsp;
<span style="color: #228B22;"># plot the same graph w/ a line of best fit</span>
qplot<span style="color: #080;">&#40;</span>turnout2010, turnout2008, <span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>precinct.<span style="">turnout</span>,xlim<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span>.1,<span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>,ylim<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span>.1,<span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>,main<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Turnout percentage 2008 to 2010, by Precinct&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span> geom_abline<span style="color: #080;">&#40;</span>intercept<span style="color: #080;">=</span><span style="color: #ff0000;">0</span>,slope<span style="color: #080;">=</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span> geom_abline<span style="color: #080;">&#40;</span>intercept<span style="color: #080;">=</span><span style="color: #ff0000;">0.4124</span>,slope<span style="color: #080;">=</span>.62<span style="color: #080;">&#41;</span></pre></div></div>

<div id="attachment_693" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/turnout_2010_2008_precinct_bestfit.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/turnout_2010_2008_precinct_bestfit-300x300.png" alt="" title="Wake County Turnout Percentage 2008 and 2010 by Precinct (2)" width="300" height="300" class="size-medium wp-image-693" /></a><p class="wp-caption-text">Wake County Turnout Percentage 2008 and 2010 by Precinct (2)<br />Click To Enlarge</p></div>
<p>In the R code above we fit a simple linear regression to the precinct turnout data, which gave us an intercept and single regression coefficient. We used these values to plot a line of best fit on the same graph. From this chart we see that the change in turnout from 2008 to 2010 by precinct was almost uniform for most data. </p>
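<p>As a rough worked example of what the fitted line implies, the sketch below plugs a few 2010 turnout rates into the coefficients reported above (intercept 0.4124, slope 0.6216). The three input rates and the variable names are illustrative and are not taken from the voter file:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Coefficients as printed by lm() above
intercept &lt;- 0.4124
slope &lt;- 0.6216
# Hypothetical 2010 precinct turnout rates to evaluate
turnout2010 &lt;- c(0.30, 0.45, 0.60)
# Fitted 2008 turnout for each rate
fitted2008 &lt;- intercept + slope * turnout2010
# Implied drop from 2008 to 2010, in percentage points
drop.pts &lt;- round(100 * (fitted2008 - turnout2010), 1)
print(data.frame(turnout2010, fitted2008, drop.pts))</pre></div></div>

<p>Under these coefficients the fitted gap between the two elections narrows as 2010 turnout rises.</p>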
<p><strong>Democratic registration % and precinct size by 2010 turnout</strong><br />
This final graph is a complicated one: we'll look at Democratic voter registration and turnout in 2010 by precinct. A campaign would use this chart to find precincts with high Democratic registration but low 2010 turnout. A precinct with these characteristics will be near the top of a target list for a Democratic candidate to canvass. Finding precincts like these is a valuable part of the precinct analysis process that drives a campaign plan.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># use ddply to summarize registration and turnout by precinct</span>
dem.<span style="">reg</span>.<span style="">prec</span> <span style="color: #080;">&lt;-</span> ddply<span style="color: #080;">&#40;</span>vf, <span style="color: #ff0000;">&quot;precinct&quot;</span>, summarize, registered<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>status<span style="color: #080;">&#41;</span>,turnout2010<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>g2010.<span style="">v</span><span style="color: #080;">&#41;</span>,dem.<span style="">pct</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>party <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;DEM&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">/</span> <span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span>party<span style="color: #080;">&#41;</span> <span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># now plot it</span>
qplot<span style="color: #080;">&#40;</span>dem.<span style="">pct</span>,turnout2010<span style="color: #080;">/</span>registered,<span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>dem.<span style="">reg</span>.<span style="">prec</span>,alpha<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">I</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0.8</span><span style="color: #080;">&#41;</span>, main<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Democratic registration percentage and 2010 turnout<span style="color: #000099; font-weight: bold;">\n</span> by precinct&quot;</span>,xlab<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Democratic Registration%&quot;</span>, ylab<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Turnout 2010 (All Parties)&quot;</span>,xlim<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>, ylim<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">0</span>,<span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span></pre></div></div>

<div id="attachment_694" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/turnout_2010_democratic_registration.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/12/turnout_2010_democratic_registration-300x300.png" alt="" title="Wake County 2010 Turnout by Democratic Registration Percentage" width="300" height="300" class="size-medium wp-image-694" /></a><p class="wp-caption-text">Wake County 2010 Turnout by Democratic Registration Percentage<br />Click To Enlarge</p></div>
<p>
Points in the upper left side of the graph represent precincts with high turnout and low Democratic registration - those are precincts we'll want to ignore since they most likely voted for our opponents. Precincts in the lower right side mean low turnout and high Democratic registration - those are the precincts we'd want to target most heavily for turnout efforts in 2012. Precincts with turnout below 50% with a Democratic registration percentage of 50 or greater will probably be next on the list for canvassing efforts. </p>
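<p>One way to turn the thresholds described above into a target list is sketched below. The four precincts are made-up illustration rows shaped like <strong>dem.reg.prec</strong>, and the 50% cutoffs are the ones mentioned in the text rather than a tuned rule:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Hypothetical rows with the same columns as dem.reg.prec
prec &lt;- data.frame(
  precinct    = c("01-01", "01-02", "01-03", "01-04"),
  registered  = c(2000, 3500, 1800, 4200),
  turnout2010 = c(1300, 1400, 700, 2600),
  dem.pct     = c(0.30, 0.62, 0.55, 0.48)
)
prec$turnout.pct &lt;- prec$turnout2010 / prec$registered
# Keep precincts with turnout below 50% and Democratic registration of 50% or more
targets &lt;- prec[prec$turnout.pct &lt; 0.5 &amp; prec$dem.pct &gt;= 0.5, ]
# Most heavily Democratic precincts first
targets &lt;- targets[order(-targets$dem.pct), ]
print(targets)</pre></div></div>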
<h2>Wrapping up</h2>
<p>Thank you for reading this simple overview of performing a voter file analysis using R. The Wake County voter file is full of interesting information; we've barely scratched the surface, and I encourage the curious user to explore the file themselves. Thanks to the Wake County Board of Elections for keeping such a high quality file free and up to date. Please don't hesitate to leave any feedback in the comments below, via <a href="mailto:jjh@offensivepolitics.net">Email</a>, or <a href="http://twitter.com/offpol">Follow me on twitter.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2011/12/exploring-your-voter-file-with-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Candidate Debt in CA-36 runoff</title>
		<link>http://offensivepolitics.net/blog/2011/07/candidate-debt-in-ca-36-runoff/</link>
		<comments>http://offensivepolitics.net/blog/2011/07/candidate-debt-in-ca-36-runoff/#comments</comments>
		<pubDate>Tue, 12 Jul 2011 18:49:03 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Campaign Finance]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[US House]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=660</guid>
		<description><![CDATA[Looking at candidate debt in the CA-36 runoff election.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been digging into recently filed FEC reports and stumbled across something pretty interesting involving the Republican candidate Craig Huey in the runoff election for California&#8217;s 36th congressional district. I&#8217;ve found that Huey has donated or loaned his campaign $883,000 since 3/24/2011. As of his latest F3 filing (coverage through 6/22/2011) the candidate had contributed or loaned his campaign $795,000, and the total receipts at that point were only $840,514.93. That means the candidate has contributed or loaned 95% of the total funds raised by his campaign. The candidate contributed another $88,000 on 7/7, bringing his overall total to $883,000. </p>
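<p>The share quoted above can be checked with a few lines of arithmetic in R. The figures are the ones cited in this post; only the variable names are invented:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Candidate contributions and loans as of the latest F3 filing
candidate.funds &lt;- 795000
# Total receipts reported on the same filing
total.receipts &lt;- 840514.93
# Additional candidate contribution reported on 7/7
late.contribution &lt;- 88000
# Share of receipts that came from the candidate, in percent
share &lt;- round(100 * candidate.funds / total.receipts, 1)
# Overall candidate total after the 7/7 contribution
overall &lt;- candidate.funds + late.contribution
print(c(share = share, overall = overall))</pre></div></div>

<p>The share works out to 94.6%, which rounds to the 95% quoted above, and the overall total to $883,000.</p>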
<p>I was sure I had double counted, or that the candidate had misfiled something, but after looking at every report I believe the number is correct. The filings for this candidate are a mess: several loan amounts are reported twice, and every single Form 3 filing had to be amended, the pre-special filing four times. All the FEC forms for candidate Huey can be found <a href="http://query.nictusa.com/cgi-bin/dcdev/forms/C00494468/">here</a>, and forms with loans by or contributions from the candidate can be found below.</p>
<p>I looked at two different types of filings to find this information: I used the Schedule C line 10 documents from Form 3 (notice of receipts and disbursements) quarterly, pre-special, and pre-runoff filings. I also used the Form 6 (48-hour notice) filings that have been piling up since 6/24. </p>
<p><iframe src="https://spreadsheets.google.com/spreadsheet/pub?hl=en_US&#038;hl=en_US&#038;key=0AohK9ibM1Pe1dFpLRzZ4b0VWSjNHX0pNemkweW1FclE&#038;output=html" width="100%" height="240px" ></iframe><br />
<br />
Huey certainly isn&#8217;t the only candidate to finance his own campaign, but I think this might be a record for a runoff election in the US House. Anybody have any information on that?</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2011/07/candidate-debt-in-ca-36-runoff/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FECHell 0.2.0 Released</title>
		<link>http://offensivepolitics.net/blog/2011/07/fechell-0-2-0-released/</link>
		<comments>http://offensivepolitics.net/blog/2011/07/fechell-0-2-0-released/#comments</comments>
		<pubDate>Thu, 07 Jul 2011 16:25:17 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Campaign Finance]]></category>
		<category><![CDATA[Open-Source]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=653</guid>
		<description><![CDATA[My FEC report parsing gem has been upgraded to version 0.2.0, and now supports the latest FEC filings (7.0). ]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve finally gotten around to updating my <a href="http://offensivepolitics.net/fechell/">FEC report parsing gem FECHell</a> on the <a href="http://github.com/offensivepolitics/fechell">FECHell github page</a>. The FEC released a mostly cosmetic update to their <a href="http://fec.gov/elecfil/vendors.shtml">FEC Vendor Tools</a> a few months ago, but I&#8217;ve been slacking. The latest FECHell gem (0.2.0) supports everything from 3.00 to the very latest 7.0. I&#8217;m going to have to flesh out some of the unit tests after this next filing deadline since I haven&#8217;t found examples of some of the schedules (SC1) yet. This version of FECHell still uses FasterCSV, so it is tied to Ruby 1.8.X. I&#8217;m hoping to find a workaround soon to enable 1.9 support. Comments, questions, and pull requests are always appreciated.</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2011/07/fechell-0-2-0-released/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Donor analysis in R &#8211; Smith for Congress</title>
		<link>http://offensivepolitics.net/blog/2011/06/donor-analysis-in-r-smith-for-congress/</link>
		<comments>http://offensivepolitics.net/blog/2011/06/donor-analysis-in-r-smith-for-congress/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 20:56:18 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Campaign Finance]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Smith for Congress]]></category>
		<category><![CDATA[US House]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=602</guid>
		<description><![CDATA[In a previous post I introduced the Smith for Congress data set. The data is 49k contributions made by individuals to a congressional campaign for the 2006-2010 electoral cycles. Smith for Congress is not the name of the actual campaign. Individual contributions are not required to be disclosed by a campaign unless the individual donates [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://offensivepolitics.net/blog/2011/06/individual-contributions-in-us-house-elections-smith-for-congress/">previous post</a> I introduced the Smith for Congress data set.  The data is 49k contributions made by individuals to a congressional campaign for the 2006-2010 electoral cycles. Smith for Congress is not the name of the actual campaign. </p>
<p>Individual contributions are not required to be disclosed by a campaign unless the individual donates more than $200 during a single electoral cycle. The Smith for Congress campaign has, for their own reasons, published every individual contribution. This disclosure allows us an unprecedented look into how a modern campaign raises money. I&#8217;ve collected and scrubbed these contributions and published them for research use. In this post I will perform a detailed donor analysis with R to better understand how the Smith for Congress campaign financed its 2010 election. Full code and graphs for this post can be found on the <a href="https://github.com/offensivepolitics/simple-analysis">simple-analysis github repository</a>.</p>
<h2>Preparation</h2>
<p>We need to download the data and load it into R. The latest data can always be downloaded from: <a href="http://offensivepolitics.net/data/smithforcongress-latest.zip">Smith for Congress Latest</a></p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># latest smith for congress data as of this writing is March 23 2011.</span>
cd <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">read.<span style="">csv</span></span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;smithforcongress-03232011.csv&quot;</span><span style="color: #080;">&#41;</span>
<span style="color: #228B22;">#subset the data to just the 2010 cycle</span>
cd0 <span style="color: #080;">&lt;-</span> cd<span style="color: #080;">&#91;</span>cd$cycle <span style="color: #080;">==</span> <span style="color: #ff0000;">2010</span>,<span style="color: #080;">&#93;</span>
<span style="color: #228B22;"># clean up a date variable, and drop amounts &lt; $1. </span>
cd0$contribution_date <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">as.<span style="">Date</span></span><span style="color: #080;">&#40;</span>cd0$contribution_date,<span style="color: #0000FF; font-weight: bold;">format</span><span style="color: #080;">=</span><span style="color: #ff0000;">&quot;%m/%d/%Y&quot;</span><span style="color: #080;">&#41;</span>
cd0 <span style="color: #080;">&lt;-</span> cd0<span style="color: #080;">&#91;</span><span style="color: #080;">-</span><span style="color: #0000FF; font-weight: bold;">which</span><span style="color: #080;">&#40;</span>cd0$amount <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span></pre></div></div>

<p>Data for the 2010 electoral cycle consists of 11,721 contributions made by 6,949 individuals, totaling over $770,000. Here is a sample: </p>
<table border=0 class=dataframe>
<tbody>
<tr class= firstline >
<th>personid  </th>
<th>amount  </th>
<th>ctd_aggregate  </th>
<th>contribution_date  </th>
<th>cycle</th>
</tr>
<tr>
<td class=firstcolumn>9zvlnzw1qj9bvq7k1x47v486a</td>
<td class=cellinside>10</td>
<td class=cellinside>20</td>
<td class=cellinside>2009-04-01</td>
<td class=cellinside>2010</td>
</tr>
<tr>
<td class=firstcolumn>iy8xcopedihv9vwqpg3iwmal
</td>
<td class=cellinside>15
</td>
<td class=cellinside>35
</td>
<td class=cellinside>2009-04-01
</td>
<td class=cellinside>2010
</td>
</tr>
<tr>
<td class=firstcolumn>1f0lct995ckygk6y4vaxk2q44
</td>
<td class=cellinside>20
</td>
<td class=cellinside>20
</td>
<td class=cellinside>2009-04-01
</td>
<td class=cellinside>2010
</td>
</tr>
<tr>
<td class=firstcolumn>bf2d43vdjdg07pgfmph6ghy7o
</td>
<td class=cellinside>20
</td>
<td class=cellinside>20
</td>
<td class=cellinside>2009-04-01
</td>
<td class=cellinside>2010
</td>
</tr>
<tr>
<td class=firstcolumn>7sj05z74r8y10fcctvx4a38pn
</td>
<td class=cellinside>20
</td>
<td class=cellinside>20
</td>
<td class=cellinside>2009-04-01
</td>
<td class=cellinside>2010
</td>
</tr>
</tbody>
</table>
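<p>The headline counts can be reproduced from the contribution rows themselves. The sketch below uses a tiny made-up data frame (called <strong>toy</strong> here, with shortened IDs) in place of <strong>cd0</strong>; with the real file the same three calls would be run on <strong>cd0</strong>:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"># Hypothetical stand-in for cd0: one row per contribution
toy &lt;- data.frame(
  personid = c("a1", "b2", "a1", "c3", "b2", "a1"),
  amount   = c(10, 25, 15, 100, 25, 20)
)
# Number of contributions is the number of rows
n.contributions &lt;- nrow(toy)
# Number of distinct donors
n.donors &lt;- length(unique(toy$personid))
# Total dollars raised
total.raised &lt;- sum(toy$amount)
print(c(n.contributions, n.donors, total.raised))</pre></div></div>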
<h2>Data Summary</h2>
<p>Since the number of individual donors (6,949) is so much lower than the number of contributions (11,717), we can guess that a good portion of those donors gave multiple times. The long-form contribution data is somewhat difficult to work with when looking at multiple contributions from the same person, so we&#8217;ll generate a summary data frame to help with our analysis. The following variables will be captured per individual donor: </p>
<ul>
<li>Date of first contribution</li>
<li>The total value of all contributions by this individual</li>
<li>The total number of contributions by this individual</li>
<li>The amount of each of the first three contributions. Blank or NA if they have made fewer than three contributions.</li>
<li>The number of days between successive contributions, for the first three gaps. Blank or NA where there is no later contribution.</li>
</ul>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">summarize.<span style="">contributions</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
  xo <span style="color: #080;">&lt;-</span> x<span style="color: #080;">&#91;</span><span style="color: #0000FF; font-weight: bold;">order</span><span style="color: #080;">&#40;</span>x$contribution_date<span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
  dtx <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">as.<span style="">integer</span></span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">diff</span><span style="color: #080;">&#40;</span>xo$contribution_date<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
&nbsp;
  <span style="color: #0000FF; font-weight: bold;">return</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>
		first.<span style="">contribution</span><span style="color: #080;">=</span>xo$contribution_date<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>, 
		num.<span style="">contributions</span> <span style="color: #080;">=</span> <span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>xo<span style="color: #080;">&#41;</span>,
		dt1<span style="color: #080;">=</span>dtx<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>,
		dt2<span style="color: #080;">=</span>dtx<span style="color: #080;">&#91;</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span>,
		dt3<span style="color: #080;">=</span>dtx<span style="color: #080;">&#91;</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span>,
		am1<span style="color: #080;">=</span>xo$amount<span style="color: #080;">&#91;</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#93;</span>,
		am2<span style="color: #080;">=</span>xo$amount<span style="color: #080;">&#91;</span><span style="color: #ff0000;">2</span><span style="color: #080;">&#93;</span>,
		am3<span style="color: #080;">=</span>xo$amount<span style="color: #080;">&#91;</span><span style="color: #ff0000;">3</span><span style="color: #080;">&#93;</span>,
		total.<span style="">value</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>x$amount<span style="color: #080;">&#41;</span>
	<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span>
cd0s <span style="color: #080;">&lt;-</span> ddply<span style="color: #080;">&#40;</span>cd0, <span style="color: #ff0000;">&quot;personid&quot;</span>, summarize.<span style="">contributions</span><span style="color: #080;">&#41;</span></pre></div></div>

<p>Now the <strong>cd0s</strong> data frame holds our summary table, which looks like this: </p>
<table border=0 class=dataframe>
<tbody>
<tr class= firstline >
<th>personid  </th>
<th>first.contribution  </th>
<th>num.contributions  </th>
<th>dt1  </th>
<th>dt2  </th>
<th>dt3  </th>
<th>am1  </th>
<th>am2  </th>
<th>am3  </th>
<th>total.value</th>
</tr>
<tr>
<td class=cellinside>1023ryaqqbvz76kh3yq0r2ngq
</td>
<td class=cellinside>2010-10-18
</td>
<td class=cellinside>1
</td>
<td class=cellinside> NA
</td>
<td class=cellinside> NA
</td>
<td class=cellinside>NA
</td>
<td class=cellinside>  25
</td>
<td class=cellinside>NA
</td>
<td class=cellinside> NA
</td>
<td class=cellinside>  25
</td>
</tr>
<tr>
<td class=cellinside>1036lg58hd4skceuyqrr2peb4
</td>
<td class=cellinside>2010-03-25
</td>
<td class=cellinside>2
</td>
<td class=cellinside>166
</td>
<td class=cellinside> NA
</td>
<td class=cellinside> NA
</td>
<td class=cellinside>  35
</td>
<td class=cellinside>25
</td>
<td class=cellinside> NA
</td>
<td class=cellinside>  60
</td>
</tr>
<tr>
<td class=cellinside>106f366ysq6xe9ci731wejh0k
</td>
<td class=cellinside>2009-12-11
</td>
<td class=cellinside>4
</td>
<td class=cellinside> 91
</td>
<td class=cellinside>185
</td>
<td class=cellinside>63
</td>
<td class=cellinside>  50
</td>
<td class=cellinside>50
</td>
<td class=cellinside>50
</td>
<td class=cellinside> 250
</td>
</tr>
<tr>
<td class=cellinside>1081wyujzkgninrt1srf79tbo
</td>
<td class=cellinside>2009-08-27
</td>
<td class=cellinside>3
</td>
<td class=cellinside> 58
</td>
<td class=cellinside>114
</td>
<td class=cellinside> NA
</td>
<td class=cellinside>  25
</td>
<td class=cellinside>30
</td>
<td class=cellinside>10
</td>
<td class=cellinside>  65
</td>
</tr>
<tr>
<td class=cellinside>1094yhx62fcdx3c012mlpxnex
</td>
<td class=cellinside>2009-10-15
</td>
<td class=cellinside>1
</td>
<td class=cellinside> NA
</td>
<td class=cellinside> NA
</td>
<td class=cellinside> NA
</td>
<td class=cellinside>1000
</td>
<td class=cellinside> NA
</td>
<td class=cellinside> NA
</td>
<td class=cellinside>1000
</td>
</tr>
</tbody>
</table>
<h2>Giving Levels</h2>
<p>With detailed giving levels we can infer a lot about a campaign and about how its fundraisers do their jobs. If most of the giving is in the $15-20 range, we can assume the campaign focuses on small donors and perhaps online contributions. If most of the giving is in the $100-250 range, the campaign may be throwing lots of medium-sized dinners. If most of the donations are close to the legal maximum of $4800, the campaign is focused on major donors and might be ignoring smaller donors altogether. </p>
<p>Plotting a histogram of total donation amount per individual will give us better insight into the giving levels.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">qplot<span style="color: #080;">&#40;</span>total.<span style="">value</span>,<span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>cd0s,geom<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;histogram&quot;</span>,binwidth<span style="color: #080;">=</span><span style="color: #ff0000;">50</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>cd0<span style="color: #080;">&#91;</span>cd0$amount<span style="color: #080;">&lt;</span><span style="color: #ff0000;">250</span>,<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">/</span> <span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>cd0<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">summary</span><span style="color: #080;">&#40;</span>cd0s$total.<span style="">value</span><span style="color: #080;">&#41;</span></pre></div></div>

<div id="attachment_617" class="wp-caption aligncenter" style="width: 490px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/06/f1.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/06/f1.png" alt="Giving Levels, Smith for Congress 2010" title="Giving Levels, Smith for Congress 2010" width="480" height="480" class="size-full wp-image-617" /></a><p class="wp-caption-text">Giving Levels, Smith for Congress 2010</p></div>
<table border=0 class=dataframe>
<tbody>
<tr class= firstline >
<th>Min.</th>
<th>1st Qu.</th>
<th>Median</th>
<th>Mean</th>
<th>3rd Qu.</th>
<th>Max.</th>
</tr>
<tr>
<td class=cellinside>   1</td>
<td class=cellinside>  25</td>
<td class=cellinside>  50</td>
<td class=cellinside> 111</td>
<td class=cellinside> 100</td>
<td class=cellinside>4800</td>
</tr>
</tbody>
</table>
<p>In 2010, 75% of contributors gave $100 or less in total to the campaign (the third quartile is $100). The summary table shows the median total donated was $50, while the mean was $111. The maximum was $4800, which is also the maximum allowed by law for 2010. We can infer that while there was certainly some major-donor solicitation, the fundraisers were focused on much smaller donors. </p>
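<p>A minimal sketch of how the giving tiers discussed above could be counted directly. The vector below is made-up data standing in for <strong>cd0s$total.value</strong>, and the tier boundaries are illustrative assumptions rather than anything defined by the campaign or the FEC.</p>

```r
# Hypothetical per-donor totals standing in for cd0s$total.value
total.value <- c(1, 25, 25, 50, 50, 100, 250, 1000, 4800)

# Bucket totals into giving tiers; the breaks are illustrative only
tiers <- cut(total.value,
             breaks = c(0, 20, 100, 250, 2400, 4800),
             labels = c("$20 or less", "$21-100", "$101-250",
                        "$251-2,400", "$2,401-4,800"))

# Number of donors and percentage of donors in each tier
tier.counts <- table(tiers)
tier.share <- round(100 * prop.table(tier.counts), 1)

# Percentage of donors whose cycle total is $100 or less
share.100 <- 100 * mean(total.value <= 100)
```

<p>On the real data the same calls would take <strong>cd0s$total.value</strong> in place of the made-up vector.</p>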
<h2>Repeat donors</h2>
<p>Now that we know more about giving levels, it would be helpful to better understand giving frequency. The amount of repeat giving may give us insight into how involved the fundraisers are getting, and maybe even how often they are asking for money.<br />
We&#8217;ll use a histogram and a cross-tab of the total number of contributions by individuals to help us with this analysis:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">qplot<span style="color: #080;">&#40;</span>num.<span style="">contributions</span>,<span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>cd0s,geom<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;histogram&quot;</span>,binwidth<span style="color: #080;">=</span><span style="color: #ff0000;">1</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">table</span><span style="color: #080;">&#40;</span>cd0s$num.<span style="">contributions</span><span style="color: #080;">&#41;</span></pre></div></div>

<div id="attachment_620" class="wp-caption aligncenter" style="width: 490px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/06/f2.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/06/f2.png" alt="Giving Frequency, Smith for Congress 2010" title="Giving Frequency, Smith for Congress 2010" width="480" height="480" class="size-full wp-image-620" /></a><p class="wp-caption-text">Giving Frequency, Smith for Congress 2010</p></div>
<table border=0 class=dataframe>
<tbody>
<tr class= firstline >
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
<th>13</th>
<th>14</th>
<th>18</th>
<th>20</th>
</tr>
<tr>
<td class=cellinside>4242</td>
<td class=cellinside>1599</td>
<td class=cellinside> 621</td>
<td class=cellinside> 256</td>
<td class=cellinside> 120</td>
<td class=cellinside>  60</td>
<td class=cellinside>  28</td>
<td class=cellinside>   7</td>
<td class=cellinside>   7</td>
<td class=cellinside>   5</td>
<td class=cellinside>   1</td>
<td class=cellinside>   1</td>
<td class=cellinside>   1</td>
<td class=cellinside>   1</td>
</tr>
</tbody>
</table>
<p>Our plot and table show that about 61% (4,242) of the contributors to Smith for Congress gave only once, leaving 2,707 people who gave more than once. Most of the people who gave more than once gave twice, but there were still several hundred people who gave 3 or 4 times each. </p>
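<p>The one-time versus repeat split can be re-derived from the frequency table. This is only a sketch: the counts are copied by hand from the table above, and the variable names are my own.</p>

```r
# Donor counts by number of contributions, copied from the table above
freq <- c("1" = 4242, "2" = 1599, "3" = 621, "4" = 256, "5" = 120,
          "6" = 60, "7" = 28, "8" = 7, "9" = 7, "10" = 5,
          "13" = 1, "14" = 1, "18" = 1, "20" = 1)

total.donors <- sum(freq)
multi.donors <- sum(freq[names(freq) != "1"])   # gave more than once

# Percentage of donors who gave once, and who gave more than once
one.time.share <- 100 * freq[["1"]] / total.donors
multi.share <- 100 * multi.donors / total.donors
round(c(one.time = one.time.share, multi = multi.share), 1)
```

<p>With the full data frame loaded, <strong>100 * mean(cd0s$num.contributions > 1)</strong> should give the repeat share without retyping the table.</p>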
<p>To understand how important repeat giving might be we need more detailed information. We need to look at the total amount donated by each group of contributors; we&#8217;ll also include the cumulative total, cumulative percentage, and individual percentage of total for each group.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">gft <span style="color: #080;">&lt;-</span> ddply<span style="color: #080;">&#40;</span>cd0s,<span style="color: #ff0000;">&quot;num.contributions&quot;</span>,<span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>total<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>x$total.<span style="">value</span><span style="color: #080;">&#41;</span>,n<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span>
gft$percent <span style="color: #080;">&lt;-</span> gft$total <span style="color: #080;">/</span> <span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>gft$total<span style="color: #080;">&#41;</span> <span style="color: #080;">*</span> <span style="color: #ff0000;">100</span>
gft$running.<span style="">total</span> <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">cumsum</span><span style="color: #080;">&#40;</span>gft$total<span style="color: #080;">&#41;</span> 
gft$running.<span style="">percent</span> <span style="color: #080;">&lt;-</span> gft$running.<span style="">total</span> <span style="color: #080;">/</span> <span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>gft$total<span style="color: #080;">&#41;</span> <span style="color: #080;">*</span> <span style="color: #ff0000;">100</span></pre></div></div>

<p>Our <strong>gft</strong> data frame looks like this: </p>
<table border=0 class=dataframe>
<tbody>
<tr class= firstline >
<th>num.contributions  </th>
<th>total  </th>
<th>n  </th>
<th>percent  </th>
<th>running.total  </th>
<th>running.percent</th>
</tr>
<tr>
<td class=cellinside> 1
</td>
<td class=cellinside>284043
</td>
<td class=cellinside>4242
</td>
<td class=cellinside>36.821
</td>
<td class=cellinside>284043
</td>
<td class=cellinside> 37
</td>
</tr>
<tr>
<td class=cellinside> 2
</td>
<td class=cellinside>212697
</td>
<td class=cellinside>1599
</td>
<td class=cellinside>27.572
</td>
<td class=cellinside>496740
</td>
<td class=cellinside> 64
</td>
</tr>
<tr>
<td class=cellinside> 3
</td>
<td class=cellinside>118998
</td>
<td class=cellinside> 621
</td>
<td class=cellinside>15.426
</td>
<td class=cellinside>615738
</td>
<td class=cellinside> 80
</td>
</tr>
<tr>
<td class=cellinside> 4
</td>
<td class=cellinside> 72197
</td>
<td class=cellinside> 256
</td>
<td class=cellinside> 9.359
</td>
<td class=cellinside>687935
</td>
<td class=cellinside> 89
</td>
</tr>
<tr>
<td class=cellinside> 5
</td>
<td class=cellinside> 43513
</td>
<td class=cellinside> 120
</td>
<td class=cellinside> 5.641
</td>
<td class=cellinside>731448
</td>
<td class=cellinside> 95
</td>
</tr>
<tr>
<td class=cellinside> 6
</td>
<td class=cellinside> 24428
</td>
<td class=cellinside>  60
</td>
<td class=cellinside> 3.167
</td>
<td class=cellinside>755876
</td>
<td class=cellinside> 98
</td>
</tr>
<tr>
<td class=cellinside> 7
</td>
<td class=cellinside>  4825
</td>
<td class=cellinside>  28
</td>
<td class=cellinside> 0.625
</td>
<td class=cellinside>760701
</td>
<td class=cellinside> 99
</td>
</tr>
<tr>
<td class=cellinside> 8
</td>
<td class=cellinside>  3988
</td>
<td class=cellinside>   7
</td>
<td class=cellinside> 0.517
</td>
<td class=cellinside>764689
</td>
<td class=cellinside> 99
</td>
</tr>
<tr>
<td class=cellinside> 9
</td>
<td class=cellinside>  4340
</td>
<td class=cellinside>   7
</td>
<td class=cellinside> 0.563
</td>
<td class=cellinside>769029
</td>
<td class=cellinside>100
</td>
</tr>
<tr>
<td class=cellinside>10
</td>
<td class=cellinside>   990
</td>
<td class=cellinside>   5
</td>
<td class=cellinside> 0.128
</td>
<td class=cellinside>770019
</td>
<td class=cellinside>100
</td>
</tr>
<tr>
<td class=cellinside>13
</td>
<td class=cellinside>   167
</td>
<td class=cellinside>   1
</td>
<td class=cellinside> 0.022
</td>
<td class=cellinside>770186
</td>
<td class=cellinside>100
</td>
</tr>
<tr>
<td class=cellinside>14
</td>
<td class=cellinside>   675
</td>
<td class=cellinside>   1
</td>
<td class=cellinside> 0.088
</td>
<td class=cellinside>770861
</td>
<td class=cellinside>100
</td>
</tr>
<tr>
<td class=cellinside>18
</td>
<td class=cellinside>   360
</td>
<td class=cellinside>   1
</td>
<td class=cellinside> 0.047
</td>
<td class=cellinside>771221
</td>
<td class=cellinside>100
</td>
</tr>
<tr>
<td class=cellinside>20
</td>
<td class=cellinside>   200
</td>
<td class=cellinside>   1
</td>
<td class=cellinside> 0.026
</td>
<td class=cellinside>771421
</td>
<td class=cellinside>100
</td>
</tr>
</tbody>
</table>
<p>We see the campaign raised $284,043 (36.8% of the total raised) from the 4,242 contributors who gave only once, and $212,697 (27.6% of the total raised) from the 1,599 contributors who gave twice. We also see the campaign raised $487,378 from 2,707 repeat donors; that is about 63% of the total raised from individuals for the entire cycle. The Smith for Congress campaign is clearly good at attracting small-dollar donors, more than a third of whom gave more than once. This is a pretty impressive repeat donor rate. </p>
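<p>The repeat-donor figures can be checked with a few lines. This sketch rebuilds a small copy of <strong>gft</strong> from the table above instead of loading the data set; the helper names (<strong>repeaters</strong>, <strong>rep.total</strong> and so on) are my own.</p>

```r
# Totals and donor counts by giving frequency, copied from the gft table
gft <- data.frame(
  num.contributions = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 18, 20),
  total = c(284043, 212697, 118998, 72197, 43513, 24428, 4825,
            3988, 4340, 990, 167, 675, 360, 200),
  n = c(4242, 1599, 621, 256, 120, 60, 28, 7, 7, 5, 1, 1, 1, 1)
)

# Keep only the groups that gave more than once
repeaters <- gft[gft$num.contributions > 1, ]

rep.total <- sum(repeaters$total)             # dollars from repeat donors
rep.n <- sum(repeaters$n)                     # number of repeat donors
rep.pct <- 100 * rep.total / sum(gft$total)   # share of all dollars raised
```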
<p>Finally, I&#8217;d like to look at what kind of donations make up each level of giving. We know repeat donors gave $487,000, but we don&#8217;t know if that was mostly in $50 donations or in $250 donations. We can use a <a href="http://en.wikipedia.org/wiki/Box_plot">box and whisker plot</a> to break down each giving level. I&#8217;m leaving contribution levels of 8 and above off the plot since giving was so sparse at those levels. We&#8217;ll draw the box plot with a log transform on the y axis, since a few very large values would otherwise skew the graph and render it mostly useless. I used a trick from <a href="http://stackoverflow.com/questions/4699493/transform-only-one-axis-to-log10-scale-with-ggplot2">this Stack Overflow thread</a> to get the formatting correct on the y axis:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">formatBack <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span> <span style="color: #0000FF; font-weight: bold;">paste</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">round</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">10</span><span style="color: #080;">^</span>x, <span style="color: #ff0000;">2</span><span style="color: #080;">&#41;</span>, <span style="color: #ff0000;">&quot;$&quot;</span>, sep<span style="color: #080;">=</span><span style="color: #ff0000;">' '</span><span style="color: #080;">&#41;</span> 
qplot<span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">factor</span><span style="color: #080;">&#40;</span>num.<span style="">contributions</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">log10</span><span style="color: #080;">&#40;</span>total.<span style="">value</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">data</span><span style="color: #080;">=</span>cd0s<span style="color: #080;">&#91;</span>cd0s$num.<span style="">contributions</span> <span style="color: #080;">&lt;</span> <span style="color: #ff0000;">8</span>,<span style="color: #080;">&#93;</span>,geom<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;boxplot&quot;</span>,ylab<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Total Value (log)&quot;</span>,xlab<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Giving Frequency&quot;</span>,main<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;Giving Levels by Giving Frequency, Smith for Congress 2010&quot;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">+</span> scale_y_continuous<span style="color: #080;">&#40;</span>formatter<span style="color: #080;">=</span>formatBack<span style="color: #080;">&#41;</span>
<span style="color: #228B22;"># same data, but in table format </span>
ddply<span style="color: #080;">&#40;</span>cd0s,<span style="color: #ff0000;">&quot;num.contributions&quot;</span>,<span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span> <span style="color: #0000FF; font-weight: bold;">data.<span style="">frame</span></span><span style="color: #080;">&#40;</span>total<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sum</span><span style="color: #080;">&#40;</span>x$total.<span style="">value</span><span style="color: #080;">&#41;</span>,n<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>x<span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">min</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">min</span><span style="color: #080;">&#40;</span>x$total.<span style="">value</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">mean</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">mean</span><span style="color: #080;">&#40;</span>x$total.<span style="">value</span><span style="color: #080;">&#41;</span>, <span style="color: #0000FF; font-weight: bold;">median</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">median</span><span style="color: #080;">&#40;</span>x$total.<span style="">value</span><span style="color: #080;">&#41;</span>,std<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">sd</span><span style="color: #080;">&#40;</span>x$total.<span style="">value</span><span style="color: #080;">&#41;</span>,<span style="color: #0000FF; font-weight: bold;">max</span><span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">max</span><span 
style="color: #080;">&#40;</span>x$total.<span style="">value</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#125;</span><span style="color: #080;">&#41;</span></pre></div></div>

<div id="attachment_625" class="wp-caption aligncenter" style="width: 490px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/06/f3.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/06/f3.png" alt="Giving Levels by Giving Frequency, Smith for Congress 2010" title="Giving Levels by Giving Frequency, Smith for Congress 2010" width="480" height="480" class="size-full wp-image-625" /></a><p class="wp-caption-text">Giving Levels by Giving Frequency, Smith for Congress 2010</p></div>
<table border=0 class=dataframe>
<tbody>
<tr class= firstline >
<th>num.contributions  </th>
<th>total  </th>
<th>n  </th>
<th>min  </th>
<th>mean  </th>
<th>median  </th>
<th>std  </th>
<th>max</th>
</tr>
<tr>
<td class=cellinside> 1
</td>
<td class=cellinside>284043
</td>
<td class=cellinside>4242
</td>
<td class=cellinside>  1
</td>
<td class=cellinside> 67
</td>
<td class=cellinside> 35
</td>
<td class=cellinside> 149
</td>
<td class=cellinside>2400
</td>
</tr>
<tr>
<td class=cellinside> 2
</td>
<td class=cellinside>212697
</td>
<td class=cellinside>1599
</td>
<td class=cellinside>  2
</td>
<td class=cellinside>133
</td>
<td class=cellinside> 70
</td>
<td class=cellinside> 280
</td>
<td class=cellinside>4800
</td>
</tr>
<tr>
<td class=cellinside> 3
</td>
<td class=cellinside>118998
</td>
<td class=cellinside> 621
</td>
<td class=cellinside>  4
</td>
<td class=cellinside>192
</td>
<td class=cellinside>105
</td>
<td class=cellinside> 299
</td>
<td class=cellinside>3800
</td>
</tr>
<tr>
<td class=cellinside> 4
</td>
<td class=cellinside> 72197
</td>
<td class=cellinside> 256
</td>
<td class=cellinside> 20
</td>
<td class=cellinside>282
</td>
<td class=cellinside>144
</td>
<td class=cellinside> 443
</td>
<td class=cellinside>3800
</td>
</tr>
<tr>
<td class=cellinside> 5
</td>
<td class=cellinside> 43513
</td>
<td class=cellinside> 120
</td>
<td class=cellinside>  5
</td>
<td class=cellinside>363
</td>
<td class=cellinside>175
</td>
<td class=cellinside> 616
</td>
<td class=cellinside>4129
</td>
</tr>
<tr>
<td class=cellinside> 6
</td>
<td class=cellinside> 24428
</td>
<td class=cellinside>  60
</td>
<td class=cellinside> 30
</td>
<td class=cellinside>407
</td>
<td class=cellinside>168
</td>
<td class=cellinside> 749
</td>
<td class=cellinside>4700
</td>
</tr>
<tr>
<td class=cellinside> 7
</td>
<td class=cellinside>  4825
</td>
<td class=cellinside>  28
</td>
<td class=cellinside> 33
</td>
<td class=cellinside>172
</td>
<td class=cellinside>175
</td>
<td class=cellinside> 103
</td>
<td class=cellinside> 475
</td>
</tr>
<tr>
<td class=cellinside> 8
</td>
<td class=cellinside>  3988
</td>
<td class=cellinside>   7
</td>
<td class=cellinside> 80
</td>
<td class=cellinside>570
</td>
<td class=cellinside>160
</td>
<td class=cellinside>1094
</td>
<td class=cellinside>3048
</td>
</tr>
<tr>
<td class=cellinside> 9
</td>
<td class=cellinside>  4340
</td>
<td class=cellinside>   7
</td>
<td class=cellinside> 90
</td>
<td class=cellinside>620
</td>
<td class=cellinside>225
</td>
<td class=cellinside> 627
</td>
<td class=cellinside>1450
</td>
</tr>
<tr>
<td class=cellinside>10
</td>
<td class=cellinside>   990
</td>
<td class=cellinside>   5
</td>
<td class=cellinside>100
</td>
<td class=cellinside>198
</td>
<td class=cellinside>200
</td>
<td class=cellinside>  72
</td>
<td class=cellinside> 280
</td>
</tr>
<tr>
<td class=cellinside>13
</td>
<td class=cellinside>   167
</td>
<td class=cellinside>   1
</td>
<td class=cellinside>167
</td>
<td class=cellinside>167
</td>
<td class=cellinside>167
</td>
<td class=cellinside>  NA
</td>
<td class=cellinside> 167
</td>
</tr>
<tr>
<td class=cellinside>14
</td>
<td class=cellinside>   675
</td>
<td class=cellinside>   1
</td>
<td class=cellinside>675
</td>
<td class=cellinside>675
</td>
<td class=cellinside>675
</td>
<td class=cellinside>  NA
</td>
<td class=cellinside> 675
</td>
</tr>
<tr>
<td class=cellinside>18
</td>
<td class=cellinside>   360
</td>
<td class=cellinside>   1
</td>
<td class=cellinside>360
</td>
<td class=cellinside>360
</td>
<td class=cellinside>360
</td>
<td class=cellinside>  NA
</td>
<td class=cellinside> 360
</td>
</tr>
<tr>
<td class=cellinside>20
</td>
<td class=cellinside>   200
</td>
<td class=cellinside>   1
</td>
<td class=cellinside>200
</td>
<td class=cellinside>200
</td>
<td class=cellinside>200
</td>
<td class=cellinside>  NA
</td>
<td class=cellinside> 200
</td>
</tr>
</tbody>
</table>
<p>This latest plot and table are both incredibly text heavy, but this is the critical intelligence required to start a fundraising plan. </p>
<p>We see the average total contribution increases with giving frequency, which makes sense. Through six contributions the average increases in an approximately linear fashion, which suggests the individual contribution amounts are staying roughly constant. This may be a function of some campaign fundraising tactic, like &#8220;donate $35 now for a free T-shirt.&#8221; We can also get a sense of how much success the Smith for Congress major donor program enjoys. An individual can legally donate $2,400 each for a primary and a general election per cycle. We can count how many individuals have maxed out at $4800 and measure how much impact the major donors have on the total amounts raised:</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #228B22;"># how many individuals gave the max for one election</span>
<span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>cd0s<span style="color: #080;">&#91;</span>cd0s$total.<span style="">value</span> <span style="color: #080;">==</span> <span style="color: #ff0000;">2400</span>,<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">nrow</span><span style="color: #080;">&#40;</span>cd0s<span style="color: #080;">&#91;</span>cd0s$total.<span style="">value</span> <span style="color: #080;">==</span> <span style="color: #ff0000;">4800</span>,<span style="color: #080;">&#93;</span><span style="color: #080;">&#41;</span></pre></div></div>

<p>We see 7 individuals who gave the maximum for one election, and only 2 individuals who maxed out for the entire cycle. The two maxed-out donors account for only 1.2% of total giving, which is very low compared with the average campaign. This tells us major donors aren&#8217;t the most important segment to Smith for Congress, but it could also mean that the campaign isn&#8217;t able or isn&#8217;t willing to ask the max amount from large donors. </p>
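<p>Two of the figures above can be re-derived from the per-frequency table: the roughly constant size of each gift, and the share of money coming from the two maxed-out donors. This is a sketch with values copied by hand from the tables in this post; the variable names are my own.</p>

```r
# Mean total per donor for giving frequencies 1 through 6, from the table above
freq <- 1:6
mean.total <- c(67, 133, 192, 282, 363, 407)

# If each gift is roughly the same size, mean total / number of gifts stays flat
per.gift <- round(mean.total / freq, 1)

# Two donors at the $4,800 cycle maximum, against $771,421 raised in total
maxed.share <- 100 * (2 * 4800) / 771421
```

<p>A flat <strong>per.gift</strong> vector is what the approximately linear growth in the mean implies: donors who give more often give about the same amount each time.</p>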
<h2>Take Away</h2>
<p>We can take away the following facts from our analysis: </p>
<ul>
<li>39% of individual donors gave more than once to Smith for Congress </li>
<li>80% of donors gave $100 or less to the campaign</li>
<li>Repeat donors gave $487,000 total to the campaign</li>
<li>Two out of 6,949 donors (0.029 percent) gave the maximum amount allowable by law, accounting for 1.2% of the total amount raised</li>
</ul>
<p>From all this we can infer that Smith for Congress is running a very strong repeat donor program, and isn&#8217;t focused on only high-dollar donors.  This information could be very useful in a number of different ways. A treasurer for Smith for Congress could use this information to design a 2012 fundraising plan and campaign budget. A candidate similar to Smith, or running in a similar district, could use this same information to plan their own campaign. Or a rival campaign could use this during opposition research and financial planning. Or researchers could use this to build better generic models of US House individual fundraising. I hope this shows that detailed campaign finance analysis is pretty simple when you&#8217;ve got access to the relevant data, which unfortunately is very uncommon.</p>
<p>Thanks for reading. Questions or comments are always appreciated: jjh@offensivepolitics.net</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2011/06/donor-analysis-in-r-smith-for-congress/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Individual contributions in US House elections &#8211; Smith for Congress</title>
		<link>http://offensivepolitics.net/blog/2011/06/individual-contributions-in-us-house-elections-smith-for-congress/</link>
		<comments>http://offensivepolitics.net/blog/2011/06/individual-contributions-in-us-house-elections-smith-for-congress/#comments</comments>
		<pubDate>Thu, 02 Jun 2011 16:31:49 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Campaign Finance]]></category>
		<category><![CDATA[US House]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=545</guid>
		<description><![CDATA[Introducing the Smith for Congress individual contributor data set. ]]></description>
			<content:encoded><![CDATA[<h2>Campaign finance primer</h2>
<p> In the United States, a candidate for federal office can accept donations to fund their campaign from many different sources such as individuals, political action committees, and political parties. There are limits placed on how much an individual, committee or party can contribute, and there are rules for how those contributions are disclosed to the public. These rules and limits are enforced by the <a href="http://www.fec.gov/">Federal Election Commission</a>. </p>
<p>According to the <a href="http://www.fec.gov/pages/brochures/contrib.shtml">2012 FEC Contributions Brochure</a>, an individual is allowed to contribute up to $2,500 to a given candidate in the 2012 general election cycle. If an individual gives more than $200 to a campaign in a single cycle, the campaign is required to disclose the donor&#8217;s name, address, occupation, and employer on a public donor list. If an individual gives $200 or less, their personal information is not disclosed anywhere. </p>
<p>In FEC terminology, a donor who exceeds the $200 disclosure limit is called &#8220;itemized&#8221;, but otherwise is considered &#8220;unitemized.&#8221; A campaign discloses to the FEC, among other things, how much of their individual contributions came from unitemized donors and how much came from itemized donors. </p>
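<p>The itemized/unitemized split can be sketched in a few lines. This assumes a donor becomes itemized once their cycle aggregate exceeds $200; the donor ids and amounts below are made up for illustration.</p>

```r
# Hypothetical contributions; donor ids and amounts are made up
contribs <- data.frame(
  donor = c("a", "a", "b", "c", "c", "c"),
  amount = c(150, 100, 200, 50, 50, 25)
)

# Aggregate per donor for the cycle, then flag totals above $200 as itemized
agg <- aggregate(amount ~ donor, data = contribs, FUN = sum)
agg$status <- ifelse(agg$amount > 200, "itemized", "unitemized")

# Dollars raised from each group, as a campaign would report in summary
by.status <- tapply(agg$amount, agg$status, sum)
```

<p>Note that donor "a" is itemized even though neither single gift exceeds $200, because the threshold applies to the aggregate.</p>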
<p>For congressional campaigns that raised at least $10,000 from individuals in 2010, the unitemized amount represented on average 21% of total individual contributions (n=1391, stddev=17%). Without the unitemized transactions we&#8217;re missing about one fifth of the total contributions of most campaigns, and we cannot properly build models to analyze small-dollar fundraising. Tech President has an <a href="http://techpresident.com/blog-entry/campaign-finance-20-small-donor-revolution-hype-or-reality">excellent writeup</a> on small donors in 2008 and some useful links to other small donor research.  </p>
<h2>New Data</h2>
<p>Since I spend a lot of time digging through FEC reports, I wrote <a href="http://offensivepolitics.net/fechell/">some software</a> that sort of helps. While doing some unrelated research I found that one campaign was disclosing transactions that they didn&#8217;t need to disclose. Instead of only disclosing contributions above the $200-per-cycle aggregate amount, they were disclosing every single individual contribution.</p>
<p>As far as I can tell they are the only campaign to do so, ever, and this level of contributor data has never been made public. Big-time important congressional researchers like Fenno or Mayhew probably have access to data like this, and there may be some old stuff in ICPSR, but regular people like myself have never seen this. So I wrote some software to collect, clean, and verify the transactions, and now I&#8217;m releasing the data to the public. </p>
<h2>Smith For Congress</h2>
<p>I&#8217;m calling this data set Smith For Congress, and it will live at the <a href="http://offensivepolitics.net/smithforcongress/">Smith for Congress</a> homepage. The initial data set contains 3 election cycles (2006-2010) of individual transactional data, 49,880 transactions in total. Transactions for less than $0.10 were culled from the list, as were a handful of misclassified returned contributions. </p>
<p>The set contains five fields: </p>
<ul>
<li><b>individual identifier</b></li>
<li><b>contribution amount</b></li>
<li><b>contribution date</b></li>
<li><b>cycle-to-date total contribution for this individual</b></li>
<li><b>election cycle</b></li>
</ul>
<p>The individual identifier is consistent across cycles, and used in place of a name. It is generated by running the name and address of the contributor through a standardization process, and hashing the result. </p>
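<p>To make the idea concrete, here is a rough sketch of a standardize-then-hash identifier. It is not the process used to build this data set: the standardization rules, the choice of MD5, and every function name below are assumptions for illustration only.</p>

```r
# Hypothetical standardization: upper-case, drop punctuation, collapse spaces
standardize <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", "", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Hash a string with MD5 using only base R (tools::md5sum reads from a file)
hash.string <- function(x) {
  f <- tempfile()
  on.exit(unlink(f))
  writeBin(charToRaw(x), f)
  unname(tools::md5sum(f))
}

# Combine the standardized name and address, then hash the result
make.id <- function(name, address) {
  hash.string(paste(standardize(name), standardize(address), sep = "|"))
}

# Two spellings of the same (made-up) donor should map to one identifier
id1 <- make.id("Jane Q. Public", "12 Main St., Springfield")
id2 <- make.id("  jane q public ", "12 MAIN ST SPRINGFIELD")
identical(id1, id2)
```

<p>The point of standardizing first is that small differences in how a name or address was typed do not produce different identifiers.</p>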
<p>With the full transaction data provided in this data set we can analyze how much and how often donors make contributions. This can lead to better models and better metrics about campaigns and individual giving. These metrics can help us more fully understand how congressional campaigns raise money and the role individuals play in them. </p>
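<p>As a small illustration of what transaction-level data allows, the sketch below recomputes a cycle-to-date total and per-donor frequency from raw rows. The column names and values are hypothetical and may not match the names used in the released files.</p>

```r
# Hypothetical rows in the shape of the fields listed above
tx <- data.frame(
  personid = c("p1", "p1", "p2", "p1", "p2"),
  amount = c(25, 50, 100, 10, 35),
  date = as.Date(c("2009-03-01", "2009-06-15", "2009-07-04",
                   "2010-01-20", "2010-02-02")),
  cycle = c(2010, 2010, 2010, 2010, 2010)
)

# Order by donor and date so the running sum follows the order gifts were made
tx <- tx[order(tx$personid, tx$date), ]

# Recompute the cycle-to-date total within each donor and cycle
tx$cycle.to.date <- ave(tx$amount, tx$personid, tx$cycle, FUN = cumsum)

# How often and how much each donor gave
n.gifts <- tapply(tx$amount, tx$personid, length)
total.given <- tapply(tx$amount, tx$personid, sum)
```

<p>Comparing a recomputed running total like this against the cycle-to-date field in the data is one simple way to sanity-check the transactions.</p>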
<h3>What is not included</h3>
<p>The name of the candidate filing the reports is not included; Smith for Congress is a code name. I&#8217;m also not releasing individual names or addresses of contributors, as they are not important for the research goals I had for putting this set together. It is all there in the FEC documents, and it shouldn&#8217;t take you more than 10 minutes if you feel like looking for either omission. </p>
<h3>Finally</h3>
<p>I believe the Smith for Congress campaign is a fair approximation for individual donations for a certain type of campaign, but this data is from one of thousands of congressional campaigns every cycle. Assumptions and models built from this data set may not hold true for other campaigns, but should help us as researchers better understand the dynamics of individual fundraising in congressional campaigns. </p>
<p>Over the coming weeks I&#8217;m going to be putting up short posts about different types of analysis that can be performed on these data. I will provide back links to the new posts here, but you should subscribe to the blog&#8217;s <a href="http://offensivepolitics.net/blog/feed/">RSS feed</a> or <a href="http://twitter.com/offpol">follow me on Twitter</a> for updates. </p>
<p>Data can be downloaded here: <a href="http://offensivepolitics.net/smithforcongress/">Smith For Congress</a></p>
<p>Please do not hesitate to contact me at jjh@offensivepolitics.net with comments or questions.</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2011/06/individual-contributions-in-us-house-elections-smith-for-congress/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Visualizing iPhone location tracking with R and Google Maps</title>
		<link>http://offensivepolitics.net/blog/2011/04/visualizing-iphone-location-tracking-with-r-and-google-maps/</link>
		<comments>http://offensivepolitics.net/blog/2011/04/visualizing-iphone-location-tracking-with-r-and-google-maps/#comments</comments>
		<pubDate>Fri, 22 Apr 2011 14:25:34 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=548</guid>
		<description><![CDATA[Visualize location logs created by the iPhone using R and Google Maps. ]]></description>
			<content:encoded><![CDATA[<p>According to security researchers the iPhone 4 is logging location data in the background, and apparently sending some part of that data to Apple every day or every few days (<a href="http://www.wired.com/gadgetlab/2011/04/apple-iphone-tracking/" target="_blank">Wired</a>). Silently recording location data is bad enough, but the data itself is easily recoverable from an iPhone backup. Some enterprising guys (<a href="http://twitter.com/aallan" target="_blank">@aallan</a>, <a href="http://twitter.com/petewarden" target="_blank">@petewarden</a>) wrote an OS X application, <a href="http://petewarden.github.com/iPhoneTracker/" target="_blank">iPhone Tracker</a>, to parse and visualize the location data on a map. As appalled as I was that this data exists, I was also really interested in rewriting their visualization code in R. </p>
<p>Researcher Drew Conway beat me to it with <a href="http://www.drewconway.com/zia/?p=2721" target="_blank">stalkR</a>, but my code is sufficiently different that I think people can learn from both. I&#8217;ll walk through the code; links to the GitHub repo are at the end of the post. </p>
<p>Since the location database is stored inside an iOS backup, we&#8217;ll need to understand the structure of that backup. The backup contains a bunch of files named with a long hex string, and a few files that provide a binary table of contents. There is some nice Python code (<a href="http://code.google.com/p/iphone-backup-decoder/">iPhone Backup Decoder</a>) to open up the table of contents and locate specific files. I was going to translate this code to R, but I decided on a brute-force approach instead. The file we&#8217;re looking for is a SQLite database that contains several distinctively named tables. I just try to open every file in a given backup directory as a SQLite database, and look for a known table name (CellLocation). If the file isn&#8217;t a database or the table doesn&#8217;t exist then we move on.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;"><span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>RSQLite<span style="color: #080;">&#41;</span>
<span style="color: #0000FF; font-weight: bold;">library</span><span style="color: #080;">&#40;</span>RgoogleMaps<span style="color: #080;">&#41;</span>
&nbsp;
findLocationDB <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span> <span style="color: #080;">&#40;</span>basePath<span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
	filename <span style="color: #080;">&lt;-</span> NA
	drv <span style="color: #080;">&lt;-</span> dbDriver<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;SQLite&quot;</span><span style="color: #080;">&#41;</span>
	<span style="color: #0000FF; font-weight: bold;">for</span><span style="color: #080;">&#40;</span>testFileName <span style="color: #0000FF; font-weight: bold;">in</span> <span style="color: #0000FF; font-weight: bold;">list.<span style="">files</span></span><span style="color: #080;">&#40;</span>basePath<span style="color: #080;">&#41;</span> <span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
		<span style="color: #228B22;"># brute-force the connection</span>
		con <span style="color: #080;">&lt;-</span> dbConnect<span style="color: #080;">&#40;</span>drv,<span style="color: #0000FF; font-weight: bold;">paste</span><span style="color: #080;">&#40;</span>basePath,<span style="color: #ff0000;">&quot;/&quot;</span>,testFileName,sep<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span>
		<span style="color: #228B22;"># try and list the tables. </span>
		<span style="color: #228B22;"># this will fail if the file is not a sqlite db</span>
		tableList <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">tryCatch</span><span style="color: #080;">&#40;</span>dbListTables<span style="color: #080;">&#40;</span>con<span style="color: #080;">&#41;</span>, error<span style="color: #080;">=</span><span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>e<span style="color: #080;">&#41;</span> e<span style="color: #080;">&#41;</span>	
		<span style="color: #228B22;"># if class of tableList is character then we've got a sqlite DB </span>
		<span style="color: #0000FF; font-weight: bold;">if</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">any</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">class</span><span style="color: #080;">&#40;</span>tableList<span style="color: #080;">&#41;</span> <span style="color: #080;">==</span> <span style="color: #ff0000;">&quot;character&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
			<span style="color: #228B22;"># look for the CellLocation table</span>
			<span style="color: #0000FF; font-weight: bold;">if</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">length</span><span style="color: #080;">&#40;</span><span style="color: #0000FF; font-weight: bold;">grep</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;CellLocation&quot;</span>, tableList<span style="color: #080;">&#41;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&gt;</span><span style="color: #ff0000;">0</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>
			<span style="color: #228B22;"># we've found it. save this filename</span>
				filename <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">paste</span><span style="color: #080;">&#40;</span>basePath,<span style="color: #ff0000;">&quot;/&quot;</span>,testFileName,sep<span style="color: #080;">=</span><span style="color: #ff0000;">''</span><span style="color: #080;">&#41;</span>
				dbDisconnect<span style="color: #080;">&#40;</span>con<span style="color: #080;">&#41;</span>
				<span style="color: #0000FF; font-weight: bold;">break</span>
			<span style="color: #080;">&#125;</span>
		<span style="color: #080;">&#125;</span>
		dbDisconnect<span style="color: #080;">&#40;</span>con<span style="color: #080;">&#41;</span>
	<span style="color: #080;">&#125;</span>
	dbUnloadDriver<span style="color: #080;">&#40;</span>drv<span style="color: #080;">&#41;</span>	
	<span style="color: #0000FF; font-weight: bold;">return</span><span style="color: #080;">&#40;</span>filename<span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span></pre></div></div>
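
<p>A lighter-weight alternative to opening every file as a database (not what the function above does) is to peek at the first 16 bytes of each file: SQLite 3 databases begin with the string &#8220;SQLite format 3&#8221; followed by a zero byte. The helper name <b>isSQLiteFile</b> below is hypothetical, and this is only a sketch of the idea.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># TRUE if the file starts with the standard SQLite 3 header string
isSQLiteFile &lt;- function(path) {
	if (dir.exists(path)) return(FALSE)
	hdr &lt;- readBin(path, what=&quot;raw&quot;, n=16)
	length(hdr) == 16 &amp;&amp;
		identical(hdr[1:15], charToRaw(&quot;SQLite format 3&quot;)) &amp;&amp;
		hdr[16] == as.raw(0)
}

# quick check against a temp file that carries only the header
tmp &lt;- tempfile()
writeBin(c(charToRaw(&quot;SQLite format 3&quot;), as.raw(0)), tmp)
isSQLiteFile(tmp)

# in the backup directory this could pre-filter the candidates:
# candidates &lt;- Filter(isSQLiteFile, list.files(basePath, full.names=TRUE))</pre></div></div>

<p>Only the files that pass this check would then need to be opened and searched for the CellLocation table. </p>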

<p>Now that we&#8217;ve got a function to find a location database, we can access the database and load it into a data frame.</p>

<div class="wp_syntax"><div class="code"><pre class="rsplus" style="font-family:monospace;">fetchLatLongTimestamp <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">function</span><span style="color: #080;">&#40;</span>dbLocation,loc.<span style="">table</span>.<span style="">name</span>,accuracy<span style="color: #080;">=</span><span style="color: #ff0000;">1.0</span><span style="color: #080;">&#41;</span> <span style="color: #080;">&#123;</span>	
	ldata <span style="color: #080;">&lt;-</span> NA
	con <span style="color: #080;">&lt;-</span> dbConnect<span style="color: #080;">&#40;</span>dbDriver<span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;SQLite&quot;</span><span style="color: #080;">&#41;</span>, dbLocation<span style="color: #080;">&#41;</span>
	ldata <span style="color: #080;">&lt;-</span> dbReadTable<span style="color: #080;">&#40;</span>con, loc.<span style="">table</span>.<span style="">name</span><span style="color: #080;">&#41;</span>
	<span style="color: #228B22;"># drop rows where lat == 0.0 &amp;&amp; long == 0.0; a logical index stays safe when no rows match</span>
	ldata <span style="color: #080;">&lt;-</span> ldata<span style="color: #080;">&#91;</span><span style="color: #080;">!</span><span style="color: #080;">&#40;</span>ldata$Latitude <span style="color: #080;">==</span> <span style="color: #ff0000;">0.0</span> <span style="color: #080;">&amp;</span> ldata$Longitude <span style="color: #080;">==</span> <span style="color: #ff0000;">0.0</span><span style="color: #080;">&#41;</span>,<span style="color: #080;">&#93;</span>
	<span style="color: #228B22;"># convert the mac timestamp to unix timestamp</span>
        ldata$datetime <span style="color: #080;">&lt;-</span> <span style="color: #0000FF; font-weight: bold;">as.<span style="">POSIXlt</span></span><span style="color: #080;">&#40;</span>ldata$Timestamp, origin<span style="color: #080;">=</span><span style="color: #ff0000;">&quot;2001-01-01&quot;</span><span style="color: #080;">&#41;</span>
	<span style="color: #228B22;"># downsample the lat long by accuracy to obscure the location</span>
	ldata$Latitude <span style="color: #080;">&lt;-</span> ldata$Latitude <span style="color: #080;">/</span> accuracy
	ldata$Longitude <span style="color: #080;">&lt;-</span> ldata$Longitude <span style="color: #080;">/</span> accuracy	
	ldata <span style="color: #080;">&lt;-</span> ldata<span style="color: #080;">&#91;</span>,<span style="color: #0000FF; font-weight: bold;">c</span><span style="color: #080;">&#40;</span><span style="color: #ff0000;">&quot;Latitude&quot;</span>,<span style="color: #ff0000;">&quot;Longitude&quot;</span>, <span style="color: #ff0000;">&quot;datetime&quot;</span><span style="color: #080;">&#41;</span><span style="color: #080;">&#93;</span>
	dbDisconnect<span style="color: #080;">&#40;</span>con<span style="color: #080;">&#41;</span>
	<span style="color: #0000FF; font-weight: bold;">return</span><span style="color: #080;">&#40;</span>ldata<span style="color: #080;">&#41;</span>
<span style="color: #080;">&#125;</span></pre></div></div>

<p>This <b>fetchLatLongTimestamp</b> function will load the entire location database into a data frame, and then clean up the timestamps and remove bad location data. I had originally seen the time stamp correction code on <a href="http://jackman.stanford.edu/blog/?p=2025" target="_new">Prof Jackman&#8217;s blog</a>, so thanks to him for that (and <a href="http://pscl.stanford.edu/" target="_blank">pscl</a>!).</p>
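<p>To make the timestamp correction concrete: the Timestamp values appear to count seconds from January 1st 2001 rather than from the Unix epoch of January 1st 1970. A minimal sketch, using a made-up value and setting <b>tz</b> to UTC only so the output is the same on every machine:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># seconds since 2001-01-01 00:00:00 UTC (made-up example value)
macSeconds &lt;- 322074189

# let R apply the 2001 origin directly
viaOrigin &lt;- as.POSIXct(macSeconds, origin=&quot;2001-01-01&quot;, tz=&quot;UTC&quot;)

# or shift to the Unix epoch by hand:
# 978307200 seconds separate 1970-01-01 from 2001-01-01
unixSeconds &lt;- macSeconds + 978307200
viaUnix &lt;- as.POSIXct(unixSeconds, origin=&quot;1970-01-01&quot;, tz=&quot;UTC&quot;)

format(viaOrigin)
format(viaUnix)</pre></div></div>

<p>Both routes should print 2011-03-17 17:03:09. </p>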
<p>Now we&#8217;ve got a data frame of Latitude, Longitude, and datetime stamp that looks more or less like this: </p>
<table>
<tr>
<th>Lat</th>
<th>Lon</th>
<th>Timestamp</th>
</tr>
<tr>
<td>38.90612</td>
<td>-77.03961</td>
<td>2011-03-17 17:03:09</td>
</tr>
<tr>
<td>38.90563</td>
<td>-77.03929</td>
<td>2011-03-17 17:03:09</td>
</tr>
<tr>
<td>38.90567</td>
<td>-77.03957</td>
<td>2011-03-17 17:03:09</td>
</tr>
<tr>
<td>38.90574</td>
<td>-77.03988</td>
<td>2011-03-17 17:03:09</td>
</tr>
<tr>
<td>38.90561</td>
<td>-77.03967</td>
<td>2011-03-17 17:03:09</td>
</tr>
</table>
<p>The Lat/Lon represents downtown DC, near where I bought my iPhone last month. </p>
<p>Now that we&#8217;ve got a data frame full of juicy location data, we need to plot it on a map. I used the fantastic <a href="http://cran.r-project.org/web/packages/RgoogleMaps/index.html" target="_blank">RgoogleMaps</a> package, and ripped most of the vignette (pdf: RgoogleMaps: <a href="http://cran.r-project.org/web/packages/RgoogleMaps/vignettes/RgoogleMaps-intro.pdf" target="_blank">An R Package for plotting on Google map tiles within R</a>) for loading a map and plotting points by latitude and longitude. </p>
<p>If I&#8217;ve got my location data in a data frame called <b>ldata</b>, I can use the following to find the correct bounds and zoom level, fetch a map, and plot my location data. Again, the drawing code is basically ripped from the RgoogleMaps vignette.</p>

<div class="wp_syntax"><div class="code"><pre class="rplus" style="font-family:monospace;">## plot a map of all the positions
	bb &lt;- qbbox(ldata$Latitude, ldata$Longitude)
	# zoomlevel 4 works for my data (US only) 
	zoomlevel &lt;- 4
	# grab the map
	map &lt;- GetMap.bbox(bb$lonR, bb$latR,zoom=zoomlevel,maptype=&quot;mobile&quot;)
	# plot the points as circles 
	PlotOnStaticMap(map,lon=ldata$Longitude,lat=ldata$Latitude,col=&quot;blue&quot;,verbose=0)</pre></div></div>

<p>Which gives us:<br />
<img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/04/all-tracks.png" alt="" title="All Data" width="640" height="640" class="aligncenter size-full wp-image-569" /></p>
<p>Obviously I spent a lot of time in Washington, DC, New York, Boston, and Las Vegas. Since we&#8217;re just using R, I can easily slice and dice the data. Let&#8217;s say I just wanted to see my Las Vegas data (April 1st &#8211; April 4th):</p>

<div class="wp_syntax"><div class="code"><pre class="rplus" style="font-family:monospace;">	ldata.lv &lt;- ldata[which(ldata$datetime &gt;= as.POSIXlt('2011-04-01 23:00:00') &amp; ldata$datetime &lt;= as.POSIXlt('2011-04-04 14:00:00')),]
	bb.lv &lt;- qbbox(ldata.lv$Latitude, ldata.lv$Longitude)
	# a zoom level of 12 centers nicely on the las vegas strip
	zoom.lv &lt;- 12
	map.lv &lt;- GetMap.bbox(bb.lv$lonR, bb.lv$latR,zoom=zoom.lv,maptype=&quot;mobile&quot;)
	PlotOnStaticMap(map.lv,lon=ldata.lv$Longitude,lat=ldata.lv$Latitude,col=&quot;blue&quot;,verbose=0)</pre></div></div>

<p>Which gives us:<br />
<img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/04/lv-tracks.png" alt="" title="Las Vegas" width="640" height="640" class="aligncenter size-full wp-image-571" /></p>
<p>Yes, I spent a lot of time at the Wynn, Caesars Palace, and In-N-Out Burger. </p>
<p>Here is the full driver code:</p>

<div class="wp_syntax"><div class="code"><pre class="rplus" style="font-family:monospace;">## change this to the full path of a backup of an ios 4 device
backupPath &lt;- &quot;C:/Documents and Settings/YourUser/Application Data/apple computer/MobileSync/Backup/a6ddb1824738f61a15b3e3c87e3e8172599b7134/&quot;
&nbsp;
dbLoc &lt;- findLocationDB(backupPath)
&nbsp;
if(!is.na(dbLoc)) {
	print(sprintf(&quot;Found location database in path: %s!&quot;,dbLoc))
&nbsp;
	## for Verizon phones
	# ldata &lt;- fetchLatLongTimestamp(dbLoc, &quot;CdmaCellLocation&quot;)
	## for AT&amp;T phones
	ldata &lt;- fetchLatLongTimestamp(dbLoc, &quot;CellLocation&quot;)
&nbsp;
	## plot a map of all the positions
	bb &lt;- qbbox(ldata$Latitude, ldata$Longitude)
	# zoomlevel 4 works for my data (US only) 
	zoomlevel &lt;- 4
	# grab the map
	map &lt;- GetMap.bbox(bb$lonR, bb$latR,zoom=zoomlevel,maptype=&quot;mobile&quot;)
	png(&quot;all-tracks.png&quot;, width=640,height=640)
	# plot the points as circles 
	PlotOnStaticMap(map,lon=ldata$Longitude,lat=ldata$Latitude,col=&quot;blue&quot;,verbose=0)
	dev.off()
&nbsp;
	## limit the data to 4/1-4/4. I was in las vegas at the time.
	ldata.lv &lt;- ldata[which(ldata$datetime &gt;= as.POSIXlt('2011-04-01 23:00:00') &amp; ldata$datetime &lt;= as.POSIXlt('2011-04-04 14:00:00')),]
	bb.lv &lt;- qbbox(ldata.lv$Latitude, ldata.lv$Longitude)
	# a zoom level of 12 centers nicely on the strip
	zoom.lv &lt;- 12
	map.lv &lt;- GetMap.bbox(bb.lv$lonR, bb.lv$latR,zoom=zoom.lv,destfile=&quot;lv.png&quot;,maptype=&quot;mobile&quot;)
	png(&quot;lv-tracks.png&quot;,width=640,height=640)
	PlotOnStaticMap(map.lv,lon=ldata.lv$Longitude,lat=ldata.lv$Latitude,col=&quot;blue&quot;,verbose=0)
	dev.off()
&nbsp;
} else {
	print(sprintf(&quot;Could not find location database in path: %s&quot;,backupPath))
}</pre></div></div>

<p>You can see this code on my <a href="https://github.com/offensivepolitics/iphone-location-r" target="_blank">iPhone location with R</a> github repo. One big missing feature from the original application is animation, which I may add later. Patches and comments are greatly appreciated!</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2011/04/visualizing-iphone-location-tracking-with-r-and-google-maps/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>More Chicago Mayoral Analysis</title>
		<link>http://offensivepolitics.net/blog/2011/02/more-chicago-mayoral-analaysis/</link>
		<comments>http://offensivepolitics.net/blog/2011/02/more-chicago-mayoral-analaysis/#comments</comments>
		<pubDate>Sat, 26 Feb 2011 17:08:29 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[Elections]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Chicago]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=514</guid>
		<description><![CDATA[I perform a precincts-votes analysis on the returns from the Chicago Democratic Mayoral primary of 2011. ]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>In a <a href="http://offensivepolitics.net/blog/2011/02/mapping-the-2011-chicago-mayoral-democratic-primary/">previous post</a> on the Chicago Mayoral primary I looked at plotting returns on maps as a way to better understand the outcome. Maps help us visually determine if there is a geographic or clustering component to returns, but they aren&#8217;t the most rigorous way to compare election returns. </p>
<p>Another way to view the returns is to use a Seats-Votes plot, like I did in my <a href="http://offensivepolitics.net/blog/2010/11/visualizing-us-house-results-with-a-seats-votes-curve/">2010 election returns</a> blog post. Briefly, the Seats-Votes plot is a smoothed Gaussian kernel density estimate of the returns. Please refer to the <a href="http://offensivepolitics.net/blog/2010/11/visualizing-us-house-results-with-a-seats-votes-curve/">Visualizing US House Election Returns</a> post for an annotated example plot and more general information. </p>
<p>We&#8217;re going to create a Precincts-Votes plot of the Chicago Mayoral Democratic primary election returns. We will see a smoothed curve showing an estimate of the number of precincts with returns at a given percentage. These types of curves have traditionally been used in two-party races, where 50% is the cutoff for a win. But the Chicago Mayoral primary is a 6-way race, so a simple plurality is all that is required to win a precinct. Things are further complicated since the overall winner must receive more than 50% of the overall vote to avoid a runoff. I&#8217;m going to explore win percentages in this race from a multi-candidate perspective in a future blog post, but keep this in mind while looking at these plots. </p>
<p>Updated R code and plots are available on the <a href="https://github.com/offensivepolitics/chicago-mayor-2011">Chicago Mayor 2011</a> github page. </p>
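<p>For readers who have not met a kernel density estimate before, here is a minimal base R sketch on simulated returns. The data are random numbers, not Chicago results, and the variable names are only for illustration.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># simulated vote percentages for 500 hypothetical precincts
set.seed(1)
pct &lt;- pmin(pmax(rnorm(500, mean=55, sd=12), 0), 100)

# Gaussian kernel density estimate, evaluated over the 0-100 range
d &lt;- density(pct, kernel=&quot;gaussian&quot;, from=0, to=100)

plot(d, main=&quot;Precincts-Votes Curve (simulated)&quot;, xlab=&quot;Vote %&quot;, ylab=&quot;Density&quot;)
rug(pct)               # one tick per precinct
abline(v=50, lty=2)    # the two-party win threshold

# location of the peak: the most common precinct-level return
d$x[which.max(d$y)]</pre></div></div>

<p>The height of the curve at a given percentage is proportional to how many precincts returned roughly that share of the vote. </p>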
<h2>Data Preparation</h2>
<p>The election returns from the previous blog post are in a wide format, one line per precinct. To effectively plot these we&#8217;re going to need to convert this to a long format. We&#8217;ll use the <strong>melt function</strong>.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># take the wide df variable and melt it to a LONG format
df.m &lt;- melt(df,&quot;WARD_PRECINCT&quot;, 
   c(&quot;emanuel_pct&quot;,&quot;delvalle_pct&quot;,&quot;braun_pct&quot;,&quot;chico_pct&quot;,&quot;watkins_pct&quot;,&quot;walls_pct&quot;))
head(df.m)</pre></div></div>

<p>Now our data looks like this:</p>
<table>
<tr>
<th>WARD_PRECINCT</th>
<th>variable</th>
<th>value</th>
</tr>
<tr>
<td>1-1</td>
<td>emanuel_pct</td>
<td>46.59</td>
</tr>
<tr>
<td>1-2</td>
<td>emanuel_pct</td>
<td>64.07</td>
</tr>
<tr>
<td>1-3</td>
<td>emanuel_pct</td>
<td>60.12</td>
</tr>
<tr>
<td>1-4</td>
<td>emanuel_pct</td>
<td>59.24</td>
</tr>
<tr>
<td>1-5</td>
<td>emanuel_pct</td>
<td>66.67</td>
</tr>
<tr>
<td>1-6</td>
<td>emanuel_pct</td>
<td>54.51</td>
</tr>
</table>
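<p>To see the wide-to-long conversion in isolation, here is a toy sketch that assumes the <b>reshape2</b> package is installed. The two Emanuel values come from the table above; the Chico values are made up.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">library(reshape2)

# two precincts in wide format (chico_pct values are hypothetical)
wide &lt;- data.frame(WARD_PRECINCT=c(&quot;1-1&quot;,&quot;1-2&quot;),
	emanuel_pct=c(46.59, 64.07),
	chico_pct=c(30.00, 20.00))

# the id column comes first, then the columns to stack
long &lt;- melt(wide, id.vars=&quot;WARD_PRECINCT&quot;, measure.vars=c(&quot;emanuel_pct&quot;,&quot;chico_pct&quot;))
long</pre></div></div>

<p>Each wide row becomes one long row per measured column, so two precincts and two candidates give four rows. </p>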
<h2>Plots</h2>
<h4>The Winner</h4>
<p>First we&#8217;ll make the precincts-votes plot for the winner of the election, Rahm Emanuel.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">qplot(emanuel_pct,data=df,geom=c(&quot;density&quot;,&quot;rug&quot;),
      main=&quot;Precincts-Votes Curve, Chicago Mayor 2011&quot;,
      xlab=&quot;Vote %&quot;,ylab=&quot;Density&quot;)</pre></div></div>

<div id="attachment_524" class="wp-caption aligncenter" style="width: 630px"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/emanuel_sv.png" alt="" title="Precincts-Votes curve, Emanuel Only, Chicago Mayoral 2011" width="620" height="620" class="size-full wp-image-524" /><p class="wp-caption-text">Precincts-Votes curve, Emanuel Only, Chicago Mayoral 2011</p></div>
<p>This plot is very information-heavy, but can be decoded pretty easily. We see a large bubble of precincts where Mr. Emanuel received between 50 and 65 percent of the vote, far fewer precincts in the 20-40 percent range, and very few at either extreme. The large spike near 60% implies Mr. Emanuel performed better in many precincts than his 55% overall vote total would lead us to believe. </p>
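<p>One plausible explanation for that gap, not verified against the actual returns, is that the density curve counts every precinct once while the overall total weights each precinct by the number of ballots cast. A toy example with made-up numbers:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># five hypothetical precincts: vote share and ballots cast
pct   &lt;- c(62, 60, 58, 61, 35)
total &lt;- c(300, 300, 300, 300, 800)

# typical precinct: every precinct counts once
median(pct)                  # 60

# overall share: precincts weighted by ballots cast
weighted.mean(pct, total)    # 50.15</pre></div></div>

<p>Here a single large precinct with a weak return pulls the overall share well below the typical precinct. </p>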
<h4>All Candidates</h4>
<p>Now we&#8217;ll create a precincts-votes plot for all candidates combined, but we&#8217;ll leave off the rug plot along the bottom. This will allow us to compare returns for all candidates at once.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">qplot(x=value,group=variable,color=variable,data=df.m,geom=&quot;density&quot;,
		main=&quot;Precincts-Votes Curve, Chicago Mayor 2011&quot;,xlab=&quot;Vote %&quot;,ylab=&quot;Density&quot;) + 
		scale_color_brewer(name=&quot;Candidate&quot;) + geom_vline(xintercept=50)</pre></div></div>

<div id="attachment_531" class="wp-caption aligncenter" style="width: 630px"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/all_sv.png" alt="" title="Precincts-Votes curve, All Candidates, Chicago Mayoral 2011" width="620" height="620" class="size-full wp-image-531" /><p class="wp-caption-text">Precincts-Votes curve, All Candidates, Chicago Mayoral 2011</p></div>
<p>This combined precincts-votes chart is not as useful for comparing returns as I would like. Several of the candidates received near zero votes in many precincts, causing the density scale to skew heavily towards larger values. We&#8217;ll drop the three worst-performing candidates and build the chart again: </p>
<h4>Top 3</h4>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">qplot(x=value,group=variable,color=variable,data=subset(df.m,variable != 'walls_pct' &amp; variable != 'braun_pct' &amp; variable != 'watkins_pct'),
 		geom=&quot;density&quot;,main=&quot;Precincts-Votes Curve, Chicago Mayor 2011&quot;,xlab=&quot;Vote %&quot;,ylab=&quot;Density&quot;) + 
 		scale_color_brewer(name=&quot;Candidate&quot;, labels=c(&quot;Emanuel&quot;, &quot;Del Valle&quot;, &quot;Chico&quot;),
		breaks=c(&quot;emanuel_pct&quot;, &quot;delvalle_pct&quot;,&quot;chico_pct&quot;)) + geom_vline(xintercept=50)</pre></div></div>

<div id="attachment_530" class="wp-caption aligncenter" style="width: 630px"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/emanuel_delvalle_chico_sv.png" alt="" title="Precincts-Votes curve, Emanuel, Del Valle, Chico, Chicago Mayoral 2011" width="620" height="620" class="size-full wp-image-530" /><p class="wp-caption-text">Precincts-Votes curve, Emanuel, Del Valle, Chico, Chicago Mayoral 2011</p></div>
<p>This chart is quite a bit better than the last. Candidates Emanuel, Chico and Del Valle received roughly 55%, 24% and 9% of the overall vote respectively, and I think this chart helps us better understand these totals. We see Del Valle underperformed in the vast majority of precincts, but was still competitive in several with returns between 20 and 40 percent. Candidate Chico has a large grouping in the 10% return range, which is interesting given that he received roughly 24% of the overall vote total. Ten percent wouldn&#8217;t win him a precinct, but it would keep him competitive in the overall total. This is important given that a simple plurality may win a precinct in a 6-way race. We can also see the candidates were all moderately competitive in many of the precincts in the 20-40 percent return range, which is what you would expect an average return to be for a contested precinct. </p>
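<p>The plurality point can be made concrete with a short base R sketch. The three precincts below are made up and only reuse the column naming from the wide data frame; the name <b>df.toy</b> is hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># three hypothetical precincts in wide format
df.toy &lt;- data.frame(WARD_PRECINCT=c(&quot;1-1&quot;,&quot;1-2&quot;,&quot;1-3&quot;),
	emanuel_pct=c(46, 30, 64),
	chico_pct=c(30, 38, 20),
	delvalle_pct=c(24, 32, 16))
pct.cols &lt;- c(&quot;emanuel_pct&quot;,&quot;chico_pct&quot;,&quot;delvalle_pct&quot;)

# plurality winner: the candidate with the largest share in each precinct
df.toy$winner &lt;- pct.cols[apply(df.toy[pct.cols], 1, which.max)]

# did the winner also clear 50%?
df.toy$majority &lt;- apply(df.toy[pct.cols], 1, max) &gt; 50

df.toy[, c(&quot;WARD_PRECINCT&quot;,&quot;winner&quot;,&quot;majority&quot;)]</pre></div></div>

<p>In this toy data two of the three precincts are won with less than half of the vote, which is exactly the situation a 50% reference line hides. </p>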
<p>I hope this post has shown that precincts-votes curves can still be informative in a multicandidate race, and helped us better understand the makeup of the Chicago Mayoral Democratic primary. In my next blog post I&#8217;ll look at a way to visualize precinct returns in a multi-candidate race and how to measure the overall competitiveness of an election.</p>
<p>Code, data, and output are available on the <a href="https://github.com/offensivepolitics/chicago-mayor-2011">Chicago Mayor 2011</a> github repository.  Comments, questions, and pull requests are greatly appreciated. </p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2011/02/more-chicago-mayoral-analaysis/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Mapping the 2011 Chicago Mayoral Democratic Primary</title>
		<link>http://offensivepolitics.net/blog/2011/02/mapping-the-2011-chicago-mayoral-democratic-primary/</link>
		<comments>http://offensivepolitics.net/blog/2011/02/mapping-the-2011-chicago-mayoral-democratic-primary/#comments</comments>
		<pubDate>Fri, 25 Feb 2011 18:05:32 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Elections]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=482</guid>
		<description><![CDATA[Mapping the Chicago Democratic Mayoral 2011 primary with Ruby, R, and ggplot2]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>Chicago held a Democratic primary on Feb 22nd, and I wanted to visualize the results of the 6-way Mayoral race. I&#8217;m a huge fan of mapping election results with R, but I was never able to get my favorite graphing library <a target="new" href="http://had.co.nz/ggplot2">ggplot2</a> working correctly with shapefiles. After some light googling I found an excellent <a target="new" href="https://github.com/hadley/ggplot2/wiki/plotting-polygon-shapefiles">ggplot2 wiki page</a> that described the whole process. Thanks to whoever put that page together; it was a really helpful resource. </p>
<p>In this blog post I will detail how I joined election results and maps to create a graphical summary of the mayoral primary election.</p>
<h2>Data Sources</h2>
<h4>Precinct Boundaries</h4>
<p>Chicago is split into 50 wards, and those wards are each split into a number of precincts. The city of Chicago helpfully provides <a href="http://www.cityofchicago.org/city/en/depts/doit/supp_info/gis_data.html">precinct-level boundary shape files</a> on their GIS data page.</p>
<h4>Election Results</h4>
<p>I pulled the precinct results from the <a target="new" href="http://www.chicagoelections.com/election3.asp">Chicago board of elections</a> results page. These data were provided in easily parsed HTML pages, unlike those of so many other jurisdictions (Virginia, I&#8217;m looking at you). </p>
<h2>Data Preprocessing</h2>
<h4>Shape files</h4>
<p>The precinct shapefiles needed no preprocessing, which is quite surprising. Usually municipality names are off, or data are missing, but not this time. </p>
<h4>Election Results</h4>
<p>I used Ruby to scrape each precinct&#8217;s results and saved them off to a CSV file. The individual pages were loaded into <a target="new" href="http://nokogiri.org/">Nokogiri</a> and the data were easily extracted with XPath queries. The Ruby file to perform the preprocessing can be viewed <a target="new" href="https://github.com/offensivepolitics/chicago-mayor-2011/blob/master/scrape.rb">here</a>.</p>
<h2>Graphing</h2>
<p>I made use of several external libraries when building the maps. The shapefiles were loaded with the <strong>maptools</strong> package, and then converted and plotted with <strong>ggplot2</strong> and colored by <strong>RColorBrewer</strong>. The full R code with some inline comments can be found <a target="new" href="https://github.com/offensivepolitics/chicago-mayor-2011/blob/master/code.R">here</a>.</p>
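<p>The join-and-plot step can be sketched without any shapefile at all. The example below is not the linked code: it assumes only that ggplot2 is installed, replaces the precinct boundaries with two made-up squares, and uses hypothetical data frame names (<b>shapes</b>, <b>returns</b>, <b>plotdata</b>).</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">library(ggplot2)

# two made-up square precincts; a real run would take these vertices from the shapefile
shapes &lt;- data.frame(
	id    = rep(c(&quot;1-1&quot;,&quot;1-2&quot;), each=4),
	order = rep(1:4, times=2),
	long  = c(0,1,1,0, 1,2,2,1),
	lat   = c(0,0,1,1, 0,0,1,1))

# made-up returns, one row per precinct
returns &lt;- data.frame(id=c(&quot;1-1&quot;,&quot;1-2&quot;), emanuel_pct=c(46.59, 64.07))

# join returns onto the vertices, then restore vertex order (merge may reorder rows)
plotdata &lt;- merge(shapes, returns, by=&quot;id&quot;)
plotdata &lt;- plotdata[order(plotdata$id, plotdata$order), ]

# one filled polygon per precinct, shaded by vote share
p &lt;- ggplot(plotdata, aes(x=long, y=lat, group=id, fill=emanuel_pct)) +
	geom_polygon(color=&quot;white&quot;) +
	coord_equal() +
	scale_fill_distiller(name=&quot;Emanuel %&quot;, palette=&quot;Blues&quot;, direction=1)
print(p)</pre></div></div>

<p>Re-sorting after the merge matters: if the vertices of a polygon end up out of order, the shape is drawn with crossing edges. </p>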
<h2>Results</h2>
<p>Given the multi-candidate nature of this election I chose to create a separate map of returns for each candidate. This will allow us to more easily see the regional strengths of each candidate. Click an image below to view full-size. </p>
<table width="100%" rows=3 cols=2>
<tr>
<td align="center"><div id="attachment_496" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/emanuel.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/emanuel-300x300.png" alt="" title="Emanuel Returns 2011" width="300" height="300" class="size-medium wp-image-496" /></a><p class="wp-caption-text">Rahm Emanuel with 55.2% of the overall vote</p></div>
 </td>
<td align="center"><div id="attachment_494" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/chico.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/chico-300x300.png" alt="" title="Chico Returns 2011" width="300" height="300" class="size-medium wp-image-494" /></a><p class="wp-caption-text">Gery Chico with 23.9% of the overall vote</p></div>
 </td>
</tr>
<tr>
<td align="center"><div id="attachment_495" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/delvalle.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/delvalle-300x300.png" alt="" title="Del Valle Returns 2011" width="300" height="300" class="size-medium wp-image-495" /></a><p class="wp-caption-text">Miguel Del Valle with 9.2% of the overall vote</p></div>
 </td>
<td align="center"><div id="attachment_493" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/braun.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/braun-300x300.png" alt="" title="Moseley Braun Returns 2011" width="300" height="300" class="size-medium wp-image-493" /></a><p class="wp-caption-text">Carol Moseley Braun with 8.9% of the overall vote</p></div>
 </td>
</tr>
<tr>
<td align="center"><div id="attachment_498" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/watkins.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/watkins-300x300.png" alt="" title="Watkins Returns 2011" width="300" height="300" class="size-medium wp-image-498" /></a><p class="wp-caption-text">Patricia Van Pelt Watkins with 1.6% of the overall vote</p></div>
 </td>
<td align="center"> <div id="attachment_497" class="wp-caption aligncenter" style="width: 310px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/walls.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2011/02/walls-300x300.png" alt="" title="Walls III Returns 2011" width="300" height="300" class="size-medium wp-image-497" /></a><p class="wp-caption-text">William Walls III with 0.9% of the overall vote</p></div>
</td>
</tr>
</table>
<p>Grab the code from GitHub: <a target="new" href="https://github.com/offensivepolitics/chicago-mayor-2011">Chicago Mayoral 2011 code</a>. Comments, forks, and pull requests are greatly appreciated.</p>
<p>Edit: If you enjoyed this post please check out my <a href="http://offensivepolitics.net/blog/2011/02/more-chicago-mayoral-analaysis/">basic precinct analysis</a> on the Chicago Mayoral election.</p>
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2011/02/mapping-the-2011-chicago-mayoral-democratic-primary/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Visualizing US House Results with a Seats-Votes curve</title>
		<link>http://offensivepolitics.net/blog/2010/11/visualizing-us-house-results-with-a-seats-votes-curve/</link>
		<comments>http://offensivepolitics.net/blog/2010/11/visualizing-us-house-results-with-a-seats-votes-curve/#comments</comments>
		<pubDate>Tue, 16 Nov 2010 20:08:21 +0000</pubDate>
		<dc:creator>jjh</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[US House]]></category>

		<guid isPermaLink="false">http://offensivepolitics.net/blog/?p=458</guid>
		<description><![CDATA[A few weeks ago I wrote about ways to compare major-party returns in US House elections. I experimented with several visualizations, none as useful as the seats-votes curve. A traditional seats-votes curve measures average party performance against individual US House results. Our simplified curve uses a density plot to measure major-party (Democratic, in this case) [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago I wrote about ways to <a target="blank" href="http://offensivepolitics.net/blog/?p=357">compare major-party returns in US House elections</a>. I experimented with several visualizations, none as useful as the seats-votes curve. A traditional seats-votes curve measures average party performance against individual US House results. Our simplified curve uses a density plot to measure major-party (Democratic, in this case) support across all seats up for election. It helps measure the following characteristics of the US House for a given election: the number of uncontested or weakly contested seats, the number of safe seats, and the number of close or tossup seats. By comparing plots from different years we can track changes in major-party support and electoral attitudes, both of which can have a dramatic effect on future elections and legislative priorities in the US House.</p>
<p>This exercise will explain the different components of the seats-votes plot, and then look at how Democratic party support has changed from 2002 to 2010.</p>
<h2>Seats-Votes Explained</h2>
<p>Though it may look simple, the modified seats-votes curve is very information-heavy and can be somewhat confusing. A small change in the contours of the curve can convey a lot of information. Please refer to the annotated graph (Figure 1) and the items below for instructions on how to read a seats-votes curve.<br />
<div id="attachment_462" class="wp-caption aligncenter" style="width: 682px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/11/2010-f1.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/11/2010-f1.png" alt="" title="Democratic Vote Share US House 2002" width="672" height="671" class="size-full wp-image-462" /></a><p class="wp-caption-text">Figure 1</p></div></p>
<ul>
<li><strong>1) Uncontested Seats</strong> - The magnitude of the curve near 100 and near 0 represents the number of weakly contested or wholly uncontested seats. The curve near the left side of the graph measures how many seats were weakly contested or uncontested by the Democrats, and the right side measures seats weakly contested or uncontested by the Republicans.
</li>
<li><strong>2 &#038; 3) Base Seats</strong> - The lump of seats to the left of the 50% vote share line represents the seats a Republican won (2); the lump to the right belongs to Democrats (3). These lumps are the respective parties&#8217; bases and are considered probable wins. The distance from the 50% line shows how large a win it was, and how much safer a seat could be considered. A shift away from the 50% mark from one year to the next implies an electorally stronger base, and an increase in magnitude represents a larger base. A shift toward the 50% mark and a smaller magnitude represent an electorally weaker and smaller base, respectively.</li>
<li><strong>4) Competitive Seats</strong> - Seats at or near the 50% mark are considered competitive seats. This portion of the curve may be a trough between the two base seat points, or sometimes a lump on its own. </li>
<li><strong>5) Rug plot</strong> - The 1-dimensional rug plot across the bottom of the graph marks the vote share of each individual seat. More hash marks in a given location mean more seats at that vote share, which raises the height of the density curve there. </li>
</ul>
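<p>As a rough sketch of the reading guide above, the categories can be approximated by binning each seat&#8217;s Democratic vote share. The cutoffs here are assumptions made for illustration and are not taken from the plot: shares at or below 10% or at or above 90% count as weakly contested or uncontested, shares from 40% to 60% count as competitive, and everything else counts as a party&#8217;s base. The classify_seat helper and the toy vote shares are hypothetical.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># Hypothetical helper: label a Democratic vote share (0-100) with one of the
# categories from the list above. Every cutoff is an assumed value.
classify_seat &lt;- function(voteshare) {
	ifelse(voteshare &lt;= 10, &quot;Uncontested by Democrats&quot;,
	ifelse(voteshare &gt;= 90, &quot;Uncontested by Republicans&quot;,
	ifelse(voteshare &gt;= 40 &amp; voteshare &lt;= 60, &quot;Competitive&quot;,
	ifelse(voteshare &lt; 40, &quot;Republican base&quot;, &quot;Democratic base&quot;))))
}

# Made-up vote shares, only to exercise the helper
toy &lt;- c(0, 8, 31, 35, 44, 49, 52, 58, 63, 71, 95, 100)
table(classify_seat(toy))</pre></div></div>

<p>Applied to real returns, the same call on the vote share column would give a seat count per category for a single election year.</p>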
<h2>Latest Returns</h2>
<p>Now that we can interpret the plot in terms of base, competitive, and safe seats, let&#8217;s look at the modified seats-votes plot from the previous article, this time with the latest election returns.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">library(ggplot2)
# load the seats-votes data from offensivepolitics.net
df &lt;- read.csv('http://offensivepolitics.net/data/seats-votes-2010.csv')
df$year &lt;- as.factor(df$year)
png(&quot;2010-f2.png&quot;,width=672,height=671)
qplot(voteshare,data=df, geom=c(&quot;density&quot;, &quot;rug&quot;),
	xlab=&quot;Democratic Vote Share (%)&quot;, ylab=&quot;Density&quot;,
	main=&quot;Democratic Vote Share US House 2002-2010&quot;) + 
	facet_wrap(~year,nrow=3) + 
	geom_vline(xintercept=50,colour=&quot;gray50&quot;)
dev.off()</pre></div></div>

<p>Using the information from the annotated reference chart (Figure 1) to interpret the contours of the seats-votes curve, we can build a narrative for the mood of the electorate in any given year. Using the latest results chart (Figure 2), we can expand that narrative across electoral cycles.</p>
<div id="attachment_463" class="wp-caption aligncenter" style="width: 682px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/11/2010-f2.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/11/2010-f2.png" alt="" title="Democratic Vote Share US House 2002-2010" width="672" height="671" class="size-full wp-image-463" /></a><p class="wp-caption-text">Figure 2</p></div>
<p>For elections leading up to 2010 we see a prominent bimodal curve, with a reasonably stable number of safe seats for each party. In 2002 and 2004 we see a large number of seats in the 30-40 percent Democratic vote share range, which corresponds to a strong Republican majority. In 2006 we see the number of competitive seats increase, and in 2008 we see an increase in the number of safe Democratic seats. The changes in 2006 and 2008 track nicely with the Democrats narrowly taking control of the US House in 2006 and then strongly expanding their majority in 2008. </p>
<p>The 2002-2008 plots show a Republican base between 15-20%, a Democratic base between 15-18%, and competitive seats between 9-15% of the total US House. In 2010 the structure of the curve changed dramatically. The Republican base seats were right around 18%, which is in line with previous years, but the Democratic base looks nothing like previous years. A lump of safe Democratic seats remains, but its count dropped by about half. The rest of those seats shifted towards the 50% line, where they merged with the other competitive seats. This plot of the returns lines up nicely with the pre-election narrative of an outsize number of tossup races from <a target="blank" href="http://innovation.cqpolitics.com/atlas/house2010_rr">CQ Politics</a>, <a target="blank"  href="http://elections.nytimes.com/2010/forecasts/house">FiveThirtyEight</a>, <a target="blank"  href="http://www.cookpolitical.com/races/house/chart.php">Cook Political Report</a>, and <a target="blank"  href="http://rothenbergpoliticalreport.com/ratings/house">The Rothenberg Political Report</a>. </p>
<h2>More Precise Numbers</h2>
<p>The smoothed density function used in the seats-votes plot is an estimate, so it is difficult to determine exactly how many seats fall within a given vote range. We could use a histogram, but I prefer a cumulative distribution function (CDF) plot instead. To create a CDF plot for each year&#8217;s results we&#8217;ll use R&#8217;s built-in ecdf function and ggplot:</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;">library(plyr) # ddply comes from the plyr package; load it explicitly
png(&quot;2010-f3.png&quot;,width=672,height=671)
# for each year, evaluate the empirical CDF at every observed vote share (as a percentage)
cdf &lt;- ddply(df,&quot;year&quot;,function(x) data.frame(share=x$voteshare,cdf=ecdf(x$voteshare)(x$voteshare)*100))
qplot(x=share,y=cdf,data=cdf,geom=&quot;step&quot;,
	 main=&quot;Democratic Vote Share US House 2002-2010 (Cumulative)&quot;,
	 xlab=&quot;Vote Share (%)&quot;,ylab=&quot;Total %&quot;) + 
	facet_wrap(~year,nrow=3)
dev.off()</pre></div></div>

<div id="attachment_464" class="wp-caption aligncenter" style="width: 682px"><a href="http://offensivepolitics.net/blog/wp-content/uploads/2010/11/2010-f3.png"><img src="http://offensivepolitics.net/blog/wp-content/uploads/2010/11/2010-f3.png" alt="" title="Democratic Vote Share US House 2002-2010 (Cummulative)" width="672" height="671" class="size-full wp-image-464" /></a><p class="wp-caption-text">Figure 3</p></div>
<p>Using the CDF plot we can see that Democrats received 50% or less in 60% of the races in 2010, but received 50% or less in only 42% of the races in 2008. In 2010 a full 40% of the total races fell into the competitive category, defined as races in which the Democrat received between 40% and 60% of the vote. In 2008 and 2006 that number was closer to 12%, so the competitive share grew by roughly 28 percentage points, more than tripling, in a single cycle. </p>
<p>The combination of the seats-votes plot and the CDF gives us pretty powerful insights into the current electoral power of each major party in the US House. We have hard numbers and a narrative for the 2010 US House Democratic loss that goes beyond parroting the number of seats lost. We also have some historical perspective on major-party electoral returns, and it will be interesting to see if the 2010 competitive seats remain that way in 2012. </p>
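<p>Percentages like the ones above can also be computed from the ECDF directly rather than read off the plot. The sketch below assumes a data frame shaped like the one loaded earlier (columns year and voteshare), but it uses a small made-up data frame named toy so that it runs on its own. The share_summary helper is hypothetical, and the 40-60% competitive band follows the definition above.</p>

<div class="wp_syntax"><div class="code"><pre class="r" style="font-family:monospace;"># Made-up vote shares for two fictional years; swap in the real data frame
toy &lt;- data.frame(
	year = rep(c(&quot;A&quot;, &quot;B&quot;), each = 5),
	voteshare = c(20, 35, 45, 55, 80,  45, 52, 58, 65, 70))

# Hypothetical helper: percent of races at or below 50%, and percent of
# races inside the 40-60% competitive band, for one year of vote shares
share_summary &lt;- function(v) {
	Fn &lt;- ecdf(v) # Fn(x) is the fraction of races with vote share &lt;= x
	c(at_or_below_50 = Fn(50) * 100,
	  # computed directly so that both endpoints of the band are included
	  competitive = mean(v &gt;= 40 &amp; v &lt;= 60) * 100)
}

# one column per year, one row per summary
sapply(split(toy$voteshare, toy$year), share_summary)</pre></div></div>

<p>With the real returns, passing split(df$voteshare, df$year) in place of the toy split would yield one column per election year.</p>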
]]></content:encoded>
			<wfw:commentRss>http://offensivepolitics.net/blog/2010/11/visualizing-us-house-results-with-a-seats-votes-curve/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

