Introduction

The aim of this project was to visualize the change in the number of polling locations in counties across the United States, alongside other information about each county, such as its demographics and its tendency to vote Democratic or Republican. We hoped to discover whether these variables are systematically correlated with the change in the number of polling locations, and whether any such correlations differ by geographic region.

Methodology

We drew upon three main data sources: the American Community Survey estimates for demographic data, an MIT Election Data Science Lab dataset on county-level presidential election returns, and the national Election Administration and Voting Survey dataset on county-level polling locations and operations.

We relied heavily on Python libraries. In addition to the Census library, which gave us access to the Census API, we used Folium for creating maps, Seaborn and Matplotlib for charts, Statsmodels for regression analysis, and several others for data cleaning, wrangling, and exploratory data analysis, most prominently Pandas and NumPy.

Our process involved three main stages:

1) Collecting data from the sources described above, cleaning it, and aggregating it into a single merged dataset.

2) Conducting initial exploratory analysis - in particular, using regressions to identify potentially interesting relationships between variables.

3) Creating the final data visualizations to be presented - specifically, line charts to show changes over time, bar graphs to compare variables across different subsets of the data, and maps to visualize geographic variation in characteristics.
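
To make the first two stages concrete, here is a minimal sketch, assuming each source has already been reduced to one row per county keyed by a FIPS code. The column names, the toy values, and the helper `build_merged` are hypothetical placeholders for illustration; they are not the project's actual schema or results.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy stand-ins for the three sources (ACS, election returns, EAVS).
# FIPS codes are kept as strings so that leading zeros are preserved.
fips = ["01001", "01003", "01005", "01007", "01009", "01011"]

acs = pd.DataFrame({
    "fips": fips,
    "median_income": [58.0, 61.0, 35.0, 47.0, 52.0, 31.0],  # made-up values
})
returns = pd.DataFrame({
    "fips": fips,
    "dem_share": [0.27, 0.22, 0.46, 0.21, 0.10, 0.75],  # made-up values
})
eavs = pd.DataFrame({
    "fips": fips,
    "polling_places_start": [20, 50, 12, 10, 25, 8],  # made-up values
    "polling_places_end": [18, 48, 9, 10, 24, 5],
})


def build_merged(acs, returns, eavs):
    """Join the three county-level tables and add the percent change in polling places."""
    merged = (
        acs.merge(returns, on="fips", how="inner", validate="one_to_one")
           .merge(eavs, on="fips", how="inner", validate="one_to_one")
    )
    change = merged["polling_places_end"] - merged["polling_places_start"]
    merged["pct_change_polling"] = change / merged["polling_places_start"] * 100
    return merged


merged = build_merged(acs, returns, eavs)

# Exploratory regression: percent change in polling places on two county traits.
model = smf.ols("pct_change_polling ~ median_income + dem_share", data=merged).fit()
print(model.params)
```

The `validate="one_to_one"` argument makes the merge raise an error if any table contains duplicate county keys, which guards against silently inflating the row count before the regression is run.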

Findings

You can navigate through our findings using the sidebar. We produced:

1) Exploratory charts

2) Regression models

3) A set of maps