Election night promises to be interesting… Ballots will get counted; networks will start calling states; and electoral votes will be slowly allocated to candidates.

But we will probably know the winner way before any candidate reaches 270 electoral votes.

Why? Vote intentions are correlated across states. Results from just a handful of states should help us predict better who will win the remaining states. For example, if Hillary Clinton loses or underperforms the model’s forecast in New Hampshire, this might indicate that she will also underperform in other swing states (maybe because there were a lot hidden Trump voters that polls missed in NH… and may equally have missed in PA, OH, FL, etc.). In other words, early results will not just matter as they directly affect the electoral votes tally, they should also help us refine our forecast in other states.

Enter `update_prob()`

I wrote a very simple function in R, update_prob(), that performs this calculation. It takes as input a set of events that you want to condition on (e.g. Clinton or Trump already won a set of states; or the two-way final score in a given state falls within a certain interval), and it displays the updated forecast by drawing from the conditional distribution.

You can use it on election night, or before if you want to check how the forecast would change under various scenarios.

Here are just a few examples, using hypothetical and somewhat far-fetched cases:

If we call update_prob() without adding any new information, the function returns the baseline prediction from the model:

update_prob()

## Pr(Clinton wins the electoral college) = 90%
## [nsim = 100000; se = 0.1%]

Now imagine that New Hampshire gets called for Clinton; her chance of winning the election naturally increases:

update_prob(clinton_states = c("NH"))

## Pr(Clinton wins the electoral college) = 93%
## [nsim = 95626; se = 0.1%]

What happens if Trump wins Pennsylvania? (An unlikely outcome here, since the model sees Pennsylvania as more favorable to Clinton than New Hampshire). Things start looking bad for Clinton:

update_prob(clinton_states = c("NH"), 
            trump_states   = c("PA"))

## Pr(Clinton wins the electoral college) = 25%
## [nsim = 4071; se = 0.7%]

If Ohio falls in the Trump column as well, we don’t actually get much new information (Ohio is more pro-Trump than Pennsylvania anyway):

update_prob(clinton_states = c("NH"), 
            trump_states   = c("PA", "OH"))

## Pr(Clinton wins the electoral college) = 22%
## [nsim = 3949; se = 0.7%]

It is also possible to plug in score intervals: for example: we don’t know who is winning Florida for sure, but it looks like Clinton and Trump are close, with perhaps a slight lead for Clinton… Let’s put the final score for Clinton somewhere between 49 and 52% in that state (taking her score in a two-way race, ignoring 3rd party candidates):

update_prob(clinton_states = c("NH"), 
            trump_states   = c("OH", "PA"), 
            clinton_scores_list = list("FL" = c(49, 52)))

## Pr(Clinton wins the electoral college) = 37%
## [nsim = 2330; se = 1%]

We can add multiple score conditions: let’s say Nevada is in the Clinton column and her score there is somewhere between 51 and 54%.

update_prob(clinton_states = c("NH"), 
            trump_states   = c("OH", "PA"), 
            clinton_scores_list = list("FL" = c(49, 52), 
                                       "NV" = c(51, 54)))

## Pr(Clinton wins the electoral college) = 74%
## [nsim = 1113; se = 1.3%]

We can also get the updated probabilities of victory by state.

update_prob(clinton_states = c("NH"), 
            trump_states   = c("OH", "PA"), 
            clinton_scores_list = list("FL" = c(49, 52), 
                                       "NV" = c(51, 54)),
            show_all_states = TRUE)

## Pr(Clinton wins) by state, in %:
##      AK AL AR AZ  CA CO  CT  DE FL GA  HI IA ID  IL IN KS KY LA  MA  MD
## [1,]  0  0  0  0 100 95 100 100 61  0 100  0  0 100  0  0  0  0 100 100
##       ME MI  MN MO MS MT NC ND NE  NH  NJ NM  NV  NY OH OK  OR PA  RI SC
## [1,] 100 89 100  0  0  0 50  0  0 100 100 99 100 100  0  0 100  0 100  0
##      SD TN TX UT VA  VT  WA WI WV WY ME1 ME2  DC NE1 NE2 NE3
## [1,]  0  0  0  0 99 100 100 98  0  0 100  61 100   0   1   0
## --------
## Pr(Clinton wins the electoral college) = 73%
## [nsim = 1062; se = 1.4%]
## --------

How to use it

To use update_prob(), you only need 2 files, available from my GitHub repository:

update_prob.R, which loads the data and defines the update_prob() function,
last_sim.RData, which contains 4,000 simulations from the posterior distribution of the last model update.

Put the files into your working directory or use setwd().

If you don’t already have the mvtnorm package installed, you can do it by typing: install.packages("mvtnorm") in the console.

To create the functions and start playing, type: source("update_prob.R"), and the update_prob() function will be in your global environment.

The function accepts the following arguments:

clinton_states: a character vector of states already called for Clinton;
trump_states: a character vector of states already called for Trump;
clinton_scores_list: a list of elements named with 2-letter state abbreviations; each element should be a numeric vector of length 2 containing the lower and upper bound of the interval in which Clinton share of the Clinton + Trump score is expected to fall.
target_sim: an integer indicating the minimum number of samples that should be drawn from the conditional distribution (set to 1000 by default).
show_all_states: a logical value indicating whether to output the state by state expected win probabilities (set to FALSE by default).

How it works

The function draws from a multivariate normal distribution of vote intentions (with the same mean and covariance as the simulations from the last model update) and rejects draws that do not match the constraints expressed in the clinton_states, trump_states, and clinton_scores_list arguments. Accepted draws (i.e. simulations that match the constraints) are recorded. The function keeps drawing from the multivariate normal distribution (in batches of 100,000 draws) until target_sim draws have been accepted.

It then calculates the electoral votes distribution and returns the proportion of cases in which Clinton wins the electoral college, along with the standard error of this proportion.

This is a very simple and arguably dumb sampling algorithm. If your constraints are too tight, or your scenario is too far-fetched, the rejection rate will be extremely high. Stan would be much more efficient as the sampling algorithm would learn which regions of the parameter space to focus on; but this seems to work well enough for this application – unless you want to explore some really complicated and unlikely combinations of scenarios or with extremely tight constraints.

If the rejection rate is too high and sampling looks like it would take too long, the function stops and issues a message inviting you to check if your constraints make sense and to consider relaxing them a bit.

Updating the Forecast on Election Night with R

Pierre-Antoine Kremp

Enter `update_prob()`

How to use it

How it works

Updating the Forecast on Election Night with R

Pierre-Antoine Kremp

Enter update_prob()

How to use it

How it works

Enter `update_prob()`