How can you be sure that the voting machines in southeast Kansas were rigged?

How can I be so sure? Couldn’t there be some other cause of the bias?  That was the most common inquiry at my presentation Saturday, when I explained my exit poll results to the people who helped collect the data and had a vested interest in understanding the results.  I may have come across as a bit defensive in regard to this question.  I’m sorry if I did.  It’s hard to articulate the depth of my certainty, but I’ll try.

I carefully set up these exit polls to compare the official vote count by machine type.  The only legitimate concern regarding the meaning of these results is a biased sample. Not everybody tells the truth.  Some people delight in giving false answers to surveys.  How are you going to account for that? It’s a fair concern.

While I cannot prove that didn’t happen (at least, not without access to the ballots, which isn’t permitted), this is part of the normal error I expect.  It always helps to state assumptions explicitly.

INTROVERTS, LIARS, AND IDIOTS ASSUMPTION: THESE TRAITS ARE RANDOMLY DISTRIBUTED AMONG ALL CANDIDATES AND POLITICAL PARTIES.  I am assuming that people who were less likely to participate (introverts), more likely to fudge their answers (liars), or more likely to make mistakes (idiots) in filling out the survey did not differ systematically by candidate or party.

I received the following email that sums up this concern nicely and also suggests a couple of ways to check that hypothesis.

Hi Beth,

The observed discrepancies between official results and your poll results very clearly show that Clinton (D) voters were more strongly represented in those polled than in the official vote count; Trump (R) voters were less well represented.  There are many possible explanations for this discrepancy.  One hypothesis is that a certain percentage of voters “held their nose and voted for X” and would never have participated in the poll.  If these voters tended to be more of one party than the other, then that party would be less represented in the polls.

Fortunately, your data provide a means to test this hypothesis about the “missing minority”, for it leads to this prediction:  
If a “missing minority” was biased towards X, then sites at which X had a greater percentage of the votes would be least affected by vote disparities.

A corollary prediction:  sites having the highest response rate would be least affected by vote disparities.

Have at it!
Annie

The main reason I find this hypothesis implausible is that the discrepancies for the Supreme Court judges were twice as large and followed the same pattern as the Pres. race discrepancies. There’s no reason to think more people ‘held their nose’ for judges than president!

Regarding those two predictions:

  1.  The sites with the greatest discrepancies were machine counts for SE Wichita, Urban  Wichita and Cowley.  The sites with the highest %Trump voters were Cowley, SW Wichita and Sumner.  No correlation there.
  2. The site with the lowest response rate, Sumner with 25%, also had the lowest discrepancies between the exit poll and the official results for the Pres. race.

In short, we do not see the other data relationships we would expect if the introverts, liars, and idiots assumption were false.  There is no reason to believe these individuals were more likely to vote for one candidate than another, which is what it would take to produce the bias in our data.

Analyzing Exit Poll Results

It’s important to state up front what data will be collected, what analysis will be performed on that data, and what constitutes evidence of a serious problem versus random error in any poll of this nature.

Polling stations allow voters access to the results after the polls have closed and the votes have been tallied.  In Sedgwick County, separate reports will be printed for the electronic voting machines and the scanned paper ballots.

We will also get a count of the number of provisional ballots collected from the polling location.  These ballots will not be opened until the voter’s registration is verified, and there will never be an official tally of the provisional votes for each polling location.  But we can look at the results we have for voters who submitted provisional ballots and compare them with the votes that were counted at the polling location.  If there are significant differences, this is evidence of the voter suppression effect of Kris Kobach’s voter registration rules.

I have created a general data collection and analysis EXCEL spreadsheet.  Multiple precincts vote at each polling location and the results are reported for each precinct, not the polling location, so I’ve set up a spreadsheet to sum the numbers up and compute the appropriate probabilities.

I will be customizing this EXCEL file for each exit poll location in Kansas, but I am happy to share a general version of this worksheet with anyone who is interested in running an exit poll for their own area.  All you will have to do is input the official results and your exit poll results.  This is an example of the output.

Example Data Analysis

Presidential race Chi-Squared Result: NA

Candidates     Exit Poll   Official Results   Binomial Probability
Clinton (D)       52              60                 0.0638
Trump (R)         38              30                 0.0530
Johnson (L)        6               6                 0.5593
Stein (G)          2               2                 0.5967
Other              2               2                 0.5967
Total            100             100

There are two different analyses that can be used in this situation.  The chi-squared test gives the probability that differences this large between the actual results and the expected results would arise under random chance alone.  EXCEL has this test as a built-in function: CHISQ.TEST.  But the chi-squared test has minimum data requirements which were not met in this example, hence “NA” (Not Applicable) as the result of this test.
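For readers without Excel, the same check can be sketched in Python.  This is a minimal sketch under my own assumptions: I take the “minimum data requirement” to be the usual rule of thumb that every expected count must be at least 5, and the closed-form survival function shown only covers even degrees of freedom, which suffices here since five candidates give df = 4.

```python
import math

def chi2_sf(x, df):
    """Chi-squared survival function (1 - CDF) for even df,
    using the closed form exp(-x/2) * sum of Poisson terms."""
    assert df % 2 == 0 and df > 0, "closed form shown only for even df"
    k = df // 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

def chi2_test(observed, expected):
    """Rough analog of Excel's CHISQ.TEST: returns None ('NA') when
    any expected count is below 5, else the p-value."""
    if min(expected) < 5:
        return None
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_sf(stat, len(observed) - 1)

# Exit-poll counts vs. official counts from the example table
print(chi2_test([52, 38, 6, 2, 2], [60, 30, 6, 2, 2]))  # None: Stein/Other expected counts are below 5
```

Pooling the two smallest candidates into a single “other” category would let the test run; the point is only that small expected counts force a fallback to the binomial probabilities.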

Since the chi-squared test will not work for every possible set of data, I also show the individual binomial probabilities for each of the candidates.  The minimum probability from this set of five computations is a reasonable approximation to the exact computation using the multinomial extension of the binomial distribution, and it can be easily computed using the built-in Excel formula BINOM.DIST.
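The binomial column itself can be reproduced with nothing but the Python standard library.  One caveat: the exact definition of the table’s “Binomial Probability” is my own reading, namely the tail probability of an exit-poll count at least as far from expectation as the observed one, with p taken from the official vote share; that reading matches the example table’s numbers.

```python
from math import comb

def tail_prob(observed, n, p):
    """Probability of an exit-poll count at least as extreme as
    `observed` (in the direction of the discrepancy), assuming the
    official vote share p is the true proportion."""
    pmf = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)
    if observed >= n * p:                      # exit poll on the high side
        return sum(pmf(k) for k in range(observed, n + 1))
    return sum(pmf(k) for k in range(observed + 1))

# Reproduce the example table (n = 100 exit-poll respondents)
for name, exit_count, official in [("Clinton (D)", 52, 60), ("Trump (R)", 38, 30),
                                   ("Johnson (L)", 6, 6), ("Stein (G)", 2, 2)]:
    print(name, round(tail_prob(exit_count, 100, official / 100), 4))
```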

How to interpret this:

We judge the probability of machine manipulation of the vote by evaluating the probability of our results assuming no manipulation of votes is occurring.  This is referred to as the “null hypothesis”.  All probabilities shown are made under this assumption.  If this probability is above 0.05 (5%), we can reasonably conclude that the differences between the machine vote share and the exit poll vote share are typical of random variation due to the normal errors in the process.

If this value lies between 0.05 and 0.001, raise an eyebrow and give the numbers for that race a little extra scrutiny and consider it in concert with the other exit polling results.

If this value lies below 0.001, that is evidence of fraud.  Personally, I would like to see a recount of any race with results that fall this far from normal.  But only a candidate can request a recount in Kansas.
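Restated as code, the three interpretation bands look like this (the wording of the labels is mine):

```python
def interpret(p):
    """Classify a probability using the three thresholds described above."""
    if p > 0.05:
        return "typical random variation"
    if p > 0.001:
        return "raise an eyebrow: scrutinize with the other results"
    return "evidence of fraud: a recount is warranted"

print(interpret(0.0638), "|", interpret(0.0005))
```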

In this example, I have contrived to show Trump with a questionably low # of votes in the official count compared to the exit poll results.  Hillary has a slightly elevated value.  But these results are not unexpected as the minimum probability of results this far off is above 0.05.

But if the other sites have similar values and they are all benefitting the same candidate, it would be concerning.  If 2 or 3 sites out of 5 show the same beneficiary of the differences, that’s reasonable.  But if 5 out of 5 sites show the same beneficiary, it’s evidence of rigging.

If we see multiple races with low odds and the same slate of candidates are benefiting, we have solid evidence of machine manipulation of our official votes.  If we see only the normal expected errors, then we have solid evidence it is NOT being manipulated.

While a single location and a single race might show evidence of manipulation, savvy cheaters will try to evade this method of detection by keeping the shift at any one site small enough that its probability stays above the 0.05 threshold.  But by looking at multiple races and sites, we can establish whether even small shifts show evidence of cheating.

We can define a slate of candidates by party and check the probability of getting the results we got using a similar binomial analysis.  Under the null hypothesis of no manipulation, the probability that an error benefits a given candidate is 50%.  We are asking about three candidate races and five judicial retention questions, for a total of 8 results for each polling location.  Governor Brownback would like to see 4 of the 5 judges lose their jobs so that he can replace them.  We can also presume he supports the Republican Party candidates for President, Senate and Representative.

We will have data from 5 different locations, for a total of 40 random samples, each with approximately 50% probability.  (For example, let X be the number of errors that went in the opposite direction from the Brownback administration’s preferred candidates.  With 40 random samples as defined above, the probability of getting X or fewer errors in the opposite direction of his preferred result is computed with the following Excel formula: BINOMDIST(X, 40, .5, 1).)
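The Excel formula has a direct standard-library equivalent.  As a sketch with a made-up value, suppose only X = 8 of the 40 comparisons had gone against the preferred slate:

```python
from math import comb

def binom_cdf(x, n, p):
    """Equivalent of Excel's BINOMDIST(x, n, p, TRUE): P(X <= x)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# If only 8 of 40 discrepancies went against the preferred slate, the
# chance of a result that lopsided under the 50/50 null is tiny.
print(binom_cdf(8, 40, 0.5))
```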

If this value is extremely low (less than .001), we conclude that the Republican Party has unduly benefited and further investigation would be appropriate.

How to interpret the Provisional Ballot Data:

We cannot know the final count of the provisional ballots collected at a polling location.  They are pooled at the county level, and only those shown to be from registered voters are opened and counted.  What we can do is compare the results of the provisional ballots with the other responses to our exit poll.  If there is a major difference between the voters asked to fill out provisional ballots and the automatically counted votes, we have a measure of the effect of the voter ID laws and whether they made a difference to the outcome.

For each race, we can use the chi-square test if we have sufficient data.   Otherwise, we can use the binomial approximation similar to the one used to compare the official count to the exit poll survey results.

electioneering and instructs them regarding what they can and cannot do.  While permission is not required to run an exit poll, we do need permission from the property owner to set up a booth to collect our ballots and provide chairs and shade for our volunteers.  Mainly, we want everyone to know what we are doing to avoid any issues arising on election day.

How to Run an Exit Poll Part 1

How to Run an Exit Poll Part 2

Creating an Exit Poll Ballot

This is part 3 of my “How to Run an Exit Poll” series.

The exit poll survey ballot is important, but not complicated.  The only question of interest, other than their ballot choices, is the method of voting.  Data will be available at the end of the day with separate totals for the machine-cast votes and the scanned paper votes.  There will be no official count of provisional votes at the polling station, so we can only compare those votes to the overall total for the station.  But that comparison allows us to evaluate whether giving people provisional ballots amounts to a voter suppression tactic.

Since space on our survey form is at a premium, and because including that information makes responses less anonymous, I do not recommend including questions about age, race or gender.  Generally speaking, you want to keep the words to a minimum.  (Not an easy task for me.)

Here is an example survey I have developed for exit polls in Sedgwick Co.  I included a short paragraph at the top because I feel it’s important to let people know why you want this information and reassure them that results are anonymous, just like their vote.

Sample Exit Poll Survey

The first question is really too long, but I wanted to be as clear as I could.  In Sedgwick County, Kansas there are three possible options: a vote cast via electronic voting machine, a paper ballot that the voter feeds into a scanner for on-site electronic counting, or a provisional ballot – a paper ballot that is sealed into an envelope to be counted later (maybe).

Asking about the specific races is straightforward.  State the office and then list the candidates.  Circling answers reduces the need for a blank or box to check.  It saves space on the page.

Staggering the answers for questions with more than one line of answers (ex: Pres) makes it easier to discern the voter’s intent.  When they are stacked one above another, the answer may easily become ambiguous.

Since a single polling location will have multiple precincts voting there, it’s a problem asking about races where different precincts will be voting  for different candidates.   Generally, I want to confine the questions to races that will appear on every ballot at the polling location.   On the other hand, my site managers for the SW Wichita location are very interested in the county commissioner races.  We arrived at the following:

Who did you vote for your County Commissioner Race? (Select one for District 2 OR  District 3)   –  sw-wichita-nov-8-exit-poll-ballot

I have hopes that we won’t get too many voters identifying their choices for both district 2 and district 3, but I expect we will get some.   OTOH, it’s the only question that would be spoiled and I’m reasonably comfortable in assuming that such mishaps are equally likely to occur regardless of which candidate they support.  I think we will get good data from this exit poll.


A Replication of My Work.

Mr. Brian Amos, a Ph.D. candidate at the University of Florida, was dedicated enough to replicate some of my work and confirm that he gets the same results I reported.

He does have a few disagreements with my approach. For example, to what he describes as a nitpick, I would respond: that’s a feature, not a bug! My choice of limiting an analysis to the precincts with more than 500 votes cast results in what he considers an overemphasis on the effect I am concerned with. This is absolutely true. That particular analysis was designed to draw out that effect and make it more apparent. The vote share data is very noisy and impacted by many different factors. The trend is real, but is easily missed in the inherent noise of the larger dataset.

Wichita 2014 Election Results

Mr. Amos wonders if some other, correlated factor, such as voter registration numbers, would display a similar trend in the cumulative chart. He shows this is true for the share of Republicans in this particular data set. But this is not a universally correlated trait across the different states where such trends have been found, and it was not enough in Sedgwick County, Kansas to account for the difference in vote share.

I discuss this factor at more length in my recently published paper “Audits of Paper Records to Verify Electronic Voting Machine Tabulated Results” in the Summer 2016 issue of The Kansas Journal of Law and Public Policy. The graph displayed above is from that paper, illustrating that although there is an upswing in the cumulative graph for the share of Republicans, it is much smaller than the upward surge of the vote share for various Republican candidates in 2014.
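For anyone who wants to reproduce a cumulative vote-share curve like the one in the graph, here is a minimal sketch. The input format, a list of (total votes, candidate votes) pairs per precinct, and the numbers in the example are my own illustration, not the Sedgwick County data.

```python
def cumulative_share(precincts):
    """Cumulative candidate vote share as precincts are added from
    smallest to largest; a persistent upward drift in the tail of this
    curve is the pattern at issue."""
    shares, cum_total, cum_cand = [], 0, 0
    for total, cand in sorted(precincts):      # smallest precincts first
        cum_total += total
        cum_cand += cand
        shares.append(cum_cand / cum_total)
    return shares

# Hypothetical precincts where the candidate does better in larger ones,
# producing the telltale upward drift
data = [(100, 45), (200, 92), (400, 190), (800, 400), (1600, 850)]
print([round(s, 3) for s in cumulative_share(data)])
```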

His parting comment “While the charts may be explainable through vote fraud, there are other, perfectly innocuous explanations that can be put forward, as well.” is true. Yes, there are other possible and innocuous explanations. Statistical analysis only illuminates correlations and other relationships. Further investigation is needed to determine cause. Just because the trend is a predicted sign of election fraud does not mean election fraud occurred.

The only way to tell if our machine tabulated vote count is accurate or undermined is to conduct a proper audit. That’s never been done here in Sedgwick County. I’ve requested access to do this as a voter and been denied. I filed the proper paperwork in a timely manner asking for a recount of those records after the 2014 election and was denied. I’ve sued for access as an academic researcher and been denied.

Why should I trust a vote count that our officials will not allow to be publicly verified? Why should anyone?

Another Analysis of 2016 Democratic Primary

This is a solid analysis. I say this without having vetted their data collection; I’m assuming they did that part right. If so, the conclusion is obvious. The authors confine all the statistical analysis to the appendix, so you can read the paper without having to understand any math.

Are we witnessing a dishonest election?

They found Sanders won 51% to 49% in places that had a paper trail, and Clinton won 65% to 35% in places that don’t. That’s amazing! Yes, those are different states. Yes, they looked at other possible causes: they tested that difference while accounting for the % of whites and the ‘blueness’ of each state. No, they didn’t find anything sufficient to explain it.

You don’t have to be a statistician to understand that’s a huge difference in proportion. It helps to be a statistician to understand the tests they ran checking other explanations and the resulting output. They are running appropriate tests and the output is unequivocal. Which they stated. I concur.
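To make “huge difference in proportion” concrete, here is a standard two-proportion z-test sketch. The sample sizes are hypothetical placeholders, not the paper’s actual counts; the point is only that a 51% vs. 35% Sanders split is many standard errors apart at any reasonable sample size.

```python
import math

def two_prop_z(p1, n1, p2, n2):
    """Two-sided two-proportion z-test; returns (z, p_value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value from the normal CDF via erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical: 1,000 votes with a paper trail (51% Sanders) vs.
# 1,000 without (35% Sanders)
z, p = two_prop_z(0.51, 1000, 0.35, 1000)
print(round(z, 1), p < 1e-6)
```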

“As such, as a whole, these data suggest that election fraud is occurring in the 2016 Democratic Party Presidential Primary election. This fraud has overwhelmingly benefited Secretary Clinton at the expense of Senator Sanders.”

Redacted Tonight made this article their lead story.

BTW, I absolutely loved their fake commercial for “Shut your f***ing tweethole” at the 15 min mark.

The authors’ response to criticisms

My work, some of my graphs and my previous post, are included in the appendix of the response article. Lots of interesting graphs there too.