One of the purposes of the Center is to extend current statistical tests and apply them to elections. One frequently used test is the Benford Test. However, simulation studies have shown it to fall short as an appropriate test (Deckert et al. 2011; Forsberg 2014). Still, the goal behind the Benford Test is very attractive: given only vote counts, we can test an election for fraud.
Research Associate Naomi Morishita has developed a simple addition to the Benford Test in the hopes of creating a statistical test that has an appropriate Type I Error rate as well as reasonable power. In this article, she introduces her test.
The Benford Test is one of the tests used to detect the presence of fraud in an election. Its use is controversial, since it assumes that vote counts in free and fair elections follow a log-uniform distribution, and that their leading digits therefore follow a Benford distribution. This assumption is highly suspect.
In past articles, we have used the Benford distribution and the chi-square goodness-of-fit test to see if the leading digits follow the Benford distribution (see, for example, Revisiting Iran 2009 and New Zealand 2014). When the p-value was greater than α = 0.05, we failed to reject the null hypothesis and concluded there was no evidence of electoral fraud. When the p-value was less than α = 0.05, we concluded that there was statistical evidence of fraud.
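For readers who want to see the mechanics, here is a minimal, standard-library-only sketch of that leading-digit test: the Benford probabilities, the observed digit counts, and the chi-square goodness-of-fit statistic computed by hand. The vote counts below are invented for illustration.

```python
import math

def benford_probs():
    # P(leading digit = d) = log10(1 + 1/d), for d = 1..9
    return [math.log10(1 + 1 / d) for d in range(1, 10)]

def leading_digit(n):
    # First decimal digit of a positive count
    return int(str(abs(n))[0])

def chi_square_stat(counts):
    # counts[d-1] = number of vote counts whose leading digit is d
    n = sum(counts)
    expected = [p * n for p in benford_probs()]
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

# Hypothetical vote counts for one candidate across divisions
votes = [1250, 1893, 2041, 311, 4520, 982, 1377, 160, 2890, 1049]
digits = [leading_digit(v) for v in votes]
counts = [digits.count(d) for d in range(1, 10)]

stat = chi_square_stat(counts)
# Reject at alpha = 0.05 if stat exceeds the chi-square critical
# value with 9 - 1 = 8 degrees of freedom (about 15.51).
print(round(stat, 2))
```

With only ten divisions, the expected cell counts fall well below the usual chi-square validity threshold; in practice the test is applied to vote counts from a large number of divisions.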
Note that this evidence of fraud is not proof of fraud. Using a significance level of α = 0.05 means that, when an election is actually free and fair, we will incorrectly declare fraud 5% of the time. This assumes the test is appropriate and that the Type I Error rate really is α = 0.05. To reduce our incorrect claims (accusations) of fraud, we often decide to use a much smaller level, such as α = 0.005, at the expense of reducing the power of the test.
However, this significance level only makes sense if the actual Type I Error rate is α. There is strong evidence that the standard Benford Test has an inflated Type I Error rate.
Issues of the Benford Test
The Benford Test has fallen into question, not just in the case of the Swedish election that we have discussed, but especially after the work of Deckert et al. (2011) and Forsberg (2014). Their simulations applied the Benford Test to "free and fair" elections, and the test detected electoral fraud at a rate much higher than the stated α level.
As such, we now begin to examine a replacement for the Benford test.
The M²-1 Test
Since some well-known infinite integer sequences provably satisfy Benford’s Law exactly, and the powers of almost any number form one such sequence, running the leading-digit and second-digit tests on the squared vote counts, and then comparing them to the Benford distribution using the chi-square goodness-of-fit test, might be an improvement over the Benford Test.
We shall call the leading-digit version the “M²-1 Test” and the second-digit version the “M²-2 Test”.
Steps to Apply
The procedure of the M²-1 Test is rather simple: square the vote counts of the candidate and then apply the Benford test. We compare the distribution of the leading digit of the squared vote counts to the Benford distribution using the chi-square goodness-of-fit test.
We use the second digit instead of the leading digit for the M²-2 Test.
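The steps above can be sketched as follows: square the counts, then pull out whichever digit the relevant test needs. The resulting digit counts would then be fed into the same chi-square comparison as the ordinary Benford Test (noting that the second digit has its own Benford distribution, different from the leading-digit one). The vote counts here are invented for illustration.

```python
def mth_digit(n, m):
    # m = 1 gives the leading digit, m = 2 the second digit.
    # Returns None when the number is too short to have an m-th digit.
    s = str(abs(n))
    return int(s[m - 1]) if len(s) >= m else None

votes = [1250, 1893, 2041]
squared = [v * v for v in votes]              # the M² step: square the counts
first = [mth_digit(v, 1) for v in squared]    # digits for the M²-1 Test
second = [mth_digit(v, 2) for v in squared]   # digits for the M²-2 Test
print(squared, first, second)
```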
Type I Error Rate
The main drawback of the current Benford Test is that the Type I Error rate is greater than the stated α level. The first step in checking that the M² test is better than Benford is to check its Type I Error rate. If its Type I Error rate is closer to α than the Benford Test's, then the M² test is superior.
Monte Carlo Simulation
As we are testing the Type I Error rate, we need to generate vote counts from "free and fair" elections. This is more difficult than it seems. As Deckert et al. 2011 and Forsberg 2014 both note, there is currently no known distribution of free and fair elections. As such, we have to either use elections we believe are free and fair or generate elections from our understanding of elections.
This Monte Carlo simulation is of the latter type; it generates the elections. The following describes the multi-step process we used to generate elections.
First, let us assume m is the average division size in the State. With that, the following steps generate the votes in favor of the candidate.
If D is the size of a division, then
Now that the division sizes exist, the turnout is
Next, the support level for the candidate is
Finally, the vote count for the candidate is
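The four-step structure above can be illustrated with a short, standard-library-only sketch. Every distributional choice below (roughly normal division sizes around m, Beta-distributed turnout and support rates) is our own illustrative assumption, not necessarily the distribution the actual simulation used.

```python
import random

def simulate_election(n_divisions=100, m=5000, seed=None):
    """Generate vote counts for one candidate in a 'free and fair' election.

    Illustrative assumptions: division sizes vary around the average m,
    turnout and support rates are Beta-distributed on (0, 1).
    """
    rng = random.Random(seed)
    votes = []
    for _ in range(n_divisions):
        D = max(1, round(rng.gauss(m, m / 4)))       # division size (assumed)
        turnout = rng.betavariate(8, 4)              # turnout rate (assumed)
        support = rng.betavariate(5, 5)              # support rate (assumed)
        votes.append(round(D * turnout * support))   # votes for the candidate
    return votes

votes = simulate_election(seed=1)
print(len(votes), min(votes))
```

Repeating this 10,000 times, applying the M² tests to each simulated election, and recording the p-values yields the rejection rates analyzed below.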
We then compute the proportion of p-values that are less than α = 0.05. If the test is behaving correctly, this rejection rate should be close to 0.05, because a true significance level of α = 0.05 means there is a 5% chance of committing a Type I error.
We then use the Binomial test to determine if the rejection rate is sufficiently close to α = 0.05. If the p-value of the Binomial test is less than α = 0.05, then we have evidence that the Type I error rate is not α. A look at the observed rejection rate tells us if the M²-1 test rejects too frequently or too rarely.
After creating 10,000 simulated elections, we applied the M² tests, recorded the p-values, and calculated the proportion of p-values that were less than our nominal α = 0.05. This proportion is the observed Type I error rate.
When we run the Binomial test for the leading digits of the squared vote counts, we get a p-value of approximately 0 (less than 2.2×10⁻¹⁶), which means that the M²-1 test does not reject at the correct rate. In fact, the observed rejection rate is approximately 17.94%, which is far too high.
As a comparison, the Type I error rate for the Benford leading-digit test was 1; that is, it rejected every one of the simulated free and fair elections.
As for the second-digit test, the M²-2 test yields an observed Type I error rate of 4.71%. (The Binomial test returned a p-value of 0.1909.) By comparison, the Benford second-digit test has an observed Type I error rate of 7.69%.
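A binomial test of this kind can be reproduced with the standard library alone. The sketch below implements an exact two-sided binomial test (summing the probabilities of all outcomes no more likely than the observed count, computed on the log scale to avoid underflow) and applies it to the M²-2 result above: 471 rejections in 10,000 simulations against a nominal rate of 0.05.

```python
import math

def log_binom_pmf(k, n, p):
    # log of the Binomial(n, p) probability mass at k
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def binom_test_two_sided(k, n, p=0.05):
    # Exact two-sided test: sum the probabilities of every outcome
    # no more likely than the observed count k.
    log_obs = log_binom_pmf(k, n, p)
    return sum(math.exp(lp)
               for lp in (log_binom_pmf(i, n, p) for i in range(n + 1))
               if lp <= log_obs + 1e-9)

# M^2-2 result: 471 rejections out of 10,000 simulations (4.71%)
pval = binom_test_two_sided(471, 10000, 0.05)
print(round(pval, 4))
```

For the M²-1 result (roughly 1,794 rejections out of 10,000), the same function returns a value that is effectively zero, consistent with the reported p-value below 2.2×10⁻¹⁶.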
Here, we introduced the M² tests. We created them to be improvements over the current Benford Test, which tests the digits of vote counts for evidence of electoral fraud. Because powers of numbers are expected to follow the Benford distribution, we arrived at the idea of squaring the vote counts.
Using the Monte Carlo simulation experiment described above, we concluded that the M² tests represent improvements over the original Benford tests: both Type I rejection rates are closer to our nominal α = 0.05.
All is not good, however. While the Type I error rate of the M²-2 test was sufficiently close to 0.05, that of the M²-1 test was not. As such, while the M²-1 test is an improvement over Benford, it is not a good test at this point. In the future, we will examine whether increasing the exponent helps with the rejection rate. Preliminary results suggest something unexpected.