In the previous entry, we introduced the M² tests, in which we square the vote counts and then compare the distribution of their digits to the Benford distribution in order to detect electoral fraud. The standard Benford test, by contrast, uses the original (unsquared) vote counts and had problems with its Type I error rate.
While the previous entry focused on the M² tests (squared vote counts), here we explore other exponents of the vote counts and whether they improve on the Benford test.
Through the previous entry, we set α = 0.05, meaning that there is a 5% chance that the test commits a Type I error. In other words, when an election is free and fair, the test wrongly signals electoral fraud 5% of the time. The Type I and Type II error rates are inversely related: the relationship is not linear, but the Type II error rate goes up as the Type I error rate goes down.
We have decided that we would rather let the criminals go than call innocent people liars; that is, we would rather keep the Type I error rate low and accept a higher Type II error rate.
Accordingly, from now on we set α = 0.005 instead of 0.05, which corresponds to a 0.5% chance that the test commits the Type I error of wrongly crying fraud.
Previously, we introduced the M² tests and saw a marked improvement over the standard Benford test. The M² tests, however, had an inflated Type I error rate for the leading-digit test. While they were superior to the standard Benford test, this Type I error rate is concerning. Since squaring helped, we thought that cubing might help even more. Thus, we designed the M³ tests.
Steps to Apply
We cube the vote counts of the candidate and then apply the Benford test: we compare the distribution of the leading digit of the cubed vote counts to the Benford distribution using the chi-square goodness-of-fit test. This is the M³-1 test.
For the M³-2 test, we use the second digit instead of the leading digit.
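As a rough illustration of these steps, the following Python sketch raises each vote count to a chosen power, extracts the first or second digit, and compares the digit frequencies with the Benford probabilities using a chi-square goodness-of-fit statistic. The function names (benford_probs, chi2_sf, m_test) are hypothetical names chosen for this illustration, dropping zero vote counts is an assumption, and the p-value comes from a closed-form chi-square tail so that the sketch needs only the standard library; in practice, scipy.stats.chisquare would serve the same purpose. It is not necessarily the code behind the results reported here.

```python
import math
from collections import Counter


def benford_probs(position):
    """Benford probabilities for the first (1) or second (2) significant digit."""
    if position == 1:
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    if position == 2:
        return {
            d: sum(math.log10(1 + 1 / (10 * d1 + d)) for d1 in range(1, 10))
            for d in range(10)
        }
    raise ValueError("position must be 1 or 2")


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed-form starting point: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        q, nu = math.exp(-half), 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) e^(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += half ** (nu / 2) * math.exp(-half) / math.gamma(nu / 2 + 1)
        nu += 2
    return min(q, 1.0)


def m_test(vote_counts, exponent=3, position=1):
    """Chi-square test of the chosen digit of vote_counts**exponent against Benford."""
    probs = benford_probs(position)
    digits = []
    for v in vote_counts:
        if v <= 0:
            continue  # assumption: zero counts have no leading digit and are dropped
        s = str(int(v) ** exponent)  # exact integer arithmetic, no rounding
        if len(s) >= position:
            digits.append(int(s[position - 1]))
    n = len(digits)
    if n == 0:
        raise ValueError("no usable vote counts")
    observed = Counter(digits)
    stat = sum((observed.get(d, 0) - n * p) ** 2 / (n * p) for d, p in probs.items())
    return stat, chi2_sf(stat, len(probs) - 1)


# M3-1 and M3-2 on a small made-up list of division-level vote counts.
counts = [412, 1530, 87, 2291, 640, 1178, 305, 954, 1820, 73]
print(m_test(counts, exponent=3, position=1))
print(m_test(counts, exponent=3, position=2))
```

With exponent=2 the same function gives the M² tests, and with exponent=1 it reduces to the standard Benford test. A list this short is far too small for the chi-square approximation to be trusted; it only shows the call.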
Monte Carlo Simulation
As we are testing the Type I error rate, we need to generate vote counts from "free and fair" elections. As discussed previously, no agreed-upon distribution for such elections currently exists. Instead, we generated election counts using the following multi-step process.
Given that m is the average division size in the state, the following steps generate the votes in favor of the candidate.
First, if D is the division size, then
Now that the division sizes exist, the turnout is
Next, the support level for the candidate is
Finally, the vote count for the candidate is
After creating 10,000 simulated elections, we applied the M³ tests and recorded the p-values. The proportion of p-values that are less than our nominal α = 0.005 is the observed Type I error rate, which serves as our estimate of the true rate. We then use the Binomial test to determine whether the observed Type I error rate is sufficiently close to the nominal α = 0.005.
When we run the Binomial test for the leading digits of the cubed vote counts, we get a p-value of 0.002857. Since this is below 0.005, the Binomial test rejects the hypothesis that the Type I error rate equals the nominal 0.5%. The estimated Type I error rate is approximately 0.72%, which is higher than 0.5%.
The second digits of the cubed vote counts give an observed Type I error rate of 0.49%, which is very close to 0.5%. Thus, the M³ test appears to work well for the second digit, but not well for the first.
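The Binomial check can be sketched in a few lines of Python. The sketch below assumes that an observed rate of about 0.72% corresponds to 72 rejections out of 10,000 simulated elections, and it assumes a two-sided exact test that sums the probabilities of all outcomes no more likely than the observed one (the convention used by R's binom.test and by scipy.stats.binomtest). The function names are hypothetical and standard-library only.

```python
import math


def type1_rate(p_values, alpha=0.005):
    """Proportion of simulated p-values that fall below alpha."""
    p_values = list(p_values)
    return sum(p < alpha for p in p_values) / len(p_values)


def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability of exactly k successes, 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def binom_test_two_sided(k, n, p0):
    """Exact two-sided p-value for H0: rejection probability equals p0."""
    logs = [binom_logpmf(i, n, p0) for i in range(n + 1)]
    cutoff = logs[k] + 1e-7  # small tolerance so ties with the observed count are included
    total = math.fsum(math.exp(v) for v in logs if v <= cutoff)
    return min(1.0, total)


# Assumed counts: 72 of 10,000 simulated p-values fell below alpha = 0.005.
p_value = binom_test_two_sided(72, 10_000, 0.005)
print(p_value, p_value < 0.005)
```

Working in logarithms avoids overflow in the binomial coefficients when n is as large as 10,000. A result below 0.005 indicates that the observed rate differs from the nominal rate by more than simulation noise would explain.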
Higher-Order M tests
As the M³-1 test had an inflated Type I error rate, we thought a fourth power might fix the issue. For the M⁴-1 test, the observed Type I error rate was approximately 0.56%, which is very close to the nominal α = 0.5%. The second-digit version, the M⁴-2 test, also had a Type I error rate very close to the target: 0.53%.
Here, we went beyond the M² tests by replacing the exponent with 3 and 4. In order to detect electoral fraud accurately, and to avoid detecting it where there is none, we wanted to make sure that our tests have the correct Type I error rate.
For the leading-digit tests, M³-1 and M⁴-1, we conclude that the M³-1 test is inferior to the M⁴-1 test, as its Type I error rate was significantly higher than our nominal α = 0.005. However, for the second-digit tests, all exponents resulted in Type I error rates close to nominal.
As the power of the test may decrease with increasing exponents, we would like to select the lowest exponent that works. For the first-digit test, it appears that exponent is 4; for the second-digit tests, 2.
With this said, we are not yet comfortable using these modifications as appropriate tests for electoral fraud. For that to happen, we would like to generate elections under a different distribution, perhaps one derived from real elections. That, however, will have to wait for a future article.