This article supplements my presentation at the 2017 Joint Statistical Meetings (JSM) in Baltimore. It looks back at the 2009 presidential election in Iran. At the time, there were many claims of fraud but little evidence. In some ways, that election was the moment that Electoral Forensics (and my research agenda) began. Here, we apply some newer methods to see whether there really was evidence of fraud or unfairness in that election.

And so, was there evidence of electoral skullduggery?

In 2009, incumbent president Mahmoud Ahmadinejad stood for re-election. As the leader of the Alliance of Builders of Islamic Iran (ABII) bloc, he was the leading voice of the conservatives in Iran. His main opponent, Mir-Hossein Mousavi, the leader of the Council for Coordinating the Reforms Front (CCRF) group, represented the reformists in Iranian politics, including much of the youth seeking a better life.

Recall that the conservatives essentially represent the 1979 Islamic revolution and its preservation. The supporters of the reformists were largely born after that auspicious occasion.

## The 2005 Election

In the 2005 presidential election, Ahmadinejad received 62% of the votes cast, defeating reformist Akbar Hashemi Rafsanjani. The map below suggests that the support level for Ahmadinejad was relatively consistent throughout the country. He won all provinces except for Sistan and Baluchestan province in the southeast. [Note that Baluchis tend to be Sunni, while Iran tends to be Shi'a. Also, the province as a whole is less Persian than the rest of Iran.]

In the 2009 election, Ahmadinejad received 63% of the vote. However, his support in the second election was much more parochial (see the figure below). He lost the northwest and southeast. He also had lower support in Tehran, marked on the map.

It is this last point that may explain the persistent claim that the 2009 election was fraudulent in favor of Ahmadinejad. Tehran, being the most densely populated part of the country, saw less support for him in 2009. Thus, selection bias reinforced the media claims of fraud.

The following map shows the difference in support for Ahmadinejad between the 2005 and 2009 elections. Note that he gained in some places and lost in others. He tended to gain in conservative strongholds (rural and religious areas) and lose in the liberal regions (urban areas).

## The Generalized Benford Distribution

In lieu of the Benford test, which has been shown to be substandard for testing elections (Deckert et al. 2011), I start with the generalized Benford test (Forsberg 2014). This test generalizes the Benford distribution by adding a parameter for the maximum possible vote count in the division: its turnout.

That each division has a different turnout introduces a problem not seen in the naïve Benford test, where all divisions have the same size. Here, we need to perform some sort of digit test where the digit probabilities differ for each of the divisions. This is not an easy task.

I proceed as follows: Generate a million elections, drawing a leading digit for each division from the probability distribution specified by the generalized Benford distribution for that division. For each election, calculate a likelihood, which is just the product, across divisions, of the probability of observing that digit in each division. This will give an empirical distribution for that likelihood under the null hypothesis that the generalized Benford distribution correctly describes the leading digits.
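The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the R code used for the article. In particular, `leading_digit_probs` is a hypothetical stand-in (the leading digit of a count that is log-uniform between 1 and the division's maximum), not the generalized Benford formula from Forsberg (2014), and the division sizes are made up. The log-likelihood here uses natural logarithms.

```python
import math
import random


def leading_digit_probs(max_votes):
    """Stand-in for a size-dependent digit distribution (needs max_votes >= 2).

    ASSUMPTION: leading digit of a count that is log-uniform on [1, max_votes].
    This is NOT the generalized Benford formula from Forsberg (2014).
    """
    total = math.log10(max_votes)
    probs = []
    for d in range(1, 10):
        mass, k = 0.0, 0
        # Add up every interval [d*10^k, (d+1)*10^k) that fits below max_votes.
        while d * 10**k <= max_votes:
            lo = d * 10**k
            hi = min((d + 1) * 10**k, max_votes)
            mass += math.log10(hi) - math.log10(lo)
            k += 1
        probs.append(mass / total)
    return probs


def log_likelihood(digits, digit_probs):
    """Sum over divisions of log P(leading digit); the log of the product."""
    return sum(math.log(p[d - 1]) for d, p in zip(digits, digit_probs))


def simulate_null(division_sizes, n_sims=10_000, seed=2009):
    """Sorted log-likelihoods of elections simulated under the null."""
    rng = random.Random(seed)
    digit_probs = [leading_digit_probs(m) for m in division_sizes]
    digit_values = list(range(1, 10))
    sims = []
    for _ in range(n_sims):
        digits = [rng.choices(digit_values, weights=p)[0] for p in digit_probs]
        sims.append(log_likelihood(digits, digit_probs))
    return sorted(sims)


def central_interval(sorted_sims, level=0.95):
    """Empirical central interval of the simulated log-likelihoods."""
    n = len(sorted_sims)
    tail = (1 - level) / 2
    return sorted_sims[int(tail * n)], sorted_sims[int((1 - tail) * n) - 1]


# Hypothetical division sizes, for illustration only.
sizes = [250_000, 1_200_000, 80_000, 4_500_000]
low, high = central_interval(simulate_null(sizes, n_sims=2_000))
print(f"central 95% interval of the null log-likelihood: [{low:.2f}, {high:.2f}]")
```

An observed log-likelihood, computed with `log_likelihood` from the reported leading digits, would then be compared against that interval.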

Here is that distribution.

For the 2009 election, the observed log-likelihood for the Ahmadinejad vote counts was -58.85, which is well within the central 95% interval of the simulated null distribution, from -66.27 to -51.55. Thus, if the generalized Benford distribution *does* describe the "natural" distribution for leading digits, as Mebane (2010) suggests, then this test does not offer evidence of unfairness or fraud in this election.

## Differential Invalidation

A second test, one that is based more solidly on statistics and the definition of "free and fair," is the test for differential invalidation (DI). DI is a phenomenon whereby certain subsets of the population have a different probability of having their ballots declared invalid by the electoral authority.

There are many *legitimate* reasons for having a ballot declared invalid. One would be that it was filled out incorrectly. There are also illegitimate reasons for a ballot to be declared invalid, such as that it was cast for the "wrong" candidate.

If ballots are invalidated for legitimate reasons, reasons uncorrelated with the candidate it was cast for, then the relationship between the invalidation rate and candidate support rate at the division level will be null. On the other hand, if ballots are being invalidated because of who they are for, then this will be reflected in a relationship between the invalidation rate and the candidate support rate at the division level.
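A toy simulation can make this logic concrete. The sketch below is not part of the original analysis; every number in it (30 divisions, 5,000 ballots each, a 3% baseline invalidation rate, an extra 5% for opposition ballots) is an assumption chosen purely for illustration. It compares the least-squares slope of invalidation rate on reported support when invalidation ignores the candidate and when it does not.

```python
import random


def simulate_division(n_ballots, support, base_rate, extra_rate, rng):
    """Return (reported support for candidate A, invalidation rate).

    Every ballot is invalidated with probability base_rate; ballots for the
    opponent face an additional extra_rate (0 means no differential
    invalidation). Reported support is A's share of the valid ballots.
    """
    valid_a = valid_b = invalid = 0
    for _ in range(n_ballots):
        for_a = rng.random() < support
        p_invalid = base_rate if for_a else base_rate + extra_rate
        if rng.random() < p_invalid:
            invalid += 1
        elif for_a:
            valid_a += 1
        else:
            valid_b += 1
    return valid_a / (valid_a + valid_b), invalid / n_ballots


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def invalidation_slope(extra_rate, n_divisions=30, n_ballots=5_000, seed=1):
    """Slope of invalidation rate on support across simulated divisions."""
    rng = random.Random(seed)
    results = [
        # True support for A is spread evenly from 20% to 90% (assumed).
        simulate_division(n_ballots, 0.2 + 0.7 * i / (n_divisions - 1),
                          0.03, extra_rate, rng)
        for i in range(n_divisions)
    ]
    supports, rates = zip(*results)
    return ols_slope(supports, rates)


print("slope without differential invalidation:", invalidation_slope(0.0))
print("slope with differential invalidation:   ", invalidation_slope(0.05))
```

Under these assumptions the first slope should sit near zero, while the second should be clearly negative, because divisions that favor candidate A contain fewer of the ballots being targeted.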

Thus, a rather interesting test is to look for a statistically significant relationship between the two numeric variables, invalidations and support. As the invalidations are akin to a binomially distributed random variable, using generalized linear models and fitting using maximum quasi-likelihood estimation is appropriate.
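For readers who want to see the mechanics, here is a minimal sketch of such a fit: a logit-link binomial model with a single predictor, estimated by iteratively reweighted least squares, with the standard error inflated by a Pearson-based dispersion estimate. It is an illustration only and not the R code used for the article; the function names and the five-division data set are hypothetical.

```python
import math


def _weighted_sums(b0, b1, support, invalid, total):
    """IRLS building blocks for logit(p) = b0 + b1 * support."""
    s_w = s_wx = s_wxx = s_wz = s_wxz = pearson = 0.0
    for x, y, n in zip(support, invalid, total):
        eta = b0 + b1 * x
        p = 1.0 / (1.0 + math.exp(-eta))
        w = n * p * (1.0 - p)        # binomial variance, used as IRLS weight
        z = eta + (y - n * p) / w    # working response
        s_w += w
        s_wx += w * x
        s_wxx += w * x * x
        s_wz += w * z
        s_wxz += w * x * z
        pearson += (y - n * p) ** 2 / w
    return s_w, s_wx, s_wxx, s_wz, s_wxz, pearson


def fit_quasibinomial(support, invalid, total, n_iter=25):
    """Return (slope, standard error, t statistic) for the support effect."""
    pooled = sum(invalid) / sum(total)
    b0, b1 = math.log(pooled / (1.0 - pooled)), 0.0
    for _ in range(n_iter):
        s_w, s_wx, s_wxx, s_wz, s_wxz, _unused = _weighted_sums(
            b0, b1, support, invalid, total)
        det = s_w * s_wxx - s_wx ** 2
        # Solve the 2x2 weighted least-squares normal equations.
        b0, b1 = ((s_wxx * s_wz - s_wx * s_wxz) / det,
                  (s_w * s_wxz - s_wx * s_wz) / det)
    s_w, s_wx, s_wxx, _z, _xz, pearson = _weighted_sums(
        b0, b1, support, invalid, total)
    det = s_w * s_wxx - s_wx ** 2
    # Quasi-likelihood step: estimate the dispersion from the Pearson
    # statistic and scale the binomial variance of the slope by it.
    dispersion = pearson / (len(support) - 2)
    se = math.sqrt(dispersion * s_w / det)
    return b1, se, b1 / se


# Hypothetical division-level data, for illustration only.
support = [0.35, 0.48, 0.55, 0.67, 0.80]   # candidate's share of valid votes
total = [12_000, 30_000, 18_000, 25_000, 9_000]
invalid = [310, 600, 300, 350, 95]
slope, se, t_stat = fit_quasibinomial(support, invalid, total)
print(f"slope = {slope:.3f}, se = {se:.3f}, t = {t_stat:.2f}")
```

The t statistic would be referred to a t distribution with (number of divisions minus 2) degrees of freedom; a slope far from zero is the signature of differential invalidation.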

The following graphic is a plot of the invalidation rate against the support level for Ahmadinejad in the 2009 election. The curve is a locally weighted least squares (loess) curve. The dashed curves constitute an estimated 95% confidence band for the curve.

If the invalidation rate and candidate support level are independent, we would expect the loess curve to be horizontal, or at least the 95% confidence band to allow for a horizontal line.

That a horizontal line does not fit within the 95% confidence band is evidence that there was differential invalidation. Furthermore, the regression curve (logit link for a Binomial dependent variable using quasi-likelihood) indicates strong evidence of differential invalidation ($p \ll 0.0001$).

## Conclusion

In this article, we looked at the 2009 presidential election in Iran. Compared to the 2005 election, Ahmadinejad did better in terms of proportion of the full vote, but worse in terms of vote concentration. The country was more divided on him in 2009 than in 2005.

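That he did worse in the urban (and liberal) areas offers an explanation for the massive protests that followed voting in 2009. However, because there is so little social mixing, what voters saw around them in those areas need not reflect the country as a whole, which renders that observation invalid as evidence of fraud.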

The generalized Benford test did not detect either unfairness or fraud in the reported vote counts for Ahmadinejad.

However, testing for differential invalidation did detect issues. The effect of the support for Ahmadinejad on the invalidation rate was highly statistically significant. This means there is evidence that votes were invalidated based on whom they were cast for.

This clearly contradicts the claim that the 2009 presidential election in Iran was free and fair.

Thank you for reading this article. Please visit the other articles we have written through the years, examining elections from all corners of this planet.

For those interested, here are some links to the materials discussed in this article:

| Material | Items |
| --- | --- |
| Data | province level; shahrestan level |
| R code for | generalized Benford test; differential invalidation test |
| Additional conference items | Poster (657kB) |