Being both African and Nigerian, I must say that the data from Malawi's 2014 presidential election gave me a lot of hope for the credibility of African election results. I analyzed the 2014 presidential election with Weighted Least Squares (WLS) regression and binomial regression. Ordinary Least Squares (OLS) regression was another option; however, both methods used here are better suited to this data than OLS: WLS accounts for districts of different sizes, and binomial regression models the counts of invalidated votes directly.
Election results were obtained from the website of the Malawi Electoral Commission. The election featured 12 candidates. There were 7,470,806 registered voters, and 5,285,278 votes were cast on election day: 5,228,583 valid votes and 56,695 invalidated votes. The presidential election was analyzed for any evidence of electoral unfairness. To investigate this, the analysis tested for a relationship between the invalidation rate and the level of support for a candidate, using WLS and binomial regression to estimate that dependence (if any). The reasoning is that in a fair election every vote counts the same regardless of the candidate or party it supports, so whether a vote is invalidated should be independent of the candidate or party.
The two candidates of concern were Dr Lazarus McCarthy Chakwera and Prof. Peter Mutharika. Prof. Peter Mutharika won the 2014 presidential election with 1,904,399 votes, while Dr Lazarus McCarthy Chakwera came second with 1,455,880 votes.
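As a quick arithmetic check on the national totals quoted above, the short Python sketch below (standard library only) derives turnout, the national invalidation rate and each leading candidate's share of the valid votes. It works only with the national figures given in this post; the district-level data used in the regressions is not reproduced here, and the variable names are my own.

```python
# National totals as quoted in this post (Malawi Electoral Commission figures).
REGISTERED_VOTERS = 7_470_806
VALID_VOTES = 5_228_583
INVALID_VOTES = 56_695
MUTHARIKA_VOTES = 1_904_399
CHAKWERA_VOTES = 1_455_880

# Ballots cast on election day = valid ballots + invalidated ballots.
total_cast = VALID_VOTES + INVALID_VOTES

# Turnout relative to registration, and the share of cast ballots invalidated.
turnout = total_cast / REGISTERED_VOTERS
invalidation_rate = INVALID_VOTES / total_cast

# Each leading candidate's share of the valid votes.
mutharika_share = MUTHARIKA_VOTES / VALID_VOTES
chakwera_share = CHAKWERA_VOTES / VALID_VOTES

print(f"Total cast: {total_cast:,}")
print(f"Turnout: {turnout:.2%}")
print(f"Invalidation rate: {invalidation_rate:.2%}")
print(f"Mutharika share of valid votes: {mutharika_share:.2%}")
print(f"Chakwera share of valid votes: {chakwera_share:.2%}")
```

The first line confirms that the valid and invalidated votes add up to the 5,285,278 ballots cast.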
Unlike OLS, WLS gives each observation its own weight, so it can account for heteroskedasticity (non-constant variance) that OLS ignores. The dependent variable, the invalidation rate, is regressed on the independent variable, candidate support. Although there were 12 candidates, the analysis was limited to the two leading candidates, as they had the greatest share of support in the election. Each district's observation in the regression is weighted by the total votes cast in that district.
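To make the weighting concrete, here is a minimal sketch of weighted least squares for a single predictor, written in plain Python rather than the R used for the actual analysis. The function name `wls_fit` and the toy district numbers are hypothetical and only illustrate the mechanics: each district's squared error counts in proportion to its weight, here the votes cast. In R, the equivalent would be passing a `weights` argument to `lm()`.

```python
def wls_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x with observation weights w.

    Minimises sum(w_i * (y_i - a - b*x_i)**2) and returns (a, b),
    i.e. (intercept, slope).
    """
    if not (len(x) == len(y) == len(w)) or len(x) < 2:
        raise ValueError("x, y and w must have the same length (at least 2)")
    sw = sum(w)
    # Weighted means of the predictor and the response.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares around those means.
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0:
        raise ValueError("x has no weighted variation")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data for four districts (NOT the Malawi figures).
support = [0.20, 0.35, 0.50, 0.65]                # candidate's share of valid votes
logit_invalid = [-4.60, -4.50, -4.55, -4.60]      # logit of invalidation rate
votes_cast = [120_000, 80_000, 150_000, 60_000]   # weights: ballots cast

intercept, slope = wls_fit(support, logit_invalid, votes_cast)
print(f"y = {slope:.4f}x + {intercept:.4f}")
```

With equal weights this reduces to ordinary least squares, which is one way to see WLS as a generalization of OLS.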
The analysis was done in R. Prior to analysis, the dependent variable (the invalidation rate) was transformed using the logit function, which maps proportions between 0 and 1 onto the whole real line. This was done to reduce potential violations of the model assumptions, such as normality of the residuals. The assumptions of WLS are that the residuals come from a normal distribution, have a constant expected value of zero, and have constant variance.
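A minimal Python sketch of the logit transform and its inverse, assuming the invalidation rate is expressed as a proportion strictly between 0 and 1 (the function names are my own, not from the original R analysis). The inverse is useful for turning fitted values on the logit scale back into invalidation rates.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) onto the real line: log(p / (1 - p))."""
    if not 0 < p < 1:
        raise ValueError("logit is only defined for 0 < p < 1")
    return math.log(p / (1 - p))


def inv_logit(z):
    """Inverse of logit: map a real number back to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-z))


# A district where 1% of ballots were invalidated.
print(logit(0.01))
print(inv_logit(logit(0.01)))
```

For example, a district with 1% of its ballots invalidated maps to logit(0.01), about -4.6, which is the same range as the intercepts of the fitted equations.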
For each candidate, the logit-transformed invalidation rate was regressed on that candidate's level of support, weighted by the total votes cast, and the results were examined with a scatter plot and a regression table. Writing y for the logit of the invalidation rate and x for the level of support, the fitted equations were y = -0.1046x - 4.5246 for Chakwera and y = 0.2326x - 4.6408 for Mutharika, with p-values of 0.6635 for Chakwera and 0.3758 for Mutharika. Of these values, the main one of concern is the p-value: the probability of observing a relationship at least as strong as the one in the sample if the null hypothesis were true, so that small values are evidence against the null. Here, the null hypothesis is that there is no relationship between the dependent and independent variables. The p-values are compared with an alpha level of 0.025. Usually the alpha level is 0.05; however, because more than one model is being tested, a Bonferroni adjustment is applied. The Bonferroni adjustment divides the initial alpha level by the number of models used, which here is two, giving 0.05 / 2 = 0.025. When the p-value is greater than the alpha level, we fail to reject the null hypothesis; when it is less, we reject it.
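The Bonferroni decision rule can be sketched in a few lines of Python using the two p-values reported above. The variable names are illustrative; the code only applies the comparison and does not recompute the p-values from the data.

```python
# Conventional significance level and the number of models tested (two, as in the post).
ALPHA = 0.05
N_MODELS = 2

# Bonferroni: test each model at alpha divided by the number of models.
alpha_adjusted = ALPHA / N_MODELS

# p-values reported for the two WLS models.
p_values = {"Chakwera": 0.6635, "Mutharika": 0.3758}

decisions = {}
for candidate, p in p_values.items():
    # Reject H0 (no relationship) only when p falls below the adjusted level.
    decisions[candidate] = "reject H0" if p < alpha_adjusted else "fail to reject H0"
    print(f"{candidate}: p = {p} vs alpha = {alpha_adjusted} -> {decisions[candidate]}")
```

Both p-values sit well above 0.025, and they would also sit above the unadjusted 0.05, so the adjustment does not change the conclusion here.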
The p-values therefore lead to the conclusion that there is no detected relationship between the independent and dependent variables. However, this conclusion is not enough on its own, because the analysis rests on certain assumptions. To conclude this "interesting" result appropriately, the WLS assumptions were checked for reasonableness.
The first assumption concerns the residuals having constant variance and a constant expected value of zero. Weighted Least Squares is a useful tool here because, unlike Ordinary Least Squares, it can account for non-constant variance (heteroskedasticity), for example when rates from districts with many ballots are more precise than rates from districts with few. To check the assumption, both graphical and numerical methods were used. The graphical method was a scatter plot of the residuals against the independent variable (level of candidate support). As a numerical test for homoskedastic residuals, the Breusch-Pagan test was carried out on each model. It returned a p-value of 0.1884 for Chakwera and 0.9752 for Mutharika, so there is no evidence of heteroskedasticity in either model. Beyond that conclusion, what stands out is how high these p-values are, particularly Mutharika's. Does this mean we can begin to trust elections in Africa to a certain degree? I leave you to conclude. Furthermore, the plot of the residuals versus the independent variable (shown below) showed the values spread around the constant expected value of 0. Again, I ask the same question I asked after the Breusch-Pagan p-values: what is your conclusion?
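The post does not show how the Breusch-Pagan test was run. As a sketch of what the test computes, here is the studentized (Koenker) form, which as far as I know is the default in R's `lmtest::bptest`, for a single regressor in plain Python. It regresses the squared residuals on the predictor and compares n times the resulting R-squared with a chi-square distribution on 1 degree of freedom. The function name is hypothetical, and how residuals from a weighted model should be scaled before being passed in is left to the caller; this sketch does no weighting itself.

```python
import math


def breusch_pagan_koenker(x, residuals):
    """Studentized (Koenker) Breusch-Pagan test with one regressor.

    Regresses the squared residuals on x by ordinary least squares and
    returns (statistic, p_value), where statistic = n * R^2 and the
    p-value is the upper tail of a chi-square distribution with 1 df.
    """
    n = len(x)
    if n != len(residuals) or n < 3:
        raise ValueError("need matching x and residuals with n >= 3")
    u = [r * r for r in residuals]
    x_bar = sum(x) / n
    u_bar = sum(u) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxu = sum((xi - x_bar) * (ui - u_bar) for xi, ui in zip(x, u))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    if sxx == 0:
        raise ValueError("x has no variation")
    # R^2 of the auxiliary regression of u on x (0 if u does not vary).
    r_squared = 0.0 if suu == 0 else (sxu * sxu) / (sxx * suu)
    statistic = n * r_squared
    # Chi-square(1) upper tail: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical residuals whose spread does not depend on support.
stat, p = breusch_pagan_koenker([0.2, 0.3, 0.4, 0.5], [0.1, -0.1, 0.1, -0.1])
print(stat, p)
```

A large p-value, as in the example, means the squared residuals show no linear trend with support; it is an absence of evidence against constant variance rather than proof of it.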
Testing for normality is something I always dread: in my limited experience, it is the assumption most easily violated. It is important to note that if any of the assumptions is violated, the model has to be changed; only once alternative models and other analytical tools are exhausted would a final conclusion on the election be drawn. To test whether the residuals come from a normal distribution, both graphical and numerical tools were employed. For the numerical tool, I used the Shapiro-Wilk test, which returned a p-value of 0.898 for the Chakwera support model and 0.9093 for the Mutharika support model. We therefore fail to reject the hypothesis that the residuals are normally distributed and, just as with the Breusch-Pagan test, the p-values are very high. In addition, the histogram of the residuals overlaid with a fitted bell curve (shown below) was consistent with the assumption that the residuals are normally distributed.
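The Shapiro-Wilk statistic relies on tabulated coefficients, so it is not reproduced here. Instead, this sketch is a rougher, swapped-in numerical version of the graphical check: it bins the residuals into equal-width bins and compares each bin's count with the count expected under a normal curve fitted with the residuals' own mean and standard deviation, which is what the histogram with a bell curve shows visually. The function name and the residual values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def histogram_vs_normal(residuals, n_bins=5):
    """Compare binned residual counts with counts expected under a fitted normal.

    Returns a list of (bin_low, bin_high, observed, expected) tuples.
    """
    n = len(residuals)
    if n < 2 or n_bins < 1:
        raise ValueError("need at least 2 residuals and 1 bin")
    lo, hi = min(residuals), max(residuals)
    if lo == hi:
        raise ValueError("residuals have no spread")
    width = (hi - lo) / n_bins
    # Equal-width bin edges from the smallest to the largest residual.
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    # Normal curve fitted with the residuals' own mean and standard deviation.
    fitted = NormalDist(mean(residuals), stdev(residuals))
    rows = []
    for i in range(n_bins):
        a, b = edges[i], edges[i + 1]
        last = i == n_bins - 1
        # Bins are [a, b); the last bin also includes the maximum residual.
        observed = sum(1 for r in residuals if a <= r < b or (last and r == b))
        expected = n * (fitted.cdf(b) - fitted.cdf(a))
        rows.append((a, b, observed, expected))
    return rows


# Hypothetical residuals, for illustration only.
for a, b, obs, expected in histogram_vs_normal(
    [-1.2, -0.5, -0.1, 0.0, 0.3, 0.4, 0.9, 1.5], n_bins=4
):
    print(f"{a:+.2f} to {b:+.2f}: observed {obs}, expected {expected:.2f}")
```

With only a handful of districts the counts per bin are small, so this is a visual-style sanity check rather than a formal test.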
Although the WLS results were encouraging, I still ran a binomial regression as an additional check, to bolster the results of the WLS analysis. Three assumptions are checked in the binomial regression: the distribution, a constant expected value of the residuals, and overdispersion. There is no direct test available for the distribution assumption. The constant expected value is checked with a scatter plot of the residuals against the independent variable (candidate support).
The overdispersion check involves dividing the difference between the null deviance and the residual deviance by the larger number of degrees of freedom. It returned a value of 8.5 for Lazarus Chakwera and 12.6 for Prof. Peter Mutharika. A value greater than 1 signals that the data are over-dispersed. Further testing involved refitting the model as a quasibinomial, which estimates a dispersion parameter from the data instead of fixing it at 1 as the binomial model does. The quasibinomial returned a dispersion parameter of 300.892 for Chakwera and 298.4421 for Mutharika. Both values are far above 1, which confirms that the data are over-dispersed relative to the binomial model. The quasibinomial fit keeps the same coefficient estimates but inflates their standard errors by the square root of the dispersion parameter, so it is the version of the binomial regression to rely on when judging whether support and invalidation are related.
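For reference, the dispersion parameter of a quasibinomial model is usually estimated as the Pearson chi-square statistic divided by the residual degrees of freedom; to my understanding this is the estimate R's `summary()` reports for a `quasibinomial` glm. The sketch below computes it in plain Python from observed invalid counts, ballots cast and fitted probabilities, and shows the square-root scaling of standard errors. The function names and inputs are hypothetical; the fitted probabilities would come from the binomial model, which is not refit here. A plain binomial model assumes this parameter equals 1.

```python
import math


def pearson_dispersion(invalid, totals, fitted_p, n_params=2):
    """Pearson estimate of the dispersion parameter for a binomial model.

    invalid[i]:  invalidated ballots in district i
    totals[i]:   ballots cast in district i
    fitted_p[i]: invalidation probability fitted by the binomial model
    n_params:    number of estimated coefficients (intercept + slope = 2)
    """
    if not (len(invalid) == len(totals) == len(fitted_p)):
        raise ValueError("inputs must have the same length")
    chi_square = 0.0
    for y, n, p in zip(invalid, totals, fitted_p):
        expected = n * p
        variance = n * p * (1 - p)  # binomial variance of the count
        chi_square += (y - expected) ** 2 / variance
    df = len(invalid) - n_params
    if df <= 0:
        raise ValueError("need more districts than parameters")
    return chi_square / df


def quasi_se(binomial_se, dispersion):
    """Quasibinomial standard error: the binomial SE scaled by sqrt(dispersion)."""
    return binomial_se * math.sqrt(dispersion)


# Hypothetical three-district example (NOT the Malawi data).
phi = pearson_dispersion([10, 20, 30], [100, 100, 100], [0.2, 0.2, 0.2])
print(phi, quasi_se(0.1, phi))
```

Because the standard errors grow with the square root of the dispersion, a dispersion near 300 makes each standard error roughly 17 times larger than under the plain binomial model.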
The results from the binomial and WLS regressions show that, judged by these analysis techniques, we can begin to trust, or confer credibility on, some elections in Africa. They suggest that the end of the tunnel, fair and honest elections, is not too far away.
While Malawi lifts our hearts, we will next turn to the elections in South Africa and Uganda to see whether they do the same or send us back to square one. Stay tuned for the next post.