Based on Benjamin F. Jones and Benjamin A. Olken. 2009. Hit or Miss? The Effect of Assassinations on Institutions and War. American Economic Journal: Macroeconomics, 1(2): 55–87.
All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).
Let’s go back to the leaders.csv data we have considered in previous seminars. Remember that there is a longstanding debate in the study of international relations on whether individual political leaders make a difference. To explore this issue, we’ll estimate the effect of the death of the leader on the level of democracy of a country. For this purpose, we will analyse data on assassinations and assassination attempts against political leaders from 1875 to 2004.
Whether an assassination attempt occurs or not is not a random process. However, once an assassination attempt has occurred, one could argue that whether the attempt is successful or not is the result of small elements of randomness, such as the timing and path of the weapon. As a result, we can consider (at least for now) that, after an assassination attempt, the death of a leader is close to random and, thus, the assassination attempts where the leader ended up dying should be, on average, comparable to the assassination attempts where the leader ended up surviving. If this is true, then we can estimate the average causal effect of the death of the leader by computing the difference-in-means estimator.
To measure the level of democracy of the country, we will use polity scores. Polity scores categorize the regime of a country on a 21-point scale ranging from -10 (hereditary monarchy) to +10 (consolidated democracy). The Polity Project has produced polity scores for all countries from 1800 onward. For example, here are the 2018 polity scores.
The dataset is stored in a file called “leaders.csv”. Table 1 shows the names and descriptions of the variables in this dataset, where the unit of observation is an assassination attempt.
Table 1: Variables in “leaders.csv”
Variable | Description |
---|---|
year | year of the assassination attempt |
country | country name |
leadername | name of the leader |
died | whether leader died: 1=yes, 0=no |
politybefore | polity scores before the assassination attempt (in points) |
polityafter | polity scores after the assassination attempt (in points) |
In this problem set, we practice answering questions related to causal studies: (1) What is the estimated average treatment effect? and (2) Is the effect statistically significant at the 5% level?
As always, we start by loading and looking at the data (remember to set your working directory first!):
year country leadername died politybefore polityafter
1 1929 Afghanistan Habibullah Ghazi 0 -6 -6.000000
2 1933 Afghanistan Nadir Shah 1 -6 -7.333333
3 1934 Afghanistan Hashim Khan 0 -6 -8.000000
4 1924 Albania Zogu 0 0 -9.000000
5 1931 Albania Zogu 0 -9 -9.000000
6 1968 Algeria Boumedienne 0 -9 -9.000000
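The loading code itself is not shown above. A minimal sketch, assuming the data are read with base R's read.csv() and previewed with head(), might look as follows. So that the sketch runs on its own, it first writes the six rows printed above to a temporary file (first_rows and tmp_path are hypothetical names used only for illustration); with the real file in your working directory, you would call read.csv("leaders.csv") directly.

```r
# Stand-in for leaders.csv: only the six rows printed above (illustration only)
first_rows <- data.frame(
  year = c(1929, 1933, 1934, 1924, 1931, 1968),
  country = c("Afghanistan", "Afghanistan", "Afghanistan",
              "Albania", "Albania", "Algeria"),
  leadername = c("Habibullah Ghazi", "Nadir Shah", "Hashim Khan",
                 "Zogu", "Zogu", "Boumedienne"),
  died = c(0, 1, 0, 0, 0, 0),
  politybefore = c(-6, -6, -6, 0, -9, -9),
  polityafter = c(-6, -7.333333, -8, -9, -9, -9)
)
tmp_path <- tempfile(fileext = ".csv")  # hypothetical temporary location
write.csv(first_rows, tmp_path, row.names = FALSE)

leaders <- read.csv(tmp_path)  # with the real data: read.csv("leaders.csv")
head(leaders)                  # shows the first six observations
dim(leaders)                   # number of rows and number of columns
```

With the full dataset, dim(leaders) is a quick way to confirm how many assassination attempts were loaded before going any further.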
The outcome variable should be polityafter since that is the variable that records the level of democracy observed in our country cases after assassination attempts have taken place. Based on the histogram of this variable below, we can see that there is a good spread of levels of democracy in our sample.
library(ggplot2)
ggplot(data = leaders, aes(x = polityafter)) + geom_histogram() #plotting a histogram of variable polityafter
The treatment variable should be died since that is the variable that records whether a leader dies, or not, after assassination attempts have taken place. Based on the table of proportions calculated below, we can see that roughly one-fifth of leaders in our sample (about 22%) died after an assassination attempt.
library(tidyverse)
leaders %>% count(died) %>% mutate(prop = prop.table(n)) #calculates the proportion of sample cases where leaders did and did not die after assassination attempts
died n prop
1 0 196 0.784
2 1 54 0.216
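You can fit this regression with lm() in two different ways: either by identifying each variable with the $ operator (as in the Call line of the output below) or by using the variable names on their own and specifying the data frame in the data argument.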
Placing fit, followed by the assignment operator (<-), before the regression function asks R to save the fitted regression model in an object called fit. This is a convenient way to store regression results if you plan to come back to them later.
fit #typing the name of our stored regression object to return/view results
Call:
lm(formula = leaders$polityafter ~ leaders$died)
Coefficients:
(Intercept) leaders$died
-1.895 1.132
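As a rough illustration of the two ways of writing the lm() call, the sketch below fits the same model with the $ operator and with the data argument. It assumes a small stand-in data frame (leaders_demo, a hypothetical name) built from only the six rows printed earlier, so its estimates will not match the full-sample coefficients shown above.

```r
# Stand-in data: only the six rows printed earlier, so the estimates
# will NOT match the full-sample results reported in this section
leaders_demo <- data.frame(
  died = c(0, 1, 0, 0, 0, 0),
  polityafter = c(-6, -7.333333, -8, -9, -9, -9)
)

# Option 1: identify each variable with the $ operator
fit_dollar <- lm(leaders_demo$polityafter ~ leaders_demo$died)

# Option 2: name the variables and pass the data frame via the data argument
fit_data <- lm(polityafter ~ died, data = leaders_demo)

# Both calls estimate the same intercept and slope
unname(coef(fit_dollar))
unname(coef(fit_data))
```

The only visible difference is in the coefficient labels: the first call labels the slope with the full leaders_demo$died expression, while the second labels it simply died.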
The estimated slope coefficient is 1.132. This is equivalent to the difference-in-means estimator because our regression model includes just a single X variable, died, which is the treatment variable.
We estimate that the death of the leader increases the country’s polity scores after the assassination attempt by 1.13 points, on average.
Note: the mathematical definition of \(\widehat{\beta}\) is the \(\triangle \widehat{Y}\) associated with \(\triangle X = 1\). In this case, \(\widehat{\beta}\) is the \(\triangle \widehat{\textrm{polityafter}}\) associated with \(\triangle \textrm{died} = 1\). Hence, the death of the leader (that is, when died increases by one unit, from 0 to 1) is associated with an increase in polity scores after the assassination attempt of 1.13 points, on average. The unit of measurement is points because Y is non-binary and measured in points, so \(\triangle \widehat{Y}\) should also be in points.
In addition, because here X is the treatment variable, \(\widehat{\beta}\) is equivalent to the difference-in-means estimator so we should use causal language in its interpretation. Instead of “associated with an increase,” we should say “increases” or “causes an increase of.”
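To see why the slope coincides with the difference-in-means estimator, it may help to write out the fitted line for each value of died. The sketch below assumes the usual notation \(\widehat{\alpha}\) for the estimated intercept, which is not used elsewhere in this section, and works with the rounded coefficients from the output above.

\[
\widehat{\textrm{polityafter}} = \widehat{\alpha} + \widehat{\beta}\,\textrm{died} = -1.895 + 1.132\,\textrm{died}
\]

\[
\textrm{died} = 0:\ \widehat{\textrm{polityafter}} = -1.895
\qquad\qquad
\textrm{died} = 1:\ \widehat{\textrm{polityafter}} = -1.895 + 1.132 = -0.763
\]

\[
\widehat{\beta} = -0.763 - (-1.895) = 1.132
\]

With a single binary X, the two fitted values are the average polityafter among attempts the leader survived and among attempts in which the leader died, so their difference is the difference in means between the treatment and control groups.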
Final answer: We estimate that the death of the leader increases the country’s polity scores after the assassination attempt by 1.13 points, on average. This average treatment effect should be valid if the assassination attempts where the leader ended up dying are comparable to the assassination attempts where the leader ended up surviving, that is, if there are no confounding variables present.
The null and alternative hypotheses are:
\(H_0 {:} \,\, \beta{=}0\) (meaning: a leader’s death after an assassination attempt has no average causal effect on the level of democracy observed in the nation they govern, at the population level).
\(H_1 {:} \,\, \beta{\neq}0\) (meaning: a leader’s death after an assassination attempt either increases or decreases the level of democracy observed in the nation they govern, on average, at the population level).
Note that the null and alternative hypotheses refer to \(\beta\), which is the true average causal effect at the population level, not to \(\widehat{\beta}\), which is the estimated average causal effect at the sample level.
summary(fit) #returning the coefficients stored in the regression item fit, created earlier
Call:
lm(formula = leaders$polityafter ~ leaders$died)
Residuals:
Min 1Q Median 3Q Max
-9.238 -5.238 -2.105 4.895 11.895
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.8946 0.4659 -4.067 6.41e-05 ***
leaders$died 1.1322 1.0024 1.130 0.26
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.522 on 248 degrees of freedom
Multiple R-squared: 0.005118, Adjusted R-squared: 0.001107
F-statistic: 1.276 on 1 and 248 DF, p-value: 0.2598
The value of the observed test statistic is 1.13, when rounded to two decimal places.
Note: The observed test statistic for regression coefficients equals \(\widehat{\beta}\) divided by the standard error of \(\widehat{\beta}\). Here, it equals 1.132212/1.002369 = 1.13, which is exactly what R provides as the t-value for the coefficient of the variable died, that is, the value in the cell in the second row, third column of the table above.
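The division in the note above can be checked directly in R. This is a minimal sketch that types in the two values quoted in the note by hand; beta_hat, se_beta and t_obs are hypothetical object names, not objects created earlier.

```r
# Values copied from the note above
beta_hat <- 1.132212  # estimated coefficient on died
se_beta  <- 1.002369  # standard error of that coefficient

# Observed test statistic: estimate divided by its standard error
t_obs <- beta_hat / se_beta
round(t_obs, 2)
```

If the fitted model is available as fit, the same two numbers can be read from the coefficient table returned by coef(summary(fit)) rather than typed in by hand.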
The associated p-value is 2.597630e-01, which is 0.260 when rounded to three decimal places. The suffix e-01 is scientific notation: it indicates that the number before it is multiplied by \(10^{-1}\), that is, divided by 10.
We can interpret this as indicating that, if the null hypothesis were true, the probability of observing a test statistic equal to or larger than 1.13 (in absolute value) would be about 26% (0.260 * 100 = 26%). This is quite a large probability, well above 5%, so we do not find sufficient evidence to reject the null hypothesis.
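The p-value can also be recovered by hand from the test statistic. The sketch below assumes a two-sided test based on the t distribution with the 248 residual degrees of freedom reported by summary(fit); t_obs, df_resid and p_value are hypothetical object names. The result should agree with the 0.260 reported above up to rounding.

```r
t_obs    <- 1.132212 / 1.002369  # observed test statistic (estimate / std. error)
df_resid <- 248                  # residual degrees of freedom from summary(fit)

# Two-sided p-value: probability, under the null hypothesis, of a t statistic
# at least as far from zero as the one observed
p_value <- 2 * pt(abs(t_obs), df = df_resid, lower.tail = FALSE)

format(p_value, scientific = TRUE)  # scientific notation, as R sometimes prints it
round(p_value, 3)                   # the same number in ordinary decimal notation
```

Multiplying the one-tail probability by 2 is what makes the test two-sided, matching the alternative hypothesis that \(\beta\) differs from zero in either direction.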
No, the effect is not statistically significant at the 5% level. Because (a) the absolute value of the observed test statistic is lower than 1.96 (1.13<1.96), and/or (b) the p-value is greater than 0.05 (0.260>0.05), we fail to reject the null hypothesis. Note that this is not the same as concluding that a leader’s death after an assassination attempt has no causal effect on the level of democracy observed in the nation they govern; it means only that our data are consistent with an average effect of zero at the population level.
In other words, we do not find sufficient evidence to conclude that the death of a leader after an assassination attempt is likely to have an average effect different from zero on the level of democracy observed in the nation they govern, at the population level.
Note: You do not need to provide both reasons, (a) and (b). One of them suffices since both procedures should lead to the same conclusion.
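The two decision rules can be compared side by side. This is a minimal sketch that types in the rounded values from the output above and assumes the 1.96 threshold comes from the standard normal approximation; all object names are hypothetical.

```r
t_obs   <- 1.13   # observed test statistic (rounded), from the output above
p_value <- 0.260  # p-value (rounded), from the output above
alpha   <- 0.05   # significance level

# Critical value under the standard normal approximation (about 1.96)
critical_value <- qnorm(1 - alpha / 2)

# Rule (a): reject if the test statistic is beyond the critical value
reject_by_t <- abs(t_obs) > critical_value
# Rule (b): reject if the p-value is below the significance level
reject_by_p <- p_value < alpha

c(by_test_statistic = reject_by_t, by_p_value = reject_by_p)
```

Both entries come out FALSE here, which is the sense in which either reason alone is enough to justify not rejecting the null hypothesis.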