Based on Benjamin F. Jones and Benjamin A. Olken. 2009. Hit or Miss? The Effect of Assassinations on Institutions and War. American Economic Journal: Macroeconomics, 1(2): 55–87.
All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).
Let’s go back to the leaders.csv data we have considered in previous seminars. Remember that there is a longstanding debate in the study of international relations on whether individual political leaders make a difference. To address this question, we’ll estimate the effect of the death of a leader on the level of democracy of a country. For this purpose, we will analyse data on assassinations and assassination attempts against political leaders from 1875 to 2004.
Whether an assassination attempt occurs or not is not a random process. However, once an attempt has occurred, one could argue that whether it succeeds depends on small elements of randomness, such as the timing and path of the weapon. As a result, we can consider (at least for now) that, after an assassination attempt, the death of a leader is close to random. The attempts where the leader ended up dying should therefore be, on average, comparable to the attempts where the leader ended up surviving. If this is true, then we can estimate the average causal effect of the death of the leader by computing the difference-in-means estimator.
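Written out, the difference-in-means estimator compares the average outcome across the two groups of attempts. The notation below is an assumption introduced for illustration and is not taken from the original: $Y_i$ is the outcome for attempt $i$, $X_i$ equals 1 if the leader died and 0 otherwise, and $n_1$ and $n_0$ are the numbers of attempts in each group.

```latex
\widehat{\text{ATE}}
  = \bar{Y}_{\text{died}} - \bar{Y}_{\text{survived}}
  = \frac{1}{n_1} \sum_{i:\, X_i = 1} Y_i \;-\; \frac{1}{n_0} \sum_{i:\, X_i = 0} Y_i
```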
To measure the level of democracy of the country, we will use polity scores. Polity scores categorize the regime of a country on a 21-point scale ranging from -10 (hereditary monarchy) to +10 (consolidated democracy). The Polity Project has produced polity scores for all countries from 1800 onward. For example, here are the 2018 polity scores.
The dataset is stored in a file called “leaders.csv”. Table 1 shows the names and descriptions of the variables in this dataset, where the unit of observation is an assassination attempt.
Table 1: Variables in “leaders.csv”
Variable | Description |
---|---|
year | year of the assassination attempt |
country | country name |
leadername | name of the leader |
died | whether leader died: 1=yes, 0=no |
politybefore | polity scores before the assassination attempt (in points) |
polityafter | polity scores after the assassination attempt (in points) |
In this problem set, we practice answering questions related to causal studies: (1) What is the estimated average treatment effect? and (2) Is the effect statistically significant at the 5% level?
As always, we start by loading and looking at the data (remember to set your working directory first!):
year country leadername died politybefore polityafter
1 1929 Afghanistan Habibullah Ghazi 0 -6 -6.000000
2 1933 Afghanistan Nadir Shah 1 -6 -7.333333
3 1934 Afghanistan Hashim Khan 0 -6 -8.000000
4 1924 Albania Zogu 0 0 -9.000000
5 1931 Albania Zogu 0 -9 -9.000000
6 1968 Algeria Boumedienne 0 -9 -9.000000
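The output above looks like what `head()` prints for the first six observations. A minimal sketch of the loading step follows, assuming the file sits in your working directory and that the data frame is called `leaders` (a name chosen here for illustration). The fallback branch rebuilds only the six rows printed above so that the snippet runs without the file; it is not the full dataset.

```r
# Assumes leaders.csv is in the working directory (set it first with setwd()).
if (file.exists("leaders.csv")) {
  leaders <- read.csv("leaders.csv")
} else {
  # Fallback for illustration only: the six rows shown in this handout.
  leaders <- data.frame(
    year = c(1929, 1933, 1934, 1924, 1931, 1968),
    country = c("Afghanistan", "Afghanistan", "Afghanistan",
                "Albania", "Albania", "Algeria"),
    leadername = c("Habibullah Ghazi", "Nadir Shah", "Hashim Khan",
                   "Zogu", "Zogu", "Boumedienne"),
    died = c(0, 1, 0, 0, 0, 0),
    politybefore = c(-6, -6, -6, 0, -9, -9),
    polityafter = c(-6, -7.333333, -8, -9, -9, -9)
  )
}

head(leaders)  # show the first six observations
```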
Given our research question, what should be our outcome variable (Y)? Visualise its distribution, using the ggplot2 package and a suitable method.
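One possible approach is sketched below, assuming the outcome is `polityafter` (the level of democracy after the attempt) and that a histogram is a suitable method for a numeric variable of this kind. The object names `leaders` and `p_outcome` and the bin width are illustrative choices, and the fallback data are only the six rows printed earlier in this handout.

```r
library(ggplot2)

# Read the real file if available; otherwise use the six handout rows
# (illustration only, not the full dataset).
leaders <- if (file.exists("leaders.csv")) read.csv("leaders.csv") else data.frame(
  died = c(0, 1, 0, 0, 0, 0),
  polityafter = c(-6, -7.333333, -8, -9, -9, -9)
)

# Histogram of the assumed outcome variable, one bin per polity point
p_outcome <- ggplot(leaders, aes(x = polityafter)) +
  geom_histogram(binwidth = 1) +
  labs(x = "Polity score after the attempt", y = "Number of attempts")

p_outcome
```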
Given our research question, what should be our treatment variable (X)? Visualise its distribution and comment on the proportion of leaders who die after assassination attempts (i.e., is the proportion large or small?).
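A sketch for the treatment variable, assuming it is `died`. Because `died` is coded 0/1, its mean is the proportion of attempts in which the leader died. The names `leaders`, `p_treatment` and `prop_died` are illustrative, and the fallback data are only the six rows printed earlier in this handout, so the proportion they give says nothing about the full dataset.

```r
library(ggplot2)

# Read the real file if available; otherwise use the six handout rows
# (illustration only, not the full dataset).
leaders <- if (file.exists("leaders.csv")) read.csv("leaders.csv") else data.frame(
  died = c(0, 1, 0, 0, 0, 0),
  polityafter = c(-6, -7.333333, -8, -9, -9, -9)
)

# Bar plot: number of attempts by whether the leader died
p_treatment <- ggplot(leaders, aes(x = factor(died))) +
  geom_bar() +
  labs(x = "Leader died (1 = yes, 0 = no)", y = "Number of attempts")

p_treatment

# Mean of a 0/1 variable = proportion of ones
prop_died <- mean(leaders$died)
prop_died
```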
Now that we have both our Y and X variables, fit a linear model to the data in such a way that the estimated slope coefficient is equivalent to the difference-in-means estimator you are interested in and store the fitted model in an object called fit.
What is the estimated slope coefficient?
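With a binary X, the slope from a simple linear regression of Y on X equals the difference in group means. The sketch below checks this numerically, assuming `polityafter` as Y and `died` as X. The names `slope` and `diff_means` are illustrative, and the fallback data are only the six rows printed earlier in this handout.

```r
# Read the real file if available; otherwise use the six handout rows
# (illustration only, not the full dataset).
leaders <- if (file.exists("leaders.csv")) read.csv("leaders.csv") else data.frame(
  died = c(0, 1, 0, 0, 0, 0),
  polityafter = c(-6, -7.333333, -8, -9, -9, -9)
)

# Linear model: outcome regressed on the binary treatment
fit <- lm(polityafter ~ died, data = leaders)

# Estimated slope coefficient on died
slope <- coef(fit)[["died"]]

# Difference in means computed directly: mean(Y | died) - mean(Y | survived)
diff_means <- mean(leaders$polityafter[leaders$died == 1]) -
  mean(leaders$polityafter[leaders$died == 0])

slope
diff_means
all.equal(slope, diff_means)  # the two should agree up to rounding
```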
Now, let’s answer the question: What is the estimated average treatment effect?
Let’s start by specifying the null and alternative hypotheses. Please provide both the mathematical notations and their meaning.
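One common way to write the hypotheses for a two-sided test is sketched below. The symbol $\beta$ is an assumption introduced here: it denotes the true slope on the treatment variable, that is, the true average treatment effect.

```latex
H_0:\ \beta = 0 \quad \text{(the death of the leader has no average effect on the polity score)}
\qquad
H_1:\ \beta \neq 0 \quad \text{(the death of the leader has some average effect, positive or negative)}
```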
What is the value of the observed test statistic? Using summary(fit)$coeff to return your stored regression results may be useful here.
What is the associated p-value?
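Both quantities can be read from the coefficient table that `summary(fit)$coeff` returns. The sketch assumes the model `polityafter ~ died`; the names `coef_table`, `t_stat` and `p_value` are illustrative, and the fallback data are only the six rows printed earlier in this handout, so the numbers they produce are not the answer for the full dataset.

```r
# Read the real file if available; otherwise use the six handout rows
# (illustration only, not the full dataset).
leaders <- if (file.exists("leaders.csv")) read.csv("leaders.csv") else data.frame(
  died = c(0, 1, 0, 0, 0, 0),
  polityafter = c(-6, -7.333333, -8, -9, -9, -9)
)

fit <- lm(polityafter ~ died, data = leaders)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coeff
coef_table

# Observed test statistic = estimate / standard error
t_stat <- coef_table["died", "t value"]

# Two-sided p-value for the null hypothesis that the slope is zero
p_value <- coef_table["died", "Pr(>|t|)"]

t_stat
p_value
```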
Now, let’s answer the question: Is the effect statistically significant at the 5% level? Please provide your reasoning.
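The decision can be reached in two equivalent ways: compare the p-value with 0.05, or compare the absolute test statistic with the critical value. The sketch uses the t-distribution critical value from `qt()`, which matches the p-value `lm()` reports. In large samples this is close to the familiar 1.96. The names are illustrative, and the fallback data are only the six rows printed earlier in this handout.

```r
# Read the real file if available; otherwise use the six handout rows
# (illustration only, not the full dataset).
leaders <- if (file.exists("leaders.csv")) read.csv("leaders.csv") else data.frame(
  died = c(0, 1, 0, 0, 0, 0),
  polityafter = c(-6, -7.333333, -8, -9, -9, -9)
)

fit <- lm(polityafter ~ died, data = leaders)
coef_table <- summary(fit)$coeff
t_stat <- coef_table["died", "t value"]
p_value <- coef_table["died", "Pr(>|t|)"]

alpha <- 0.05  # significance level

# Rule 1: reject the null if the p-value is below alpha
reject_by_p <- p_value < alpha

# Rule 2: reject the null if |t| exceeds the two-sided critical value
critical_value <- qt(1 - alpha / 2, df = df.residual(fit))
reject_by_t <- abs(t_stat) > critical_value

reject_by_p
reject_by_t
```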