All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).
Today’s seminar will consist of two parts. The first will practice one of the most useful skills we can teach you, which is the ability to evaluate social scientific studies. All of you will need, at one point or another, to read a published research article, make sense of it, and figure out whether you can trust the findings. So we are going to begin this week’s seminar by practicing just that.
For this purpose, we are going to read sections of: Ansolabehere, Stephen, Shanto Iyengar, Adam Simon, and Nicholas Valentino. (1994) "Does Attack Advertising Demobilize the Electorate?" The American Political Science Review, 88(4), pp. 829–838.
You can find the article with the highlighted sections I want you to focus on uploaded to the POL269 website.
Please answer the following questions based on the highlighted sections of the text.
Yes, it is a causal study because it aims to estimate causal effects.
It is a randomised experiment because the treatment was assigned at random.
Exposure to a negative political TV advertisement (rather than a non-political one) is the treatment.
Intention to vote.
Each observation represents one participant in the experiment.
1,655 people.
Let’s start by figuring out each key element separately.
What’s the assumption? We assume that the participants who were exposed to a negative political TV advertisement (the treatment group) were comparable to the participants who were exposed to a non-political TV advertisement (the control group). If this assumption were not true, the difference-in-means estimator would NOT produce a valid estimate of the average treatment effect.
Why is the assumption reasonable? Because negative political TV advertisements were assigned at random OR because the data come from a randomised experiment. Remember that random treatment assignment makes the treatment and control groups identical to each other in all observed and unobserved pre-treatment characteristics, on average.
What’s the treatment? Being exposed to a negative political TV advertisement.
What’s the outcome? Intention to vote.
What’s the direction, size, and unit of measurement of the average causal effect? A decrease of 2.5 percentage points, on average. It is a decrease because we are measuring change—the change in the outcome variable caused by the treatment—and the difference-in-means estimator is negative.
The difference-in-means estimator = the proportion of participants who intend to vote among those exposed to negative political TV advertisements − the proportion of participants who intend to vote among those exposed to non-political TV advertisements = 58% − 61% ≈ −2.5 percentage points. (The two percentages shown are rounded; the unrounded difference is 2.5 percentage points.)
Assuming that the participants who were exposed to a negative political TV advertisement were comparable to the participants who were exposed to a non-political TV advertisement (a reasonable assumption since negative political TV advertisements were assigned at random), we estimate that being exposed to a negative political TV advertisement decreases intention to vote by about 2.5 percentage points, on average.
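To make the arithmetic concrete, here is a minimal R sketch of the difference-in-means estimator applied to individual-level data. The data frame and variable names (`ads`, `negative`, `intends_to_vote`) are purely illustrative; they are not the names used in the original study's data.

```r
# Hypothetical individual-level data:
# negative = 1 if exposed to a negative political ad, 0 if exposed to a
# non-political ad; intends_to_vote = 1 if yes, 0 if no
ads <- data.frame(
  negative        = c(1, 1, 1, 0, 0, 0),
  intends_to_vote = c(1, 0, 1, 1, 1, 0)
)

# Difference-in-means estimator: proportion intending to vote in the
# treatment group minus the proportion in the control group
mean(ads$intends_to_vote[ads$negative == 1]) -
  mean(ads$intends_to_vote[ads$negative == 0])
```

Note that the proportion of a binary (0/1) variable is just its mean, which is why `mean()` computes the proportions here.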
The second part of today’s seminar focuses on confounding variables, and how controlling for these changes our estimates of average causal effects.
In Seminar 6, we used the dataset “leaders.csv” to estimate the causal effect of the death of a leader on the level of democracy of a country, and showed that the difference-in-means estimator is equivalent to the slope coefficient estimated by a linear regression in cases where regression models include just an outcome (Y) and treatment (X) variable.
In this seminar, we show that the difference-in-means estimator is not equivalent to the slope coefficient estimated by a linear regression in cases where regression models include confounding variables, in addition to the outcome (Y) and treatment (X) variable.
Table 1 provides a quick reminder of the names and descriptions of the variables in the leaders dataset, where the unit of observation is assassination attempts.
Table 1: Variables in “leaders.csv”
| Variable | Description |
|---|---|
| year | year of the assassination attempt |
| country | country name |
| leadername | name of the leader |
| died | whether leader died: 1 = yes, 0 = no |
| politybefore | polity scores before the assassination attempt (in points) |
| polityafter | polity scores after the assassination attempt (in points) |
As always, we start by loading and looking at the data (remembering to set our working directory first!):
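Assuming the file "leaders.csv" is saved in your working directory, the loading step might look like this:

```r
leaders <- read.csv("leaders.csv")  # load the dataset into a data frame
head(leaders)                       # look at the first few observations
```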
year country leadername died politybefore polityafter
1 1929 Afghanistan Habibullah Ghazi 0 -6 -6.000000
2 1933 Afghanistan Nadir Shah 1 -6 -7.333333
3 1934 Afghanistan Hashim Khan 0 -6 -8.000000
4 1924 Albania Zogu 0 0 -9.000000
5 1931 Albania Zogu 0 -9 -9.000000
6 1968 Algeria Boumedienne 0 -9 -9.000000
lm(leaders$polityafter ~ leaders$died)
Call:
lm(formula = leaders$polityafter ~ leaders$died)
Coefficients:
(Intercept) leaders$died
-1.895 1.132
Or, equivalently:
lm(polityafter ~ died, data = leaders)
Call:
lm(formula = polityafter ~ died, data = leaders)
Coefficients:
(Intercept) died
-1.895 1.132
As our regression model includes just an outcome (Y) and treatment (X) variable, the slope coefficient estimated here (1.13) is equivalent to the difference-in-means estimator. We demonstrated this in the seminar in week 6.
We interpret this slope coefficient/difference-in-means estimator as follows: Assuming that the assassination attempts where the leader ended up dying are comparable to the assassination attempts where the leader ended up surviving (an assumption that might be reasonable if the death of the leader after an assassination attempt is close to random), we estimate that the death of the leader increases the country's polity score after the assassination attempt by 1.13 points, on average.
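We can check this equivalence directly in R: the difference in mean post-attempt polity scores between attempts where the leader died and attempts where the leader survived should match the slope coefficient from the simple regression, up to rounding.

```r
# Difference-in-means estimator, computed by hand
mean(leaders$polityafter[leaders$died == 1]) -
  mean(leaders$polityafter[leaders$died == 0])

# Slope coefficient from the simple regression
coef(lm(polityafter ~ died, data = leaders))["died"]

# Both should print approximately 1.132
```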
To control for countries' pre-existing levels of democracy, we add the confounding variable politybefore to the regression model, like so:
lm(leaders$polityafter ~ leaders$died + leaders$politybefore)
Call:
lm(formula = leaders$polityafter ~ leaders$died + leaders$politybefore)
Coefficients:
(Intercept) leaders$died leaders$politybefore
-0.4346 0.2616 0.8375
Or, equivalently:
lm(polityafter ~ died + politybefore, data = leaders)
Call:
lm(formula = polityafter ~ died + politybefore, data = leaders)
Coefficients:
(Intercept) died politybefore
-0.4346 0.2616 0.8375
Note that to include confounding variables in a linear regression model you simply add these on the right hand side of your regression equation, using a + sign to add each additional variable to the other X variables.
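For example, if we wanted to control for a second variable as well, say year (used here purely to illustrate the syntax, not because we claim it is a confounder in this study), we would write:

```r
# Each additional X variable is joined to the model with a + sign
lm(polityafter ~ died + politybefore + year, data = leaders)
```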
After controlling for countries' democracy scores recorded prior to the assassination attempt, we estimate that the death of a leader after an assassination attempt increases a country's polity score by only 0.26 points, on average.
The substantial reduction in the size of the slope coefficient once we control for countries' prior democracy scores suggests that much of the apparent effect of a leader's death on a country's level of democracy, which we observed in the previous model, was actually driven by pre-existing differences in countries' prior democracy scores.
Interpreting the regression coefficient for politybefore tells us that a one-point increase in a country's democracy score prior to an assassination attempt is associated with a 0.84-point increase in that same score after the attempt, holding whether the leader died constant. In other words, polity scores before and after assassination attempts are strongly correlated, which suggests that, on the whole, successful assassination attempts do not make a large difference to the average levels of democracy observed in the countries considered here.
This exercise provides a clear demonstration of the fact that the difference-in-means estimator is not equivalent to the slope coefficient estimated by a linear regression in cases where regression models include confounding variables, in addition to the outcome (Y) and treatment (X) variable.
When looking to estimate causal effects while controlling for confounding variables, we should use linear regression, with the confounders included as additional predictors, rather than the simple difference-in-means estimator.