Based on Raghabendra Chattopadhyay and Esther Duflo. 2004. Women as Policy Makers: Evidence from a Randomized Policy Experiment in India, Econometrica, 72(5): 1409–43.
All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).
Let’s continue working with the data from the experiment in India. As a reminder, Table 1 shows the names and descriptions of the variables in this dataset, where the unit of observation is the village.
Table 1: Variables in “india.csv”
Variable | Description |
---|---|
village | village identifier (“Gram Panchayat number_village number”) |
female | whether village was assigned a female politician: 1=yes, 0=no |
water | number of new (or repaired) drinking water facilities in the village since random assignment |
irrigation | number of new (or repaired) irrigation facilities in the village since random assignment |
In this problem set, we will practice (1) how to estimate an average treatment effect using data from a randomized experiment and (2) how to write a conclusion statement.
As always, we will start by loading and looking at the data (don’t forget to set your working directory first!):
       village female water irrigation
1 GP1_village2      1    10          0
2 GP1_village1      1     0          5
3 GP2_village2      1     2          2
4 GP2_village1      1    31          4
5 GP3_village2      0     0          0
6 GP3_village1      0     0          0
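The commands that produce this output are not shown above. Here is a minimal sketch, assuming the data were read with base R's read.csv() and inspected with head(). To keep the sketch runnable without the real file, it first writes the six printed rows to a temporary CSV. The names rows, tmp, and india_head are hypothetical; with the actual data, one would call read.csv("india.csv") after setting the working directory.

```r
# The six rows printed above, written to a temporary CSV so that the sketch
# runs without the real file. With the actual data: read.csv("india.csv").
rows <- c(
  "village,female,water,irrigation",
  "GP1_village2,1,10,0",
  "GP1_village1,1,0,5",
  "GP2_village2,1,2,2",
  "GP2_village1,1,31,4",
  "GP3_village2,0,0,0",
  "GP3_village1,0,0,0"
)
tmp <- tempfile(fileext = ".csv")
writeLines(rows, tmp)

india_head <- read.csv(tmp)  # hypothetical name; holds only these six rows
head(india_head)             # shows the first six observations
dim(india_head)              # number of rows and number of columns
```

Because india_head contains only six villages, any averages computed from it will differ from those of the full dataset.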
To estimate the average causal effect of having a female politician on the number of new (or repaired) drinking water facilities we would use the difference-in-means estimator.
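To find out the average number of new (or repaired) drinking water facilities in villages with a female politician, we first need to install and load the tidyverse packages:
install.packages("tidyverse") # installs the tidyverse packages (only needed once)
library(tidyverse) # loads the tidyverse packages, including dplyr and the %>% pipe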
We can then calculate the mean of water separately for villages with and without female politicians like so:
india %>% group_by(female) %>% summarise(mean = mean(water)) # calculates the average of water by female
# A tibble: 2 × 2
  female  mean
   <int> <dbl>
1      0  14.7
2      1  24.0
We specify the name of the dataframe, india, before the first ‘pipe’ operator %>% because the variable water, whose mean we want, is located in this dataframe. The group_by() command then splits the dataframe according to the variable female: one group of villages with female politicians, represented by the value 1 (see Table 1), and one group of villages without, represented by the value 0. Inside summarise(), the first mean (to the left of =) is the name of the column that will store the result, and the second, mean(water), asks R to compute the mean of the variable water within each of the groups defined by female.
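The grouping logic described above can also be written in base R, which may make each step easier to see. This is a sketch under stated assumptions: the dataframe india_head is hypothetical and holds only the six villages printed earlier, so its group means differ from the 14.7 and 24.0 obtained from the full dataset.

```r
# Only the six villages printed earlier, NOT the full dataset,
# so the means below differ from 14.7 and 24.0.
india_head <- data.frame(
  village = c("GP1_village2", "GP1_village1", "GP2_village2",
              "GP2_village1", "GP3_village2", "GP3_village1"),
  female = c(1, 1, 1, 1, 0, 0),
  water = c(10, 0, 2, 31, 0, 0)
)

# Subset water to the villages where female equals 1, then average.
mean_female <- mean(india_head$water[india_head$female == 1])

# Same idea for the villages where female equals 0.
mean_male <- mean(india_head$water[india_head$female == 0])

# tapply() computes both group means in one call,
# much like group_by() followed by summarise().
group_means <- tapply(india_head$water, india_head$female, mean)

mean_female
mean_male
group_means
```

The square brackets play the role of group_by(): they select the observations that belong to one group before the mean is taken.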
Our output tells us that the average number of new (or repaired) drinking water facilities in villages with a female politician (female == 1) is 24 facilities.
The same output also tells us the average for the comparison group: the average number of new (or repaired) drinking water facilities in villages with a male politician (female == 0) is about 15 facilities. We round 14.7 to the nearest whole number, 15, because facilities are counted in whole units; the calculation of the difference in means still uses the unrounded 14.7.
The difference-in-means estimator is calculated by subtracting the average outcome for the control group from the average outcome for the treatment group.
Difference-in-means estimator = average outcome for treatment group - average outcome for control group
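As an illustration, the formula above can be wrapped in a small R function. This is only a sketch: diff_in_means is a hypothetical helper (it is not part of base R or the tidyverse), it assumes the treatment variable is coded 1 for the treatment group and 0 for the control group, and the example vectors contain only the six villages printed earlier rather than the full data.

```r
# Hypothetical helper: difference in means for a binary treatment
# coded 1 = treatment group, 0 = control group.
diff_in_means <- function(outcome, treatment) {
  stopifnot(length(outcome) == length(treatment),
            all(treatment %in% c(0, 1)))
  mean(outcome[treatment == 1]) - mean(outcome[treatment == 0])
}

# Illustration with the six villages printed earlier (not the full data)
female <- c(1, 1, 1, 1, 0, 0)
water <- c(10, 0, 2, 31, 0, 0)
irrigation <- c(0, 5, 2, 4, 0, 0)

diff_in_means(water, female)       # treatment mean minus control mean for water
diff_in_means(irrigation, female)  # same estimator applied to irrigation
```

Writing the estimator as a function makes the order of subtraction explicit: treatment group first, control group second.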
What is the assumption we make when using the difference-in-means estimator? We assume that the villages that were assigned to have a female politician (the treatment group) are comparable to the villages that were assigned to NOT have a female politician (the control group). If this assumption were not true the difference-in-means estimator would NOT produce a valid estimate of the average treatment effect.
Why is this assumption reasonable? Because the female politicians were assigned at random; in other words, the data come from a randomized experiment. Random treatment assignment makes the treatment and control groups, on average, identical to each other in all observed and unobserved pre-treatment characteristics.
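A short simulation can make this balancing property concrete. Everything in the sketch below is made up for illustration and is not taken from the India experiment: the number of villages, the pre-treatment characteristic (called population here), and the seed are all assumptions.

```r
set.seed(123)  # arbitrary seed so the simulation is reproducible

n_villages <- 10000
# Made-up pre-treatment characteristic, e.g. village population
population <- rnorm(n_villages, mean = 1000, sd = 200)

# Random assignment: exactly half of the villages get treatment = 1
treatment <- sample(rep(c(0, 1), each = n_villages / 2))

mean_treated <- mean(population[treatment == 1])
mean_control <- mean(population[treatment == 0])

mean_treated
mean_control
mean_treated - mean_control  # small relative to the standard deviation of 200
```

Because assignment ignores population entirely, the two group averages should end up close to each other, and the same logic applies to characteristics we never measured.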
What is the treatment here? Having a female politician in the village.
What is the outcome here? The number of new (or repaired) drinking water facilities.
What is the direction, size, and unit of measurement of the average causal effect? In this case, the difference-in-means estimator = mean number of drinking water facilities in villages with a female politician - mean number of drinking water facilities in villages with a male politician. We have already calculated both of these quantities so we simply input these into the equation presented previously to obtain the following:
Difference-in-means estimator = 24 - 14.7
Difference-in-means estimator = 9.3
The positive estimate shows that there tend to be more new (or repaired) drinking water facilities in villages with female politicians than in those with male politicians. Specifically, villages with female politicians have about 9 more drinking water facilities, on average (9.3, rounded to the nearest whole facility).
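The arithmetic above can be checked in R using the two group means reported in the output. This is a minimal sketch; the object names mean_treated, mean_control, and estimate are hypothetical, and the inputs are the rounded means from the output rather than values recomputed from the data.

```r
mean_treated <- 24.0  # average of water where female == 1 (from the output)
mean_control <- 14.7  # average of water where female == 0 (from the output)

estimate <- mean_treated - mean_control

round(estimate, 1)  # estimate to one decimal place
round(estimate)     # estimate rounded to whole facilities
estimate > 0        # TRUE means more facilities in the treatment villages
```

The sign gives the direction of the effect, the magnitude gives its size, and the unit of measurement is the same as that of the outcome variable: drinking water facilities.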
Assuming that the villages that were randomly assigned to have a female politician were comparable to the villages that were randomly assigned to NOT have a female politician (a reasonable assumption since the female politicians were assigned at random), we estimate that having a female politician increases the number of new or repaired drinking water facilities by 9 facilities, on average.