POL269 - Political Data Research: Solutions Seminar 1

Do Women Promote Different Policies than Men?

Based on Raghabendra Chattopadhyay and Esther Duflo. 2004. Women as Policy Makers: Evidence from a Randomized Policy Experiment in India, Econometrica, 72(5): 1409–43.

All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).

In a few of this module’s problem sets, we will estimate the average causal effect of having a female politician on two different policy outcomes. For this purpose, we will analyse data from an experiment conducted in India, where villages were randomly assigned to have a female council head. The dataset we will use is in a file called “india.csv”. Table 1 shows the names and descriptions of the variables in this dataset, where the unit of observation is villages.

Table 1: Variables in “india.csv”

Variable	Description
village	village identifier (“Gram Panchayat number_village number”)
female	whether village was assigned a female politician: 1=yes, 0=no
water	number of new (or repaired) drinking water facilities in the village since random assignment
irrigation	number of new (or repaired) irrigation facilities in the village since random assignment

In this problem set, we will practice how to load and make sense of data.

We can use the function read.csv() to read the CSV file “india.csv” and the assignment operator <- to store the data in an object called india. Before doing this, we must set our working directory to the folder that contains “india.csv”. We do this by selecting Session >> Set Working Directory >> Choose Directory… from the RStudio menu.
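Before calling read.csv(), it can help to confirm from the console that R is looking in the right folder. Below is a minimal sketch using base R's getwd() and file.exists(). The setwd() path is hypothetical and is left commented out. Replace it with your own folder if you prefer setting the directory in code rather than through the menu.

```r
# Print the folder where R currently looks for files
getwd()

# TRUE if "india.csv" is in the current working directory, FALSE otherwise
file.exists("india.csv")

# Code equivalent of the RStudio menu. The path below is hypothetical:
# replace it with the folder on your own computer that contains india.csv
# setwd("~/POL269/seminar1")
```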

india <- read.csv("india.csv") # reads and stores data

Recall: to the left of the assignment operator <-, we specify the name of the object, india in this case; to the right of the assignment operator <-, we specify the contents, which, in this case, are produced by reading the CSV file “india.csv”. Also, we do not use quotes around the name of an object such as india or around the name of a function such as read.csv, but we do use quotes around the name of a file: “india.csv”.
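You can see read.csv() and the assignment operator at work without the course file. The sketch below writes the six rows that head(india) displays below to a temporary file, then reads them back. The object name toy and the temporary file are illustrative assumptions. The real data are still loaded with read.csv("india.csv").

```r
# Hypothetical stand-in for india.csv: only the six rows shown by head(india)
csv_lines <- c(
  "village,female,water,irrigation",
  "GP1_village2,1,10,0",
  "GP1_village1,1,0,5",
  "GP2_village2,1,2,2",
  "GP2_village1,1,31,4",
  "GP3_village2,0,0,0",
  "GP3_village1,0,0,0"
)

# Write the rows to a temporary file so the example does not depend on the
# working directory
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

# The file path is a string, so it is quoted; the object name toy is not
toy <- read.csv(path)
toy
```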

We can use the function head() to view the first few observations of the dataset, like so:

head(india) # shows first observations

       village female water irrigation
1 GP1_village2      1    10          0
2 GP1_village1      1     0          5
3 GP2_village2      1     2          2
4 GP2_village1      1    31          4
5 GP3_village2      0     0          0
6 GP3_village1      0     0          0
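head() also accepts an n argument that controls how many rows it shows, and tail() shows the last rows. The sketch below rebuilds the six printed rows as a small toy data frame, an assumption made so the code runs without india.csv. It also pulls out the first observation by row index, which is the row interpreted next.

```r
# Toy data frame: the six rows printed by head(india) above
toy <- data.frame(
  village = c("GP1_village2", "GP1_village1", "GP2_village2",
              "GP2_village1", "GP3_village2", "GP3_village1"),
  female = c(1, 1, 1, 1, 0, 0),
  water = c(10, 0, 2, 31, 0, 0),
  irrigation = c(0, 5, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

head(toy, n = 3)  # only the first three observations
tail(toy, n = 2)  # only the last two observations
toy[1, ]          # the first observation on its own (row 1, all columns)
```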

Each observation represents a village. We know this because, as stated above, the unit of observation in this dataset is villages.
Looking at the output of head() above (question 2) and at Table 1, which describes each variable, we can deduce that the first observation in the dataset represents village 2 in Gram Panchayat 1. We can also see that this village was assigned a female politician (variable female takes value 1). Since politicians were randomly assigned, it has had 10 new or repaired drinking water facilities (variable water takes value 10) and 0 new or repaired irrigation facilities (variable irrigation takes value 0).
village is a character variable, female is a numeric binary variable, water is a numeric non-binary variable, and irrigation is a numeric non-binary variable.

Remember that binary variables can take only two values (here, 0 and 1), while non-binary variables can take more than two values.
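One way to check these variable types in R is with is.character() and is.numeric(). A quick way to tell binary from non-binary variables is to list each variable's distinct values with unique(). This sketch uses a six-row toy data frame built from the head() output above, not the full dataset. Note that read.csv() stores whole numbers as integers, and is.numeric() also counts integers as numeric.

```r
# Toy data frame: the six rows printed by head(india) above
toy <- data.frame(
  village = c("GP1_village2", "GP1_village1", "GP2_village2",
              "GP2_village1", "GP3_village2", "GP3_village1"),
  female = c(1, 1, 1, 1, 0, 0),
  water = c(10, 0, 2, 31, 0, 0),
  irrigation = c(0, 5, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

is.character(toy$village)                               # village is text
sapply(toy[, c("female", "water", "irrigation")], is.numeric)  # all numeric

# Distinct values: a binary variable shows only 0 and 1
sort(unique(toy$female))
sort(unique(toy$water))  # more than two values, so non-binary
```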

You can find out how many observations are in the dataset, or in other words, how many villages were part of the experiment, using the dim() function as below:

dim(india) # provides dimensions of dataframe: rows, columns

[1] 322   4

The first number provided by dim() corresponds to the number of observations in the dataframe and the second number corresponds to the number of variables.

Based on the output of dim(), we can therefore conclude that there are 322 observations in the dataframe india. Since the unit of observation is villages, the dataframe india has data on 322 villages in India.
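dim() returns both dimensions at once, while base R's nrow() and ncol() return each one separately. The sketch below runs on a six-row toy data frame (the rows printed by head(india) above), so the counts here are 6 and 4 rather than 322 and 4.

```r
# Toy data frame: the six rows printed by head(india) above
toy <- data.frame(
  village = c("GP1_village2", "GP1_village1", "GP2_village2",
              "GP2_village1", "GP3_village2", "GP3_village1"),
  female = c(1, 1, 1, 1, 0, 0),
  water = c(10, 0, 2, 31, 0, 0),
  irrigation = c(0, 5, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

dim(toy)   # rows, columns: 6 4
nrow(toy)  # number of observations (rows) only
ncol(toy)  # number of variables (columns) only
```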

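Before using the tidyverse function summarise() to calculate the average of the variable female, we must first install and then load the “tidyverse” package (summarise() comes from dplyr, one of the packages in the tidyverse). Installing is needed only once per computer, whereas loading is needed in every new R session. We do this as follows: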

install.packages("tidyverse") # installs the tidyverse packages (only needed once)

library(tidyverse) # loads the tidyverse packages (needed in every new session)

We can then calculate the mean of female by running the following code:

india %>% summarise(mean = mean(female)) # calculates the average of female

       mean
1 0.3354037

We place the name of the dataframe, india, before the ‘pipe’ operator %>% because the variable female, whose mean we want, is stored in this dataframe. The pipe passes india to summarise() as its first argument. Inside summarise(), the first mean (to the left of =) is simply the name we give to the column that stores the result. The second, mean(female), is the function call that computes the mean of the variable female in the dataframe india.
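For comparison, here is a base R sketch of the same calculation on a six-row toy version of the data (the female values of the rows printed by head(india) above). The $ operator extracts female from the data frame, much as the pipe hands india to summarise(). The second line shows that the column label is arbitrary. With dplyr loaded, india %>% summarise(share_female = mean(female)) would likewise return the same number under the name share_female.

```r
# Toy data frame: only the female column of the six rows shown by head(india)
toy <- data.frame(female = c(1, 1, 1, 1, 0, 0))

mean(toy$female)                          # the mean of female in toy
data.frame(share_female = mean(toy$female))  # same value, different label
```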

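Approximately 34% of the villages in the experiment were randomly assigned to have a female politician. To express the proportion as a percentage, we multiply it by 100 (0.3354 × 100 = 33.54%). We then round to the nearest whole percentage point for readability, which gives 34%. This rounding is a presentational choice rather than a requirement: a percentage need not be a whole number, and 33.54% of the 322 villages corresponds to exactly 108 villages.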
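The arithmetic behind this interpretation can be reproduced in R. The value 0.3354037 is copied from the summarise() output above, and 322 is the number of villages reported by dim().

```r
p <- 0.3354037   # proportion printed by summarise() above

p * 100          # as a percentage: 33.54037
round(p * 100)   # rounded to the nearest whole percentage point: 34
p * 322          # number of villages behind the proportion: about 108
108 / 322        # check: 0.3354037 to seven decimal places
```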

Note: The word “average” does not appear in the interpretation of this mean because it should be interpreted as a proportion. The mean of a binary variable should be interpreted as the proportion of the observations that have the characteristic identified by the variable. Here, female is binary and identifies the villages that were assigned to have a female politician.
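The following sketch shows why the mean of a 0/1 variable is a proportion. The 1s add up to the number of villages with the characteristic, so dividing by the number of villages gives their share. The vector below is the female column of the six rows shown by head(india), used only as an illustration.

```r
female <- c(1, 1, 1, 1, 0, 0)  # female values of the six rows shown by head()

mean(female)                       # mean of the binary variable
sum(female == 1) / length(female)  # share of villages with female == 1: same value
```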

As above, we calculate the average of the variable water by using the tidyverse function summarise().

india %>% summarise(mean = mean(water)) # calculates the average of water

      mean
1 17.84161

Since politicians were randomly assigned, villages have had, on average, approximately 18 new (or repaired) drinking water facilities each (17.84, rounded to the nearest whole facility). Note: the word “average” appears in the interpretation of this mean. This is because the mean of a non-binary variable should be interpreted as an average, in the same unit of measurement as the variable itself, which is facilities in this case.
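The sketch below applies the same logic to the water column of the six rows shown by head(india). This is an illustrative subset, so its mean differs from the full-data mean above. It then applies the rounding used in the interpretation.

```r
water <- c(10, 0, 2, 31, 0, 0)  # water values of the six rows shown by head()

mean(water)        # average number of facilities in this subset: 43 / 6
round(17.84161)    # rounding the full-data mean printed above gives 18
```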

If we wanted to estimate the average causal effect of having a female politician on the number of new (or repaired) drinking water facilities, the treatment variable would be female and the outcome variable would be water.
If we wanted to estimate the average causal effect of having a female politician on the number of new (or repaired) irrigation facilities, the treatment variable would be female and the outcome variable would be irrigation.
In both of the analyses above, the treatment group would be the villages that were randomly assigned to have a female politician and the control group would be the villages that were randomly assigned to NOT have a female politician.

As the unit of observation is villages, both of these groups are composed of villages. The treatment group consists of the villages that were assigned to receive the treatment (having a female politician). The control group consists of the villages that were assigned NOT to receive it.
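In R, the two groups can be identified by subsetting on the treatment variable. The sketch below uses a six-row toy data frame built from the head() output above. With the full data, one would subset india in the same way. The object names treatment and control are illustrative.

```r
# Toy data frame: the six rows printed by head(india) above
toy <- data.frame(
  village = c("GP1_village2", "GP1_village1", "GP2_village2",
              "GP2_village1", "GP3_village2", "GP3_village1"),
  female = c(1, 1, 1, 1, 0, 0),
  water = c(10, 0, 2, 31, 0, 0),
  irrigation = c(0, 5, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Treatment group: villages assigned a female politician (female == 1)
treatment <- toy[toy$female == 1, ]

# Control group: villages NOT assigned a female politician (female == 0)
control <- toy[toy$female == 0, ]

nrow(treatment)    # 4 villages in this toy example
nrow(control)      # 2 villages in this toy example
table(toy$female)  # both counts at once, labelled 0 (control) and 1 (treatment)
```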