POL269 - Political Data Research: Seminar 4

Do Women Promote Different Policies than Men?

Based on Raghabendra Chattopadhyay and Esther Duflo. 2004. Women as Policy Makers: Evidence from a Randomized Policy Experiment in India, Econometrica, 72(5): 1409–43.

All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).

Let’s continue working with the data from the experiment in India. As a reminder, Table 1 shows the names and descriptions of the variables in this dataset, where the unit of observation is the village.

Table 1: Variables in “india.csv”

Variable	Description
village	village identifier (“Gram Panchayat number_village number”)
female	whether village was assigned a female politician: 1=yes, 0=no
water	number of new (or repaired) drinking water facilities in the village since random assignment
irrigation	number of new (or repaired) irrigation facilities in the village since random assignment

In this problem set, we practice (1) how to create and make sense of visualisations and (2) how to compute and interpret the correlation between two variables, among other things.

As always, we will start by loading and looking at the data (don’t forget to set your working directory first!):

india <- read.csv("india.csv") # reads and stores data
head(india) # shows first observations

Create a visualisation of the distribution of the variable water using the ggplot2 package. Remember to install this package if you have not already done so, and to load it before use.
1. Does this variable look normally distributed?
2. Approximately how many villages in this experiment had about 250 new (or repaired) drinking water facilities since the randomisation of politicians?
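As a reminder of the general syntax, here is a minimal sketch of a histogram in ggplot2. The data frame toy and its column facilities are made-up placeholders, not values from india.csv, and the bin width of 5 is an arbitrary choice for illustration.

```r
library(ggplot2) # run install.packages("ggplot2") once if it is not installed

# Hypothetical toy data for illustration only (NOT the india.csv values)
toy <- data.frame(facilities = c(0, 2, 3, 3, 5, 8, 10, 12, 20, 45))

# Histogram: the x-axis is split into bins and the height of each bar
# is the number of observations that fall into that bin
p_hist <- ggplot(toy, aes(x = facilities)) +
  geom_histogram(binwidth = 5) + # binwidth of 5 is an arbitrary choice
  labs(x = "Number of facilities", y = "Number of observations")

p_hist # displays the plot
```

To adapt this to the exercise, swap in the india data frame and the water variable, and experiment with the bin width until the shape of the distribution is easy to read.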
Create a visualisation of the relationship between the variable water and the variable irrigation using the ggplot2 package.
1. Does the linear relationship between these two variables look positive or negative?
2. Does the relationship between these two variables look strong or weak?
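For a relationship between two numeric variables, a scatter plot is one common choice. The sketch below uses a made-up data frame toy with placeholder columns x_var and y_var; none of these names or values come from india.csv.

```r
library(ggplot2)

# Hypothetical toy data for illustration only (NOT the india.csv values)
toy <- data.frame(
  x_var = c(1, 2, 3, 4, 5, 6),
  y_var = c(2, 1, 4, 3, 6, 5)
)

# Scatter plot: each point is one observation, positioned by its
# value on x_var (horizontal axis) and y_var (vertical axis)
p_scatter <- ggplot(toy, aes(x = x_var, y = y_var)) +
  geom_point() +
  labs(x = "First variable", y = "Second variable")

p_scatter # displays the plot
```

When reading such a plot, the direction of the cloud of points suggests the sign of the linear relationship, and how tightly the points cluster around a line suggests its strength.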
Compute the correlation between water and irrigation, using the base R function cor().
1. Are you surprised by the sign of the correlation? Please explain your answer fully.
2. Are you surprised by the absolute value of the correlation? Again, please provide a fully reasoned and explained answer.
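As a reminder of how cor() behaves, here is a small sketch on made-up vectors (x, y_up and y_down are placeholders, not variables from india.csv). The correlation coefficient always lies between -1 and 1: its sign gives the direction of the linear relationship and its absolute value gives the strength.

```r
# Hypothetical toy vectors for illustration only (NOT the india.csv values)
x      <- c(1, 2, 3, 4, 5)
y_up   <- c(2, 4, 6, 8, 10) # increases exactly in line with x
y_down <- c(10, 8, 6, 4, 2) # decreases exactly in line with x

r_pos <- cor(x, y_up)   # perfectly positive linear relationship: 1
r_neg <- cor(x, y_down) # perfectly negative linear relationship: -1

r_pos
r_neg
```

With a data frame, the two columns are passed using the $ operator, for example cor(mydata$var1, mydata$var2), where mydata, var1 and var2 stand in for your own object and variable names. Real data will rarely give exactly 1 or -1.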
If we wanted to use the sample of villages in this dataset to infer the characteristics of all villages in India, we would have to make sure that the sample is _____________ of the population. Please provide the missing word…
What would have been the best way of selecting the villages for the sample to ensure that the statement above was true?