Based on Raghabendra Chattopadhyay and Esther Duflo. 2004. Women as Policy Makers: Evidence from a Randomized Policy Experiment in India, Econometrica, 72(5): 1409–43.
All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).
Let’s continue working with the data from the experiment in India. As a reminder, Table 1 shows the names and descriptions of the variables in this dataset, where the unit of observation is the village.
Table 1: Variables in “india.csv”
Variable | Description |
---|---|
village | village identifier (“Gram Panchayat number_village number”) |
female | whether village was assigned a female politician: 1=yes, 0=no |
water | number of new (or repaired) drinking water facilities in the village since random assignment |
irrigation | number of new (or repaired) irrigation facilities in the village since random assignment |
In this problem set, we practice (1) how to create and make sense of visualisations and (2) how to compute and interpret the correlation between two variables, among other things.
As always, we will start by loading and looking at the data (don’t forget to set your working directory first!):
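A minimal sketch of this step, assuming the file is called "india.csv" (as in Table 1) and sits in your working directory. The object names `data_file` and `india` are our own choice, and the check with `file.exists()` is only there to give a readable message if the working directory is not set correctly.

```r
# Assumption: "india.csv" is in the current working directory
# (set it with setwd() or via the RStudio Session menu).
data_file <- "india.csv"

if (file.exists(data_file)) {
  india <- read.csv(data_file)  # read the CSV file into a data frame
  print(head(india))            # show the first six observations
  print(dim(india))             # number of villages (rows) and variables (columns)
} else {
  message("Could not find ", data_file, " in ", getwd(),
          ". Check your working directory.")
}
```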
Create a visualisation of the distribution of the variable water using the ggplot2 package. Remember to install the package if you have not already done so, and to load it before use.
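One possible approach is a histogram, sketched below. To keep the example runnable on its own, it uses a small made-up data frame with the same column names as "india.csv"; these numbers are purely illustrative and are not the real data. In your own answer, use the `india` object you loaded from the file. The number of bins and the axis labels are also just assumptions you can adjust.

```r
# install.packages("ggplot2")  # run once if the package is not yet installed
library(ggplot2)

# Hypothetical stand-in data (NOT the real india.csv values),
# with the same variable names as in Table 1.
india <- data.frame(
  water      = c(0, 2, 5, 10, 10, 25, 40, 3, 7, 120),
  irrigation = c(0, 1, 0, 4, 2, 6, 10, 0, 3, 30)
)

# Histogram: one variable on the x-axis, counts of villages on the y-axis
water_hist <- ggplot(india, aes(x = water)) +
  geom_histogram(bins = 10) +
  labs(x = "New or repaired drinking water facilities",
       y = "Number of villages")

water_hist  # printing the object draws the plot
```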
Create a visualisation of the relationship between the variable water and the variable irrigation using the ggplot2 package.
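For two numeric variables, a scatter plot is a natural choice. The sketch below again relies on a small made-up data frame (illustrative values only, not the real data) so that it runs on its own; which variable goes on which axis is our assumption, since the question does not specify it.

```r
library(ggplot2)

# Hypothetical stand-in data (NOT the real india.csv values),
# with the same variable names as in Table 1.
india <- data.frame(
  water      = c(0, 2, 5, 10, 10, 25, 40, 3, 7, 120),
  irrigation = c(0, 1, 0, 4, 2, 6, 10, 0, 3, 30)
)

# Scatter plot: each point is one village
water_irrigation_plot <- ggplot(india, aes(x = water, y = irrigation)) +
  geom_point() +
  labs(x = "New or repaired drinking water facilities",
       y = "New or repaired irrigation facilities")

water_irrigation_plot  # printing the object draws the plot
```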
Compute the correlation between water and irrigation, using the base R function cor().
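A minimal sketch of the call, once more using made-up stand-in values rather than the real data, so the number it returns says nothing about the actual villages. By default `cor()` computes the Pearson correlation coefficient, and the order of the two arguments does not matter.

```r
# Hypothetical stand-in data (NOT the real india.csv values),
# with the same variable names as in Table 1.
india <- data.frame(
  water      = c(0, 2, 5, 10, 10, 25, 40, 3, 7, 120),
  irrigation = c(0, 1, 0, 4, 2, 6, 10, 0, 3, 30)
)

# Pearson correlation between the two variables (always between -1 and 1)
r <- cor(india$water, india$irrigation)
r
```

With the real data, replace the stand-in data frame with the `india` object loaded from "india.csv" and interpret both the sign and the magnitude of the result.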
If we wanted to use the sample of villages in this dataset to infer the characteristics of all villages in India, we would have to make sure that the sample is _____________ of the population. Please provide the missing word…
What would have been the best way of selecting the villages for the sample to ensure that the statement above was true?