Seminar 1

Elizabeth Simon
2024-01-22

Do Women Promote Different Policies than Men?

Based on Raghabendra Chattopadhyay and Esther Duflo. 2004. Women as Policy Makers: Evidence from a Randomized Policy Experiment in India, Econometrica, 72(5): 1409–43.

All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).

In a few of this module’s problem sets, we will estimate the average causal effect of having a female politician on two different policy outcomes. For this purpose, we will analyse data from an experiment conducted in India, where villages were randomly assigned to have a female council head. The dataset we will use is in a file called “india.csv”. Table 1 shows the names and descriptions of the variables in this dataset, where the unit of observation is the village.

Table 1: Variables in “india.csv”

Variable    Description
village     village identifier (“Gram Panchayat number_village number”)
female      whether village was assigned a female politician: 1=yes, 0=no
water       number of new (or repaired) drinking water facilities in the village since random assignment
irrigation  number of new (or repaired) irrigation facilities in the village since random assignment

In this problem set, we will practise loading and making sense of data.

  1. Use the function read.csv() to read the CSV file “india.csv” and use the assignment operator <- to store the data in an object called india. You must set your working directory before doing so. You can do this by selecting Session >> Set Working Directory >> Choose Directory… from the RStudio menu.

  2. Use the function head() to view the first few observations of the dataset.

  3. What does each observation in this dataset represent?

  4. How can we substantively interpret the first observation in the dataset?

  5. For each variable in the dataset, please identify the type of variable (character vs. numeric binary vs. numeric non-binary).

  6. How many observations are in the dataset? In other words, how many villages were part of this experiment? (Hint: the function dim() might be helpful here.) Please provide both the R code you used and the substantive answer.
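The functions named in the steps above can be tried out without the file. The sketch below is only an illustration: it reads a six-row stand-in typed in as text (the object names csv_text and india_sample are hypothetical), so its results describe the stand-in, not the full dataset.

```r
# Six villages typed in by hand as CSV text; a stand-in for "india.csv",
# NOT the full dataset.
csv_text <- "village,female,water,irrigation
GP1_village2,1,10,0
GP1_village1,1,0,5
GP2_village2,1,2,2
GP2_village1,1,31,4
GP3_village2,0,0,0
GP3_village1,0,0,0"

# read.csv() normally takes a file name; the text argument lets it
# parse a character string instead.
india_sample <- read.csv(text = csv_text)

head(india_sample, 3)   # first three observations
dim(india_sample)       # rows (observations) first, then columns (variables)
nrow(india_sample)      # number of observations only
str(india_sample)       # how R stores each variable
```

With the real file, the same calls apply to the india object created in step 1. Note that str() reports only how each variable is stored; whether a numeric variable is binary still has to be judged from its values and its description in Table 1.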

library() and the tidyverse

One of the most time-consuming parts of using R, and data analysis in general, is managing, tidying and ‘wrangling’ your data. Often, we start with data in a format that is not particularly useful for analysis. We might come across a dataset, for example, with far too many columns, covering lots of variables we are not interested in.

The base functions in R, which are what the textbook uses, are not always the best option for data manipulation and analysis. Users can create collections of functions, called “packages”. One of those packages is the tidyverse, which is in turn a collection of packages for data manipulation, analysis and visualisation, among other tasks.

The first time we use a package, we need to install it with install.packages(). We then use the library() function to load it. You only need to install a package once, but you will need to load it every time you open RStudio:

# install.packages("tidyverse") # only need to do this once
library(tidyverse)
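As one illustration of the kind of wrangling mentioned above, the sketch below drops unwanted columns with the tidyverse function select(). The data frame wide and its columns notes and coder are made up for this example and are not part of “india.csv”.

```r
library(tidyverse)

# Hypothetical dataset with two columns we want and two we do not.
wide <- data.frame(
  village = c("GP1_village2", "GP1_village1"),
  female  = c(1, 1),
  notes   = c("a", "b"),   # made-up column, for illustration only
  coder   = c("x", "y")    # made-up column, for illustration only
)

# select() keeps only the columns that are named.
narrow <- wide %>% select(village, female)

names(narrow)   # "village" "female"
```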

Let’s continue working with the data from the experiment in India. As a reminder, Table 1 shows the names and descriptions of the variables in this dataset, where the unit of observation is the village.

Table 1: Variables in “india.csv”

Variable    Description
village     village identifier (“Gram Panchayat number_village number”)
female      whether village was assigned a female politician: 1=yes, 0=no
water       number of new (or repaired) drinking water facilities in the village since random assignment
irrigation  number of new (or repaired) irrigation facilities in the village since random assignment

In this problem set, we will practise computing and interpreting means, among other things.

We will start by loading and looking at the data, like so:

india <- read.csv("india.csv") # reads and stores data
head(india) # shows first observations
       village female water irrigation
1 GP1_village2      1    10          0
2 GP1_village1      1     0          5
3 GP2_village2      1     2          2
4 GP2_village1      1    31          4
5 GP3_village2      0     0          0
6 GP3_village1      0     0          0
  1. You can use either the mean() function or the tidyverse function summarise() to calculate the average of the variable female. Please provide a full substantive interpretation of what this average means. Make sure to provide the unit of measurement.

For example:

mean(india$female)
[1] 0.3354037
# Or
india %>% summarise(mean = mean(female)) # calculates the average of female
       mean
1 0.3354037
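The example above uses the pipe operator %>% without comment. As a rough sketch, the pipe passes the object on its left to the function on its right as the first argument, so the piped and the nested call give the same result. The six-row data frame india_sample is a hypothetical stand-in typed in by hand, so its mean differs from the one shown for the full data.

```r
library(tidyverse)

# Stand-in with six villages only; NOT the full dataset.
india_sample <- data.frame(
  female = c(1, 1, 1, 1, 0, 0),
  water  = c(10, 0, 2, 31, 0, 0)
)

# Piped form: india_sample becomes the first argument of summarise().
piped <- india_sample %>% summarise(mean = mean(female))

# Equivalent nested form, written without the pipe.
nested <- summarise(india_sample, mean = mean(female))

all.equal(piped, nested)   # checks that both forms agree
```

The piped form tends to read more easily once several steps are chained together, which is why the tidyverse examples here use it.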
  2. Now use the tidyverse function summarise() to calculate the average of the variable water. Again, please provide a full substantive interpretation of what this average means and make sure to provide the unit of measurement.

  3. If we wanted to estimate the average causal effect of having a female politician on the number of new (or repaired) drinking water facilities:

    1. What would be the treatment variable? Please just provide the name of the variable.
    2. What would be the outcome variable? Please just provide the name of the variable.

  4. If we wanted to estimate the average causal effect of having a female politician on the number of new (or repaired) irrigation facilities:

    1. What would be the treatment variable? Please just provide the name of the variable.
    2. What would be the outcome variable? Please just provide the name of the variable.

  5. In both analyses above:

    1. What would be the treatment group?
    2. What would be the control group?