Seminar 12

Elizabeth Simon
2024-04-08

Do Women Promote Different Policies than Men?

Based on Raghabendra Chattopadhyay and Esther Duflo. 2004. Women as Policy Makers: Evidence from a Randomized Policy Experiment in India, Econometrica, 72(5): 1409–43.

All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).

Let’s go back to the data from the experiment in India, which we used in the first few seminars of this module. As a reminder, Table 1 shows the names and descriptions of the variables in this dataset, where the unit of observation is the village.

Table 1: Variables in “india.csv”

Variable     Description
village      village identifier (“Gram Panchayat number_village number”)
female       whether village was assigned a female politician: 1=yes, 0=no
water        number of new (or repaired) drinking water facilities in the village since random assignment
irrigation   number of new (or repaired) irrigation facilities in the village since random assignment

In this problem set, we’ll be recapping some of the key methods, analyses and skills we’ve learned throughout this course. First, we’ll practise estimating an average treatment effect using data from a randomised experiment. Then we’ll consider how to determine whether the estimated average treatment effect is statistically significant at the 5% level.

As always, we’ll start by loading and looking at the data (remembering to set our working directory first!):

india <- read.csv("india.csv") # reads and stores data as object called india
head(india) # looking at first few rows of dataset
       village female water irrigation
1 GP1_village2      1    10          0
2 GP1_village1      1     0          5
3 GP2_village2      1     2          2
4 GP2_village1      1    31          4
5 GP3_village2      0     0          0
6 GP3_village1      0     0          0
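
Before estimating anything, it can help to confirm that the data loaded as expected. The sketch below rebuilds only the six rows printed above in a data frame called india_preview (a hypothetical name used here for illustration, so that the code runs without the file). It is not the full dataset; with the real data you would call the same functions on india.

```r
# Rebuild the six rows shown by head(india); this is NOT the full dataset
india_preview <- data.frame(
  village = c("GP1_village2", "GP1_village1", "GP2_village2",
              "GP2_village1", "GP3_village2", "GP3_village1"),
  female = c(1, 1, 1, 1, 0, 0),
  water = c(10, 0, 2, 31, 0, 0),
  irrigation = c(0, 5, 2, 4, 0, 0)
)

dim(india_preview)           # number of rows (villages) and columns (variables)
str(india_preview)           # type of each variable
table(india_preview$female)  # number of villages in each treatment condition
mean(india_preview$female)   # proportion of villages assigned a female politician
```

Because female is coded 0/1, its mean is the proportion of villages in the treatment group.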
  1. If we wanted to estimate the average causal effect of having a female politician on the number of new (or repaired) drinking water facilities…
     1. which variable would be our outcome, or Y, variable?

     2. which variable would be our treatment, or X, variable?

     3. which variable might we consider to be a confounding variable?

  2. What is the estimated average causal effect of having a female politician on the number of new (or repaired) drinking water facilities?
     1. Fit a linear regression model to the data in such a way that the estimated slope coefficient is equivalent to the difference-in-means estimator you are interested in, and store the fitted model in an object called fit.

     2. What is the estimated slope coefficient returned by this model?

     3. Now, let’s answer the question: What is the estimated average treatment effect? Provide a full substantive answer, making sure to include the assumption you make when using the difference-in-means estimator, why the assumption is or is not reasonable in this case, and what we can gauge from the direction, size, and unit of measurement of this average treatment effect.

     4. Fit a linear regression model to the data which includes a confounding variable in addition to the key Y and X variables. Comment on whether, and how, the association between Y and X changes after controlling for the effects of this confounding variable. Store this new fitted model in an object called fit_controls.
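
The equivalence between the difference-in-means estimator and the slope of a regression on a binary variable can be checked numerically. The following is a minimal sketch that uses only the six rows printed by head(india) earlier, stored under the hypothetical names preview and fit_preview, so the numbers it produces are not the answer for the full dataset.

```r
# Six preview rows only (from the head(india) output); NOT the full dataset
preview <- data.frame(
  female = c(1, 1, 1, 1, 0, 0),
  water = c(10, 0, 2, 31, 0, 0)
)

# Difference-in-means: average outcome in the treatment group
# minus average outcome in the control group
diff_means <- mean(preview$water[preview$female == 1]) -
  mean(preview$water[preview$female == 0])

# Linear regression of the outcome on the binary treatment variable
fit_preview <- lm(water ~ female, data = preview)
slope <- coef(fit_preview)[["female"]]

diff_means
slope
all.equal(diff_means, slope)  # the two estimates agree up to rounding error
```

The intercept of such a model is the average outcome in the control group, and the slope is the change in the average outcome when moving from the control group to the treatment group.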

  3. Is the effect of having a female politician on the number of new (or repaired) drinking water facilities statistically significant at the 5% level?
     1. Let’s start by specifying both the null and alternative hypotheses, providing both the mathematical notations and their meaning.

     2. What is the value of the observed test statistic in the regression models we have estimated with controls (fit_controls) and without controls (fit)?

     3. What is the associated p-value in both models?

     4. Now, let’s answer the question: Is the effect statistically significant at the 5% level? Please provide your reasoning, and refer to the results of both models estimated in this session.
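
As a reminder of where the test statistic and p-value come from, the sketch below pulls them out of the coefficient table of a fitted model. It again uses only the six rows printed by head(india), under the hypothetical names preview and fit_preview, so its results say nothing about the full dataset. It assumes the p-value reported by lm(), which is based on a t distribution; with few observations this can differ from a normal approximation.

```r
# Six preview rows only (from the head(india) output); NOT the full dataset
preview <- data.frame(
  female = c(1, 1, 1, 1, 0, 0),
  water = c(10, 0, 2, 31, 0, 0)
)
fit_preview <- lm(water ~ female, data = preview)

# Coefficient table: estimate, standard error, test statistic, p-value
coef_table <- summary(fit_preview)$coefficients
estimate <- coef_table["female", "Estimate"]
std_error <- coef_table["female", "Std. Error"]
t_value <- coef_table["female", "t value"]
p_value <- coef_table["female", "Pr(>|t|)"]

# The test statistic is the estimate divided by its standard error
estimate / std_error

# Two-sided p-value from a t distribution with the residual degrees of freedom
2 * pt(-abs(t_value), df = df.residual(fit_preview))

# Decision rule at the 5% level: reject the null hypothesis if TRUE
p_value < 0.05
```

The same extraction works for any object returned by lm(), including models with additional control variables; only the row name of the coefficient of interest needs to match.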