Seminar 9

Elizabeth Simon
2024-03-18

Probability

All materials presented here build on the resources for instructors designed by Elena Llaudet and Kosuke Imai in Data Analysis for Social Science: A Friendly and Practical Introduction (Princeton University Press).

The following questions are all based on material presented in Chapter 6 of the main course textbook, Data Analysis for Social Science: A Friendly and Practical Introduction.

  1. According to the frequentist interpretation of probability, the probability of an event is the proportion of its occurrence among infinitely many identical trials. For example, if we flip a coin 1 million times and 300,000 of those times we get heads, what’s the probability of heads?
    1. it is impossible to know with the information given
    2. 30%
    3. 0.3%
    4. 300%  
  2. According to the Bayesian interpretation, probabilities represent one’s subjective beliefs about the relative likelihood of events. For example, if we state that the probability of rain today is 0%, we are expressing that:
    1. the proportion of rain events over multiple days
    2. we are certain that it will rain today
    3. we are certain that it won’t rain today
    4. none of the above  
  3. In this class, we distinguish between two types of numeric random variables based on the number of values the variables can take. These are:
    1. treatment and control
    2. predictors and outcomes
    3. binary and non-binary
    4. treatment and outcome  
  4. Each random variable has a probability distribution, which characterises the likelihood of each value the variable can take. The Bernoulli distribution is the probability distribution of:
    1. a non-binary variable
    2. a binary variable
    3. a variable that can take more than two values
    4. none of the above  
  5. The Bernoulli distribution is characterised by:
    1. one parameter
    2. two parameters
    3. three parameters
    4. four parameters  
  6. If \(X = \{1, 1, 1, 0, 0\}\), \(X\) is:
    1. a non-binary variable
    2. a binary variable
    3. an outcome variable
    4. a predictor  
  7. If \(X = \{1, 1, 1, 0, 0\}\), what is \(P(X=1)\), also known as \(p\)? (A short numerical check follows this question set.)
    1. \(P(X=1) = p = 0.6\)
    2. \(P(X=1) = p = 3\)
    3. \(P(X=1) = p = 0.4\)
    4. none of the above  
  8. If in a Bernoulli distribution \(p = 0.7\), what is \(P(X=0)\)?
    1. \(P(X=0) = 0.2\)
    2. \(P(X=0) = 0.7\)
    3. \(P(X=0) = 1\)
    4. \(P(X=0) = 0.3\)  
  9. The normal distribution is the probability distribution we commonly use as a good approximation for many non-binary variables. The normal distribution is characterised by:
    1. one parameter
    2. two parameters
    3. three parameters
    4. four parameters  
  10. In mathematical notation, we write a normal random variable \(X\) as \(X \sim N(\mu, \sigma^2)\), where:
    1. \(\mu\) stands for the mean of the distribution and \(\sigma^2\) stands for the variance
    2. \(\mu\) stands for the mean of the distribution and \(\sigma^2\) stands for the standard deviation
    3. \(\mu\) stands for the variance of the distribution and \(\sigma^2\) stands for the mean
    4. none of the above  
  11. If \(X \sim N(20, 25)\), then \(X\) is:
    1. distributed like a normal distribution with mean 25 and variance 20.
    2. distributed like a normal distribution with mean 20 and standard deviation 5.
    3. distributed like a normal distribution with mean 20 and standard deviation 25.
    4. distributed like a Bernoulli distribution with mean 20 and variance 5.  
  12. The probability density function of the normal distribution represents the likelihood of each possible value the normal random variable can take. We can use it to compute the probability that \(X\) takes a value within a given range: \[P(x_{1} \leq X \leq x_{2}) = \text{area under the curve between } x_{1} \text{ and } x_{2}\] If \(X\) follows the probability density function below, which of the following statements is true?

    1. \(P(-1 \leq X \leq 0) < P(1 \leq X \leq 2)\)
    2. \(P(-1 \leq X \leq 0) > P(1 \leq X \leq 2)\)
    3. \(P(-1 \leq X \leq 0) = P(1 \leq X \leq 2)\)
    4. none of the above  
  13. The standard normal distribution is a special case of the normal distribution. In mathematical notation, we refer to the standard normal random variable as \(Z\). Which of the following statements is true?
    1. \(Z\) is a normal random variable with mean 0 and variance 1
    2. \(Z\) is a normal random variable with mean 0 and standard deviation 1
    3. \(Z \sim N\)(0, 1)
    4. all of the above  
  14. In the standard normal distribution (a numerical illustration follows this question set):
    1. about 95% of the observations are between -1.96 and 1.96
    2. \(P(Z \leq -1.96) = P(Z \geq 1.96)\)
    3. both 1. and 2.
    4. neither 1. nor 2.  
  15. Given the properties of normal distributions, we can always transform a normally distributed random variable into a random variable that is distributed like the standard normal distribution: if \(X \sim N(\mu, \sigma^2)\), then \(\frac{X-\mu}{\sigma} \sim N(0,1)\). So, if \(X \sim N(10, 25)\), then (a quick simulation check follows this question set):
    1. \((X-10)/25 \sim N(0,1)\)
    2. \((X-10)/5 \sim N(0,1)\)
    3. \((X-25)/10 \sim N(0,1)\)
    4. \((X-5)/10 \sim N(0,1)\)  
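As a quick check on Questions 6-8, here is a minimal sketch in Python with numpy (the language and library are assumptions; the seminar materials do not prescribe a tool). It computes \(p\) as the proportion of 1s in a binary variable and uses \(P(X=0) = 1 - p\):

```python
import numpy as np

# The binary variable from Questions 6-7: X = {1, 1, 1, 0, 0}
x = np.array([1, 1, 1, 0, 0])

# For a binary (Bernoulli) variable, P(X = 1) = p is simply the proportion
# of 1s, which equals the mean of the variable.
p = x.mean()
print(p)        # 0.6

# A binary variable can only take the values 0 and 1, so P(X = 0) = 1 - p.
# With p = 0.7, as in Question 8, that gives 0.3.
print(1 - 0.7)  # 0.3
```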
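Questions 9-14 concern the normal and standard normal distributions. The following sketch, assuming Python with scipy (again not part of the seminar materials), illustrates the variance-versus-standard-deviation distinction and the area-under-the-curve probabilities:

```python
from scipy.stats import norm

# X ~ N(20, 25): scipy's norm() takes the mean and the STANDARD DEVIATION,
# so the variance 25 must first be converted to sd = sqrt(25) = 5 (Question 11).
x = norm(loc=20, scale=5)

# P(x1 <= X <= x2) is the area under the density between x1 and x2,
# computed as a difference of the cumulative distribution function (CDF).
print(x.cdf(25) - x.cdf(15))           # P(15 <= X <= 25), roughly 0.68

# The standard normal Z ~ N(0, 1) (Questions 13-14):
z = norm(loc=0, scale=1)
print(z.cdf(1.96) - z.cdf(-1.96))      # about 0.95 of the area lies within +/- 1.96
print(z.cdf(-1.96), 1 - z.cdf(1.96))   # the two tails are equal by symmetry
```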
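And for Question 15, a quick simulation check of the standardisation formula (a sketch in Python with numpy; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(9)

# Draw from X ~ N(10, 25); numpy's normal() also expects the standard
# deviation, so variance 25 means scale = 5.
x = rng.normal(loc=10, scale=5, size=100_000)

# Standardise: subtract the mean and divide by the standard deviation.
z = (x - 10) / 5

# The result should be approximately standard normal: mean near 0, sd near 1.
print(round(z.mean(), 3), round(z.std(), 3))
```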

  1. When we analyse data, we are usually interested in the value of a parameter at the population level, such as the proportion of candidate A supporters among all voters in a country. However, we typically only have access to statistics from a small sample of observations drawn from the target population, such as the proportion of supporters among the voters who responded to a survey. Which of the following statements is true?
    1. This is not a problem since the sample statistics always equal the population parameters
    2. This is a problem since the sample statistics typically differ from the population parameters because the sample contains noise
    3. This is not a problem when we use random sampling to draw the sample. Under these circumstances, the sample statistics equal the population parameters
    4. All of the above  
  2. Sampling variability refers to the fact that the value of a statistic varies from one sample to another because each sample contains a different set of observations drawn from the target population. Which of the following statements is true?
    1. Smaller sample size generally leads to greater sampling variability
    2. Sampling variability does not depend on the size of the sample
    3. Larger sample size generally leads to greater sampling variability
    4. None of the above  
  3. The Law of Large Numbers states that as the sample size increases, the sample mean of \(X\) approximates the population mean of \(X\). Now, suppose we drew three samples of data from the same target population: the first sample with ten observations (n=10), the second sample with a thousand observations (n=1,000), and the third sample with a million observations (n=1,000,000). Which sample is most likely to provide us with a sample mean closest to the population mean of \(X\)? (A short simulation illustrating this follows this question set.)
    1. The first sample
    2. The second sample
    3. The third sample
    4. It is impossible to know with the information given  
  4. The Central Limit Theorem states that as the sample size increases, the standardized sample mean of \(X\) can be approximated by the standard normal distribution. Now, suppose we drew 10,000 samples of 1,000 observations each from the same binary random variable (which by definition follows a Bernoulli distribution). Which of the following statements is true?
    1. The standardized sample means will approximately follow a Bernoulli distribution
    2. The standardized sample means will approximately follow a normal distribution with mean and variance unknown
    3. The standardized sample means will approximately follow a standard normal distribution
    4. None of the above  
  5. Thanks to the Central Limit Theorem, we know that if we were to draw multiple large samples from a random variable centered at ten at the population level, the distribution of the sample means over these multiple samples would be centered at:
    1. Ten
    2. Zero
    3. The number of observations in each sample
    4. It is impossible to know with the information given
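To illustrate Questions 2 and 3 (sampling variability and the Law of Large Numbers), here is a minimal simulation sketch in Python with numpy; the population proportion of 0.55 is a made-up value used purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population: a binary variable with true proportion p = 0.55
# (an invented number, used only for illustration).
p = 0.55

# Law of Large Numbers: as n grows, the sample mean gets closer to the
# population mean, and sampling variability shrinks.
for n in (10, 1_000, 1_000_000):
    sample = rng.binomial(1, p, size=n)
    print(n, sample.mean())
```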
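Likewise, Questions 4 and 5 can be checked by simulation. The sketch below (Python with numpy; the value p = 0.3 and the seed are assumptions) draws 10,000 samples of 1,000 Bernoulli observations each, then standardises the sample means:

```python
import numpy as np

rng = np.random.default_rng(7)

p, n, draws = 0.3, 1_000, 10_000   # Bernoulli probability, sample size, number of samples

# Draw 10,000 samples of 1,000 Bernoulli observations each and take the
# mean of every sample.
sample_means = rng.binomial(1, p, size=(draws, n)).mean(axis=1)

# The sample means are centered at the population mean p ...
print(round(sample_means.mean(), 3))            # close to 0.3

# ... and, once standardised, they approximate the standard normal distribution.
se = np.sqrt(p * (1 - p) / n)                   # standard error of a sample mean
z = (sample_means - p) / se
print(round(z.mean(), 2), round(z.std(), 2))    # close to 0 and 1
print(np.mean(np.abs(z) <= 1.96))               # close to 0.95
```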