POL269 - Political Data Research: Solutions Seminar 9

According to the frequentist interpretation of probability, the probability of an event is the proportion of its occurrence among infinitely many identical trials. For example, if we flip a coin 1 million times and 300,000 of those times we get heads, what’s the probability of heads?
1. it is impossible to know with the information given
2. 30%
3. 0.3%
4. 300%

The answer is B. 300,000/1,000,000 = 0.3; 0.3*100 = 30%.
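The arithmetic can be checked with a short Python sketch. The helper name `relative_frequency` is hypothetical and introduced only for illustration; it is not part of the course materials.

```python
def relative_frequency(occurrences: int, trials: int) -> float:
    """Proportion of trials in which the event occurred."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return occurrences / trials

# 300,000 heads out of 1,000,000 flips, as in the question above
p_heads = relative_frequency(300_000, 1_000_000)
print(f"P(heads) = {p_heads} = {p_heads:.0%}")
```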

According to the Bayesian interpretation, probabilities represent one’s subjective beliefs about the relative likelihood of events. For example, if we state that the probability of rain today is 0%, what are we describing?
1. the proportion of rain events over multiple days
2. we are certain that it will rain today
3. we are certain that it won’t rain today
4. none of the above

The answer is C. According to the Bayesian interpretation, a 0% probability of rain on a given day means that we are certain that it won’t rain on that day.

In this class, we distinguish between two types of numeric random variables based on the number of values the variables can take. These are:
1. treatment and control
2. predictors and outcomes
3. binary and non-binary
4. treatment and outcome

The answer is C. Binary variables can only take two values; non-binary variables can take more than two values. Treatment variables, outcome variables, and predictors are distinctions based on the role the variables take in the research question, not on the number of values the variables can take.

Each random variable has a probability distribution, which characterises the likelihood of each value the variable can take. The Bernoulli distribution is the probability distribution of:
1. a non-binary variable
2. a binary variable
3. a variable that can take more than two values
4. none of the above

The answer is B. The Bernoulli distribution is the probability distribution of binary variables, i.e., those numeric variables that take only two values, typically 0 and 1.

The Bernoulli distribution is characterised by:
1. one parameter
2. two parameters
3. three parameters
4. four parameters

The answer is A. The Bernoulli distribution is characterised by one parameter: p. Once we know p, we know everything there is to know about the distribution. The probability that X equals 1 is p and the probability that X equals 0 is 1-p. The mean of the distribution is p and the variance of the distribution is p(1-p).
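The mean and variance quoted above follow directly from the two probabilities. As a short derivation, assuming the standard definitions of expectation and variance (which this sheet does not spell out), and noting that \(X^2 = X\) when \(X\) only takes the values 0 and 1:

\[E(X) = 1 \cdot p + 0 \cdot (1-p) = p\]

\[Var(X) = E(X^2) - [E(X)]^2 = p - p^2 = p(1-p)\]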

If X={1,1,1,0,0}, \(X\) is:
1. a non-binary variable
2. a binary variable
3. an outcome variable
4. a predictor

The answer is B. X is a binary variable: it takes only the values 0 and 1.

If X={1,1,1,0,0}, what is \(P\)(X=1), also known as \(p\)?
1. \(P\)(X=1) = \(p\) = 0.6
2. \(P\)(X=1) = \(p\) = 3
3. \(P\)(X=1) = \(p\) = 0.4
4. none of the above

The answer is A. 3/5 = 0.6, which means that the probability that X equals 1 is 60% (0.6 * 100).
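As a quick check, here is a minimal Python sketch of the same calculation. It relies on the fact that, for a variable coded 0/1, the proportion of 1s equals the mean. The variable names are illustrative.

```python
from statistics import mean

# The five observations of the binary variable from the question
X = [1, 1, 1, 0, 0]

# For a 0/1 variable, the proportion of 1s equals the mean
p_hat = mean(X)

# The same number, computed as an explicit proportion
p_by_count = X.count(1) / len(X)

print(p_hat, p_by_count)
```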

If in a Bernoulli distribution \(p\)=0.7, what is \(P\)(X=0)?
1. \(P\)(X=0) = 0.2
2. \(P\)(X=0) = 0.7
3. \(P\)(X=0) = 1
4. \(P\)(X=0) = 0.3

The answer is D. Since all probabilities in a distribution must add up to 1, P(X=1)+P(X=0)=1. Therefore, if P(X=1)=p, then P(X=0)=1-p. In this case, we know that p is 0.7, so P(X=0) = 1-p = 1-0.7 = 0.3. The probability that X equals 0 is 30% (0.3 * 100).

The normal distribution is the probability distribution we commonly use as a good approximation for many non-binary variables. The normal distribution is characterized by:
1. one parameter
2. two parameters
3. three parameters
4. four parameters

The answer is B. The normal distribution is characterized by two parameters: \(\mu\) (pronounced mu) and \(\sigma^2\) (pronounced sigma-squared). The mean of the distribution is \(\mu\) and the variance of the distribution is \(\sigma^2\). See a visualisation of this in the graph below.

In mathematical notation, we write a normal random variable \(X\) as \(X\) \(\sim\) \(N\)(\(\mu\), \(\sigma^2\)), where:
1. \(\mu\) stands for the mean of the distribution and \(\sigma^2\) stands for the variance
2. \(\mu\) stands for the mean of the distribution and \(\sigma^2\) for the standard deviation
3. \(\mu\) stands for the variance of the distribution and \(\sigma^2\) stands for the mean
4. none of the above

The answer is A. This is discussed in the answer to the previous question.

If \(X \sim N\)(20, 25), then \(X\) is:
1. distributed like a normal distribution with mean 25 and variance 20.
2. distributed like a normal distribution with mean 20 and standard deviation 5.
3. distributed like a normal distribution with mean 20 and standard deviation 25.
4. distributed like a Bernoulli distribution with mean 20 and variance 5.

The answer is B. \(X\) \(\sim\) \(N\)(\(\mu\), \(\sigma^2\)) means that X follows a normal distribution with mean \(\mu\) and variance \(\sigma^2\). In addition, recall that the standard deviation is the square root of the variance. Thus, X has a mean of 20, a variance of 25, and a standard deviation of 5.
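The distinction between variance and standard deviation matters in software too. The sketch below uses `NormalDist` from Python's standard `statistics` module, which takes the standard deviation (`sigma`) rather than the variance as its second argument; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 20          # mean: the first argument of N(20, 25)
variance = 25    # variance: the second argument of N(20, 25)
sigma = sqrt(variance)  # standard deviation = square root of the variance

# NormalDist expects the standard deviation, not the variance
X = NormalDist(mu=mu, sigma=sigma)
print(X.mean, X.variance, X.stdev)
```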

The probability density function of the normal distribution represents the likelihood of each possible value the normal random variable can take. We can use it to compute the probability that X takes a value within a given range: \[\textrm{P}(\textrm{x}_{1} \leq X \leq \textrm{x}_{2}) = \textrm{area under the curve between } \textrm{x}_{1} \textrm{ and } \textrm{x}_{2}\] If \(X\) follows the probability density function below, which of the following statements is true?
1. \(\textrm{P}(-1 \leq X \leq 0) < \textrm{P}(1 \leq X \leq 2)\)
2. \(\textrm{P}(-1 \leq X \leq 0) > \textrm{P}(1 \leq X \leq 2)\)
3. \(\textrm{P}(-1 \leq X \leq 0) = \textrm{P}(1 \leq X \leq 2)\)
4. none of the above

The answer is B. The area under the curve between -1 and 0 is larger than the area under the curve between 1 and 2.
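The two areas can also be computed numerically. The sketch below assumes the plotted density is the standard normal, N(0, 1); this is an assumption made for illustration, since the exact curve is not stated in the text. The helper `prob_between` is hypothetical.

```python
from statistics import NormalDist

# Assumption: the plotted density is the standard normal, N(0, 1)
Z = NormalDist(mu=0, sigma=1)

def prob_between(dist: NormalDist, x1: float, x2: float) -> float:
    """P(x1 <= X <= x2): area under the density between x1 and x2."""
    return dist.cdf(x2) - dist.cdf(x1)

p_left = prob_between(Z, -1, 0)   # area between -1 and 0
p_right = prob_between(Z, 1, 2)   # area between 1 and 2
print(round(p_left, 4), round(p_right, 4))
```

Under this assumption the first area is roughly 0.34 and the second roughly 0.14, because the interval from -1 to 0 sits next to the peak of the curve.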

The standard normal distribution is a special case of the normal distribution. In mathematical notation, we refer to the standard normal random variable as \(Z\). Which of the following statements is true?
1. \(Z\) is a normal random variable with mean 0 and variance 1
2. \(Z\) is a normal random variable with mean 0 and standard deviation 1
3. \(Z \sim N\)(0, 1)
4. all of the above

The answer is D. Note that the square root of 1 is 1, so the variance and standard deviation are the same in this case.

In the standard normal distribution:
1. about 95% of the observations are between -1.96 and 1.96
2. \(P\)(Z \(\leq\) -1.96) = \(P\)(Z \(\geq\) 1.96)
3. both A. and B.
4. neither A. nor B.

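The answer is C. About 95% of the area under the standard normal curve lies between -1.96 and 1.96. In addition, the normal distribution, and therefore the standard normal, is symmetric around its mean, so the area in the lower tail below -1.96 equals the area in the upper tail above 1.96.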
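Both statements can be verified with the cumulative distribution function of the standard normal. This is a minimal sketch using `NormalDist` from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# First statement: share of the distribution between -1.96 and 1.96
middle = Z.cdf(1.96) - Z.cdf(-1.96)

# Second statement: the two tails beyond -1.96 and 1.96 have equal area
lower_tail = Z.cdf(-1.96)       # P(Z <= -1.96)
upper_tail = 1 - Z.cdf(1.96)    # P(Z >= 1.96)

print(round(middle, 3), round(lower_tail, 3), round(upper_tail, 3))
```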

Given the properties of normal distributions, we can always transform a normally distributed random variable into a random variable that is distributed like the standard normal distribution:

if X \(\sim\) N(\(\mu\), \(\sigma^2\)), then \(\frac{X-\mu}{\sigma}\) \(\sim\) N(0,1)

So, if X \(\sim\) N(10, 25), then:
a. \((X-10)/25 \sim N(0,1)\)
b. \((X-10)/5 \sim N(0,1)\)
c. \((X-25)/10 \sim N(0,1)\)
d. \((X-5)/10 \sim N(0,1)\)

The answer is B. What we use in the denominator is the standard deviation \(\sigma\), not the variance \(\sigma^2\). In this case, \(\sigma^2\)=25, therefore \(\sigma\)=\(\sqrt{25}\)=5.
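The transformation can be sketched in Python as follows. The function name `standardise` and the value 15 are hypothetical choices for illustration. If the transformation is correct, the probability that X is at most 15 should match the probability that Z is at most (15-10)/5 = 1.

```python
from math import sqrt
from statistics import NormalDist

mu, variance = 10, 25
sigma = sqrt(variance)  # 5: divide by the standard deviation, not the variance

def standardise(x: float) -> float:
    """Convert a value of X ~ N(10, 25) into its z-score."""
    return (x - mu) / sigma

X = NormalDist(mu=mu, sigma=sigma)
Z = NormalDist()  # standard normal

x = 15  # hypothetical value chosen for illustration
# P(X <= 15) should match P(Z <= standardise(15)) = P(Z <= 1)
print(standardise(x), X.cdf(x), Z.cdf(standardise(x)))
```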

When we analyse data, we are usually interested in the value of a parameter at the population level, such as the proportion of candidate A supporters among all voters in a country. However, we typically only have access to statistics from a small sample of observations drawn from the target population, such as the proportion of supporters among the voters who responded to a survey. Which of the following statements is true?
1. This is not a problem since the sample statistics always equal the population parameters
2. This is a problem since the sample statistics typically differ from the population parameters because the sample contains noise
3. This is not a problem when we use random sampling to draw the sample. Under these circumstances, the sample statistics equal the population parameters
4. All of the above

The answer is B. Note: we can try to minimise this issue by ensuring we select a large sample that is representative of the population on all key features.

Sampling variability refers to the fact that the value of a statistic varies from one sample to another because each sample contains a different set of observations drawn from the target population. Which of the following statements is true?
1. Smaller sample size generally leads to greater sampling variability
2. Sampling variability does not depend on the size of the sample
3. Larger sample size generally leads to greater sampling variability
4. None of the above

The answer is A. Larger samples generally lead to less sampling variability and smaller samples generally lead to more sampling variability.
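A small simulation can make this concrete. The sketch below repeatedly draws samples from a binary variable and measures how much the sample proportion varies across samples. The population proportion (0.5), the number of repetitions (500), the seed, and the function names are all assumptions chosen for illustration.

```python
import random
from statistics import stdev

random.seed(269)  # arbitrary seed so the run is reproducible

def sample_proportion(n: int, p: float) -> float:
    """Proportion of 1s in one random sample of n Bernoulli(p) draws."""
    return sum(random.random() < p for _ in range(n)) / n

def spread_of_sample_proportions(n: int, p: float = 0.5, reps: int = 500) -> float:
    """Standard deviation of the sample proportion across repeated samples."""
    return stdev(sample_proportion(n, p) for _ in range(reps))

spread_small = spread_of_sample_proportions(n=10)
spread_large = spread_of_sample_proportions(n=1_000)
print(spread_small, spread_large)
```

The spread for n=10 should come out noticeably larger than the spread for n=1,000, which is the pattern described in answer A.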

The Law of Large Numbers states that as the sample size increases, the sample mean of \(X\) approximates the population mean of \(X\). Now, suppose we drew three samples of data from the same target population: the first sample with ten observations (n=10), the second sample with a thousand observations (n=1,000), and the third sample with a million observations (n=1,000,000). Which sample is most likely to provide us with a sample mean closest to the population mean of \(X\)?
1. The first sample
2. The second sample
3. The third sample
4. It is impossible to know with the information given

The answer is C. The Law of Large Numbers states that as the sample size increases, the sample mean of \(X\) more closely approximates the population mean of \(X\), so we are more likely to get a sample mean that is close to the ‘true’ population mean when we base our estimates on larger samples.
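The Law of Large Numbers can be illustrated with a simulation of the three sample sizes in the question. In this sketch the population is assumed to be a binary variable with mean 0.3; that value, the seed, and the function name are hypothetical.

```python
import random

random.seed(269)  # arbitrary seed for reproducibility

P_TRUE = 0.3  # hypothetical population mean of a binary variable

def sample_mean(n: int) -> float:
    """Mean of n independent Bernoulli(P_TRUE) draws."""
    return sum(random.random() < P_TRUE for _ in range(n)) / n

for n in (10, 1_000, 1_000_000):
    m = sample_mean(n)
    print(f"n={n:>9,}: sample mean = {m:.4f}, error = {abs(m - P_TRUE):.4f}")
```

Any single small sample can land close to the population mean by chance, so the claim is about what is most likely: the error for the largest sample is expected to be the smallest.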

The Central Limit Theorem states that as the sample size increases, the standardized sample mean of \(X\) can be approximated by the standard normal distribution. Now, suppose we drew 10,000 samples of 1,000 observations each from the same binary random variable (which by definition follows a Bernoulli distribution). Which of the following statements is true?
1. The standardized sample means will approximately follow a Bernoulli distribution
2. The standardized sample means will approximately follow a normal distribution with mean and variance unknown
3. The standardized sample means will approximately follow a standard normal distribution
4. None of the above

The answer is C. The Central Limit Theorem states that if you have a population with mean \(\mu\) and variance \(\sigma^2\) and take sufficiently large random samples from the population (with replacement), then the distribution of the standardised sample means will approximately follow a standard normal distribution. That is, they will have a mean of 0 and a variance of 1.
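The following simulation is a rough sketch of this result for a Bernoulli variable. To keep it fast it uses 2,000 samples of 500 observations rather than the 10,000 samples of 1,000 in the question. The parameter p = 0.3, the seed, and all names are assumptions for illustration. Each sample mean is standardised by subtracting p and dividing by the standard error, the square root of p(1-p)/n.

```python
import random
from math import sqrt
from statistics import mean, pvariance

random.seed(269)  # arbitrary seed for reproducibility

P = 0.3       # hypothetical Bernoulli parameter
N = 500       # observations per sample (fewer than the 1,000 in the question)
REPS = 2_000  # number of samples (fewer than the 10,000 in the question)

def standardised_sample_mean(n: int, p: float) -> float:
    """Draw n Bernoulli(p) values and standardise their mean."""
    x_bar = sum(random.random() < p for _ in range(n)) / n
    standard_error = sqrt(p * (1 - p) / n)
    return (x_bar - p) / standard_error

z_values = [standardised_sample_mean(N, P) for _ in range(REPS)]

# Share of standardised means between -1.96 and 1.96
share_within = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)

print(mean(z_values), pvariance(z_values), share_within)
```

If the approximation holds, the mean should be close to 0, the variance close to 1, and roughly 95% of the standardised means should fall between -1.96 and 1.96.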

Thanks to the Central Limit Theorem, we know that if we were to draw multiple large samples from a random variable centered at ten at the population level, the distribution of the sample means over these multiple samples will be centered at:
1. Ten
2. Zero
3. The number of observations in each sample
4. It is impossible to know with the information given

The answer is A. The Central Limit Theorem states that if we were to draw multiple large samples from a random variable centered at a particular value at the population level, the distribution of the sample means over these multiple samples will be centered at this same value.
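The centering of the sample means can also be seen from the linearity of expectation, a standard result that this sheet does not state. As a sketch, for a sample of \(n\) observations \(X_1, \dots, X_n\), each with population mean \(\mu\):

\[E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n\mu = \mu\]

With \(\mu\) = 10, the distribution of the sample means is centered at 10 whatever the sample size. The Central Limit Theorem adds that its shape is approximately normal when the samples are large.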