3 Confidence intervals
3.1 Recap on bias assessment
- Let’s get some more practice computing bias, both analytically and via simulation.
- Consider the NSDUH data where the probability of the event is small.
- e.g., few people in the population have used PCP
Code
##
## no yes
## 0.98537708 0.01462292
## [1] no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no no
## [32] no no no no no no no no no no no no no no no no no no no
## Levels: no yes
In cases like this, sometimes people recommend using a modified estimator of \(p\), \[ \tilde p = \frac{\sum_{i=1}^n X_i + 1/2}{n + 1}, \] which adds 1/2 observation to the “yes” category and 1/2 observation to the “no” category.
What is the expected value of \(\tilde p\)? (See the worked derivation below.)
Plot the bias of \(\tilde p\).
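For reference, a worked derivation: by linearity of expectation, \[ \mathbb{E}\tilde p = \frac{\mathbb{E}\sum_{i=1}^n X_i + 1/2}{n + 1} = \frac{np + 1/2}{n+1}, \qquad \text{Bias}(\tilde p) = \frac{np + 1/2}{n+1} - p = \frac{1/2 - p}{n + 1}, \] so the bias is positive for \(p < 1/2\) and shrinks at rate \(1/(n+1)\).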
Code
We can also run simulations to assess the bias; the simulated points should agree with the analytic curves in the plots below.
Code
nsim = 10000
n = 1:200
ns = c(10, 25, 50, 100, 200)
ps = c(0.01, 0.1, 0.25, 0.45)
for(p in ps){
  # simulate ptilde nsim times for each sample size in ns
  samps = replicate(nsim, (rbinom(length(ns), size = ns, p) + 1/2)/(ns + 1))
  # mean across simulations minus the truth gives the simulated bias
  simBias = rowMeans(samps) - p
  # analytic bias of ptilde as a function of n
  ptildeBias = (n * p + 0.5)/(n + 1) - p
  # divide by p to plot relative bias
  plot(n, ptildeBias/p, xlab = 'n', ylab = 'Relative bias of ptilde', type = 'l', main = paste0('p=', p))
  points(ns, simBias/p, type = 'b')
  abline(h = 0)
}
3.2 Confidence intervals for a mean
Today we will:
- Introduce a confidence interval for normally distributed data
- We will use spinal cord imaging to study these confidence intervals
- We will run a simulation to understand what the confidence interval means about dataset-to-dataset variability
3.2.1 Multiple sclerosis in the spinal cord
- Seth Smith’s research team is developing advanced spinal cord magnetic resonance imaging sequences to study multiple sclerosis (MS).
- These new sequences are making it possible to study the spinal cord in more detail through development and aging
- Today our research question is: What is the cross-sectional area of the C3 segment of the spinal cord?
3.2.1.1 Spinal cord anatomy
- The cord section we are looking at is cervical section C3 (near the top).
- Damage here affects a large portion of the body.
3.2.1.2 Seth’s interest in the spinal cord
- MS is a progressive disease associated with demyelination of neurons (which reduces signaling efficiency); it significantly impairs many aspects of daily life, with motor symptoms being the most common.
- MRI research in MS has focused almost exclusively on the brain, associating MRI-derived brain lesion measures with the severity of MS symptoms.
- Seth pointed out to the MS community that there are lesions in the spinal cord that scientists haven’t paid as much attention to.
3.2.1.3 Characterization of the C3 cross-sectional area
- What random variable models might be appropriate for these data?
Code
load('../datasets/smith_cervical_sc/tpmf.rdata')
cex = 1.5
par(mgp = c(1.7, .7, 0), lwd = 1.5, lend = 2, cex.lab = 0.8*cex, cex.axis = 0.8*cex,
    cex.main = 1*cex, mfrow = c(1, 1), mar = c(2.8, 2.8, 1.8, .2), bty = 'l', oma = c(0, 0, 2, 0))
# histogram of C3 CSA with a fitted normal density overlaid
hist(mf$`C3 CSA`, main = 'C3 CSA', xlab = 'C3 CSA', freq = FALSE)
xs = seq(min(mf$`C3 CSA`), max(mf$`C3 CSA`), length.out = 200)
lines(xs, dnorm(xs, mean = mean(mf$`C3 CSA`), sd = sd(mf$`C3 CSA`)))
- What is a good model for these data?
3.2.1.4 CIs conceptually
The concept of a CI highlights the distinction between a parameter, estimate, and estimator.
- What is the parameter?
- What is an estimate of the parameter?
- What is an estimator of the parameter, notationally?
The concept of an estimator allows us to make probabilistic statements about the procedures we apply to our dataset.
3.2.2 Confidence interval definition
A confidence interval is an interval, computed from a sample, that contains the true value of the parameter with a given probability: \[ P\{L(X_1, \ldots, X_n) \le \theta < U(X_1, \ldots, X_n) \} = 1-\alpha, \] for a given value \(\alpha \in (0,1)\), where \(\theta\) denotes the parameter (e.g., the proportion \(p\) or the mean \(\mu\)).
Note what things are random here (the end points). The parameter is fixed.
3.2.3 Known variance confidence interval for the mean
- Let’s assume our spinal cord data \(X_1, \ldots, X_n \sim N(\mu, \sigma^2)\) are IID.
- The empirical mean is an estimator of \(\mu\): \[ \hat\mu = n^{-1}\sum_{i=1}^n X_i. \]
- Let’s (unrealistically) assume that \(\sigma^2\) is known.
3.2.3.1 The interval derivation
- Because sums of independent normals are normal, the mean of IID normals is normal: \(\hat\mu \sim N(\mu, \sigma^2/n)\).
- This is exact (not an approximation).
- What is the distribution of \(\sqrt{n}(\hat\mu - \mu)/\sigma\)?
- We can construct a \(1-\alpha\) probability confidence interval for \(\mu\) using this fact. \[ \begin{align*} \mathbb{P}(z_{\alpha/2} < Z < z_{1-\alpha/2} ) & = \Phi(z_{1-\alpha/2}) - \Phi(z_{\alpha/2} ) \\ & = 1-\alpha/2 - \alpha/2\\ &= 1-\alpha \end{align*} \]
So we can create the interval by substituting \(Z = \sqrt{n}(\hat\mu - \mu)/\sigma\) and solving for \(\mu\): \[ \begin{align*} 1-\alpha & = \mathbb{P}(z_{\alpha/2} < \sqrt{n}(\hat\mu - \mu)/\sigma < z_{1-\alpha/2} ) \\ & = \mathbb{P}(\hat\mu - z_{1-\alpha/2}\times \sigma/\sqrt{n} < \mu < \hat\mu - z_{\alpha/2}\times \sigma/\sqrt{n} ). \end{align*} \] Since \(z_{\alpha/2} = -z_{1-\alpha/2}\), this is the interval \(\hat\mu \pm z_{1-\alpha/2}\times \sigma/\sqrt{n}\).
3.3 Computing confidence intervals
- Recap of constructing normal confidence interval
- Computing the confidence interval in the spinal cord data
3.3.1 Constructing the confidence interval in the spinal cord data set
Code
## [1] 2.575829
Code
alpha = 0.01
n = length(mf$`C3 CSA`)
muHat = mean(mf$`C3 CSA`)
stdev = sd(mf$`C3 CSA`)
zalpha = qnorm(1 - alpha/2)
# construct the interval (plugging in the sample sd)
interval = c(muHat - zalpha * stdev/sqrt(n), muHat + zalpha * stdev/sqrt(n))
hist(mf$`C3 CSA`, main = 'C3 CSA', xlab = 'C3 CSA', freq = FALSE)
xs = seq(min(mf$`C3 CSA`), max(mf$`C3 CSA`), length.out = 200)
lines(xs, dnorm(xs, mean = mean(mf$`C3 CSA`), sd = sd(mf$`C3 CSA`)))
abline(v=interval, lty=2, lwd=2)
- The mean in my data set, \(\bar X_n\), lies within this interval with probability \(1-\alpha\).
- This interval contains the true mean parameter with probability \(1-\alpha\).
- The procedure we used to create this interval captures the mean parameter \((1-\alpha)\times 100\%\) of the time.
- Across a large number of samples, approximately \((1-\alpha)\times 100\%\) of the intervals will contain the true value of the parameter.
3.3.2 Dataset-to-dataset variability
Code
mu = 1
sigma = 0.8
n = 500
nsim = 100
alpha = 0.05
CIs = data.frame(mean = rep(NA, nsim), lower = rep(NA, nsim), upper = rep(NA, nsim))
for(sim in 1:nsim){
  # draw a random sample
  X = rnorm(n, mean = mu, sd = sigma)
  stdev = sd(X)
  # construct the confidence interval
  CIs[sim, ] = c(mean(X), mean(X) + c(-1, 1)*qnorm(1 - alpha/2)*stdev/sqrt(n))
}
# flag intervals that capture the true mean (1) or miss it (2)
CIs$gotcha = ifelse(CIs$lower <= mu & CIs$upper >= mu, 1, 2)
# range(c(CIs$lower, CIs$upper))
plot(1:nsim, CIs$mean, pch = 19, ylim = c(0, 2), main = paste(nsim, 'experiments'), col = CIs$gotcha)
segments(x0 = 1:nsim, y0 = CIs$lower, y1 = CIs$upper, col = CIs$gotcha)
abline(h = mu, lty = 2)
3.3.3 Estimating the variance
- Almost always, the variance of the data is unknown.
- We need an estimate of \(\sigma^2\) from the data. Two are common \[ \begin{align*} \tilde \sigma^2 & = n^{-1} \sum_{i=1}^n (X_i - \bar X)^2 \text{ (maximum likelihood estimator)}\\ \hat \sigma^2 & = (n-1)^{-1} \sum_{i=1}^n (X_i - \bar X)^2 \text{ (unbiased estimator)}\\ \end{align*} \]
- We’ll focus on the second estimator.
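As a quick sketch (the sample here is arbitrary, just for illustration), both estimators are one-liners in R; note that the built-in var() implements the unbiased version:
Code
x = rnorm(25)                                # any sample
n = length(x)
sigmaTilde2 = mean((x - mean(x))^2)          # MLE: divides by n
sigmaHat2 = sum((x - mean(x))^2) / (n - 1)   # unbiased: divides by n-1
c(sigmaTilde2, sigmaHat2, var(x))            # the last two agree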
3.3.4 Unknown variance approximate interval
What is the distribution of \(\sqrt{n}(\hat\mu - \mu)/\hat\sigma\)?
If we don’t know the distribution exactly, there is theoretical justification (using Slutsky’s theorem) to say \[ \sqrt{n}(\hat\mu - \mu)/\hat\sigma \sim N(0,1) \text{ (approximately)} \]
Here, approximately means asymptotically: we can make the approximation as precise as we want by choosing a big enough sample size.
This suggests a confidence interval \[ \hat \mu \pm z_{1-\alpha/2} \times \hat\sigma/\sqrt{n}. \]
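A quick sanity check (a sketch with arbitrary settings) that the plug-in estimate \(\hat\sigma\) settles down to \(\sigma\) as \(n\) grows, which is what the Slutsky argument relies on:
Code
# sd estimates approach the true sigma = 2 as n increases
sapply(c(10, 100, 1000, 10000), function(n) sd(rnorm(n, sd = 2)))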
3.4 T confidence interval
Today we will:
- Find an exact interval for the mean when the data are normally distributed
- Compare the intervals we’ve derived, so far
- Discuss our assumptions and learn how to assess the assumptions
3.4.1 Unknown variance T-interval
We need some preliminaries:
If \(Z_i \sim N(0, 1)\) are iid, then (by definition) \(X = \sum_{i=1}^{n-1}Z_i^2\) is chi-squared distributed on \(n-1\) degrees of freedom, usually denoted \(X\sim \chi^2(n-1)\).
If \(Z \sim N(0, 1)\) and \(X\sim \chi^2(n-1)\) are independent, then \[ T = \frac{Z}{\sqrt{X/(n-1)}} \sim t(n-1), \] where \(t(n-1)\) denotes the t-distribution on \(n-1\) degrees of freedom.
3.4.1.1 How does this help us and what does it have to do with beer?
We already know that, \[ \sqrt{n}(\bar X - \mu)/\sigma \sim N(0,1). \] It turns out that \[ (n-1)\hat \sigma^2/\sigma^2 \sim \chi^2(n-1), \] and (surprisingly!) \(\hat \sigma^2\) and \(\bar X\) are independent! (Which implies that the two random variables above are also independent)
Then the test statistic we’ve been working with is \[ \sqrt{n}(\hat\mu - \mu)/\hat\sigma = \sqrt{n}(\hat\mu - \mu)/\sigma \times \sqrt{\frac{\sigma^2}{\hat\sigma^2} } =_D Z \times \sqrt{\frac{n-1}{X}} =_D T, \] where \(Z \sim N(0, 1)\) and \(X\sim \chi^2(n-1)\) are independent, so \(T\sim t(n-1)\).
This is the one-sample t-test (or confidence interval).
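A simulation sketch (the settings are arbitrary) checking that the studentized mean follows \(t(n-1)\) rather than \(N(0,1)\) in small samples:
Code
n = 8; mu = 1; sigma = 2
tstats = replicate(10000, {
  x = rnorm(n, mean = mu, sd = sigma)
  sqrt(n) * (mean(x) - mu) / sd(x)
})
hist(tstats, freq = FALSE, breaks = 50, xlim = c(-6, 6), main = 'Studentized mean')
curve(dt(x, df = n - 1), add = TRUE, col = 'red')   # t(n-1) matches
curve(dnorm(x), add = TRUE, col = 'blue', lty = 2)  # N(0,1) is too light in the tails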
3.4.1.2 But what does it have to do with beer!?!?
William Sealy Gosset published the t-test while working for Guinness, which required him to publish under a pseudonym (“Student”). Wikipedia has more.
3.4.1.3 Possible intervals (so far)
Here are our intervals so far:
\[ \begin{align*} \hat \mu \pm z_{1-\alpha/2} \times \sigma/\sqrt{n} \\ \hat \mu \pm z_{1-\alpha/2} \times \hat\sigma/\sqrt{n} \\ \hat \mu \pm t_{n-1,1-\alpha/2} \times \hat\sigma/\sqrt{n}. \end{align*} \]
Code
## [1] 1.959964
## [1] 1.986675
Code
alpha = 0.05
n = length(mf$`C3 CSA`)
muHat = mean(mf$`C3 CSA`)
stdev = sd(mf$`C3 CSA`)
zalpha = qnorm(1 - alpha/2)
talpha = qt(1 - alpha/2, df = n - 1)
# construct the intervals
Zinterval = muHat + c(-1, 1) * zalpha * stdev/sqrt(n)
Tinterval = muHat + c(-1, 1) * talpha * stdev/sqrt(n)
hist(mf$`C3 CSA`, main = 'C3 CSA', xlab = 'C3 CSA', freq = FALSE, xlim = c(65, 85))
xs = seq(min(mf$`C3 CSA`), max(mf$`C3 CSA`), length.out = 200)
lines(xs, dnorm(xs, mean = mean(mf$`C3 CSA`), sd = sd(mf$`C3 CSA`)))
abline(v = Zinterval, lty = 2)
abline(v = Tinterval, lty = 2, col = 'red')
3.5 Evaluating confidence intervals
- Review assumptions of our three intervals so far.
- Weaken the assumptions a little bit more.
- Assess confidence intervals using simulations.
3.5.1 Checking assumptions
What assumptions have we made, and for what reasons?
- (One is kind of silly)
We can relax two of these at the expense of having approximate intervals.
3.5.2 Checking assumptions: Q-Q plot
- A Q-Q plot compares the sample quantiles of the observed data to the quantiles of a theoretical distribution (here, a standard normal)
- Rank the data and normalize: \(y_i = (x_{(i)} - \bar x)/\tilde \sigma\)
- Plot \(x_i = \Phi^{-1}(i/n)\) (qnorm(i/n) in R) against \(y_i\); in practice the plotting positions \((i-1/2)/n\) are used so the largest observation isn’t sent to \(\infty\)
- If the data are normal, the points will fall on the identity line
- Check out more details on Wikipedia
- These data are surprisingly close to a normal distribution!
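A minimal hand-rolled version of this plot (a sketch, assuming mf is loaded as above; the built-in qqnorm/qqline do the same thing):
Code
x = mf$`C3 CSA`
n = length(x)
y = (sort(x) - mean(x)) / sd(x)       # standardized order statistics
theo = qnorm(((1:n) - 0.5) / n)       # theoretical normal quantiles
plot(theo, y, xlab = 'Theoretical quantiles', ylab = 'Sample quantiles')
abline(a = 0, b = 1, lty = 2)         # identity line
# built-in equivalent: qqnorm(x); qqline(x)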
3.5.3 Review of what we’ve done so far
- We identified a parameter of interest (population mean of C3 CSA).
- We made some assumptions about the data.
- We identified three Frequentist confidence intervals.
3.5.4 Aside: What is the interpretation of these intervals?
\[ \mathbb{P}(\hat \mu - z_{1-\alpha/2} \times \hat\sigma/\sqrt{n} < \mu < \hat \mu + z_{1-\alpha/2} \times \hat\sigma/\sqrt{n}) \approx 1-\alpha \] …this interpretation is all of Frequentist statistics.
3.5.4.1 Relaxing assumptions (normality and identical distributions)
- There’s one theorem we mentioned in class last week (a version of the central limit theorem) that can relax both of these assumptions
This suggests that we can basically reduce our assumptions to independence and finite variance (and the extra condition in the theorem).
Then we can use this interval a lot of the time: \[ \mathbb{P}(\hat \mu - z_{1-\alpha/2} \times \hat\sigma/\sqrt{n} < \mu < \hat \mu + z_{1-\alpha/2} \times \hat\sigma/\sqrt{n}) \approx 1-\alpha \]
This depends on two approximations:
- Large-sample (asymptotic) normality of the mean
- “Plugging in” \(\hat\sigma\) for the unknown parameter \(\sigma\).
In practice, it is generally safer to use the t-distribution rather than the normal distribution for confidence intervals (more on this later).
3.6 Evaluating confidence intervals under assumption violations
Today we will:
- Estimate a confidence interval in the C3 cross-sectional area data
- Run simulations to assess performance of confidence intervals
- You can use this code to run your simulations for homework 3
- Check how sensitive our confidence intervals are to assumptions
- Define some confidence intervals for the binomial proportion
3.6.1 Confidence intervals for the mean in the spinal cord data set
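The output below comes from R’s built-in one-sample t-test; a sketch of calls that would reproduce it (assuming mf is loaded as above):
Code
x = mf$`C3 CSA`
t.test(x)             # tests mu = 0 by default; we mainly want the CI
mean(x)               # sample mean
length(x) - 1         # degrees of freedom
# 95% t-interval by hand
mean(x) + c(-1, 1) * qt(0.975, df = length(x) - 1) * sd(x)/sqrt(length(x))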
##
## One Sample t-test
##
## data: mf$`C3 CSA`
## t = 81.738, df = 90, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 74.56922 78.28441
## sample estimates:
## mean of x
## 76.42682
## [1] 76.42682
## [1] 90
## [1] 74.56922 78.28441
3.6.2 Two types of confidence intervals
\[ \begin{align*} \hat \mu \pm z_{1-\alpha/2} \times \sigma/\sqrt{n} \text{ (not usable: we don't know } \sigma \text{)} \\ \hat \mu \pm z_{1-\alpha/2} \times \hat\sigma/\sqrt{n} \\ \hat \mu \pm t_{n-1,1-\alpha/2} \times \hat\sigma/\sqrt{n}. \end{align*} \]
This interval \(\hat \mu \pm t_{n-1,1-\alpha/2} \times \hat\sigma/\sqrt{n}\) is exact. What are the assumptions?
This interval \(\hat \mu \pm z_{1-\alpha/2} \times \hat\sigma/\sqrt{n}\) is approximate. Assuming you are using the central limit theorem, what are the assumptions?
3.6.3 Simulation setup
- We run simulations to evaluate the confidence intervals in finite samples.
- What are the metrics we care about, here?
- Coverage – an \(\alpha\) level Frequentist interval should contain the true parameter at least \((1-\alpha)\times 100\%\) of the time.
- Width – a smaller interval (that has the right coverage) is better.
- How do we set up the simulations?
- We have a few variables we can set: \(n\), \(\mu\), \(\sigma^2\), \(\alpha\), number of simulations.
- Need a way to sample the data \(X_i\) given \(n\), \(\mu\), and \(\sigma^2\).
- Need a function to construct the CIs given the sample.
- Need a function to assess the coverage and width of the confidence intervals.
- Need a way to assess violations of normality.
- Need to do simulations for some different values of some of the parameters.
- Need to create some nice looking plots for the results.
Code
mu = 1
sigma = 1
ns = (1:10) * 10
nsim = 5000
alpha = 0.05
# to simulate the null in the ECT data
# ect = read.csv('../datasets/ect/ect.csv')
# ECT = ect$cgi.i
simResults = data.frame(n = ns)
simResults[ , c('Zexact_coverage', 'Zapprox_coverage', 'T_coverage',
                'Zexact_width', 'Zapprox_width', 'T_width')] = NA
for(n in ns){
  CIs = data.frame('mean' = rep(NA, nsim))
  CIs[, paste(rep(c('Zexact', 'Zapprox', 'T'), each = 2), rep(c('lower', 'upper'), 3), sep = "_")] = NA
  for(sim in 1:nsim){
    # draw a skewed sample: a gamma variable standardized to mean mu and sd 1
    # for normal errors use instead: X = rnorm(n, mean = mu, sd = sigma)
    X = mu + (rgamma(n, shape = 1.2, rate = 5) - 1.2/5)/sqrt(1.2)*5
    stdev = sd(X)
    CIs$mean[sim] = mean(X)
    # Z interval using the true sigma ("exact")
    CIs[sim, c('Zexact_lower', 'Zexact_upper')] = mean(X) + c(-1, 1)*qnorm(1 - alpha/2)*sigma/sqrt(n)
    # Z interval plugging in the sample sd (approximate)
    CIs[sim, c('Zapprox_lower', 'Zapprox_upper')] = mean(X) + c(-1, 1)*qnorm(1 - alpha/2)*stdev/sqrt(n)
    # T interval
    CIs[sim, c('T_lower', 'T_upper')] = mean(X) + c(-1, 1)*qt(1 - alpha/2, n - 1)*stdev/sqrt(n)
  }
  # flag whether each interval captured the true mean
  CIs$Zexact_gotcha = ifelse(CIs$Zexact_lower <= mu & CIs$Zexact_upper >= mu, 1, 0)
  CIs$Zapprox_gotcha = ifelse(CIs$Zapprox_lower <= mu & CIs$Zapprox_upper >= mu, 1, 0)
  CIs$T_gotcha = ifelse(CIs$T_lower <= mu & CIs$T_upper >= mu, 1, 0)
  # compute the width of each confidence interval
  CIs[paste(c('Zexact', 'Zapprox', 'T'), 'width', sep = "_")] =
    CIs[, grep('_upper', names(CIs))] - CIs[, grep('_lower', names(CIs))]
  simResults[which(simResults$n == n), grep('coverage', names(simResults), value = TRUE)] =
    colMeans(CIs[ , grep('_gotcha', names(CIs), value = TRUE)])
  simResults[which(simResults$n == n), grep('width', names(simResults), value = TRUE)] =
    colMeans(CIs[ , grep('_width', names(CIs), value = TRUE)])
}
plot(simResults$n, simResults$Zexact_coverage, ylim = c(0.5, 1), type = 'b',
     ylab = 'Coverage (proportion)', xlab = 'n', main = 'CI coverage')
points(simResults$n, simResults$Zapprox_coverage, type = 'b', col = 'red')
points(simResults$n, simResults$T_coverage, type = 'b', col = 'blue')
abline(h = 1 - alpha, lty = 2)
legend('bottomright', legend = c('Zexact', 'Zapprox', 'T'), col = c('black', 'red', 'blue'), bty = 'n', lty = 1)
Code
plot(simResults$n, simResults$Zexact_width, type = 'b', ylim = c(0, 2), ylab = 'CI width', xlab = 'n', main = 'CI width')
points(simResults$n, simResults$Zapprox_width, type = 'b', col = 'red')
points(simResults$n, simResults$T_width, type = 'b', col = 'blue')
legend('topright', legend = c('Zexact', 'Zapprox', 'T'), col = c('black', 'red', 'blue'), bty = 'n', lty = 1)
- Don’t interpret width without first checking coverage.
- These estimates are noisy, just like the bias/variance simulations; use more simulations to get more precise estimates.
- Rerun these simulations with the gamma errors and with errors resampled from the ECT data.
3.7 Assumption violations with T-interval
- Another collaborator of mine is evaluating the effect of electroconvulsive therapy (ECT) for treating depression and psychosis symptoms.
- For this study, my collaborator has reported CGI-I (Clinical Global Impression of Improvement) scores.
- The value 4 represents “no change”, and he wants to know whether the population mean is likely to be far from 4.
- What do we do when the assumptions of the T-test aren’t met?
3.8 Confidence intervals for binomial proportions
3.8.1 CIs using the central limit theorem
Remember the CLT:
The Central Limit Theorem (Durrett, pg. 124): Let \(X_1, X_2, \ldots\) be iid with \(\mathbb{E}X_i = \mu\) and \(\text{Var}(X_i) = \sigma^2 \in (0, \infty)\).
If \(\bar X_n = n^{-1} \sum_{i=1}^n X_i\), then \[ n^{1/2}(\bar X_n - \mu)/\sigma \to_D X, \] where \(X \sim N(0,1)\).
- The CLT allows us to make probabilistic statements about estimators that can be expressed as sums.
- Since it is an “asymptotic” approximation (justified at infinite sample size), we don’t know a priori how well it works in finite samples.
- Most often, we use simulations to evaluate the finite sample approximation.
- For homework Problem 1a you will derive this interval, following the same approach we used for the mean.
\[ \mathbb{P}\left(\bar X_n - Z_{0.975}\times \sqrt{p(1-p)/n} < p < \bar X_n + Z_{0.975}\times \sqrt{p(1-p)/n}\right) \approx 0.95. \]
In practice, the unknown \(p\) in the standard error is replaced by the estimate \(\bar X_n\); you will work with this interval for homework.
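A sketch of this plug-in (Wald) interval with made-up data (the values of n and X here are hypothetical):
Code
n = 50; X = 30                # hypothetical: 30 successes out of 50 trials
phat = X/n
# CLT-based 95% interval, plugging phat into the standard error
phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat)/n)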
3.8.2 Clopper-Pearson interval for the binomial proportion
- This is an exact interval (sort of), meaning the coverage probability is at least \(1-\alpha\) rather than approximately \(1-\alpha\).
- To me, it’s not intuitive why this works as a confidence interval (Clopper and Pearson, 1934).
Conditional on the observed value \(X_o \sim \text{Binom}(n, p)\) (treat it as nonrandom), define \(p_\ell(X_o)\) as the smallest value that satisfies \[ 0.975 \ge \sum_{k=0}^{X_o-1}{ n \choose k} p_{\ell}^k(1-p_\ell)^{n-k} \] and \(p_u(X_o)\) as the largest value that satisfies \[ 0.025 \le \sum_{k=0}^{X_o}{ n \choose k} p_{u}^k(1-p_u)^{n-k} \]
3.8.3 Picture of this interval
Let’s say \(n=50\) and \(\sum_i x_i = 30\); then choose \(p_\ell\) and \(p_u\) so that the conditions above hold with equality:
Code
n = 50
si = 30
alpha = 0.05
x = 0:n
# lower limit: P(X <= si - 1 | plower) = 1 - alpha/2
plower = uniroot(function(p) pbinom(si - 1, n, p) - (1 - alpha/2), interval = c(0.1, 0.9))$root
# upper limit: P(X <= si | pupper) = alpha/2
pupper = uniroot(function(p) pbinom(si, n, p) - alpha/2, interval = c(0.1, 0.9))$root
yl = dbinom(x, size = n, prob = plower)
yu = dbinom(x, size = n, prob = pupper)
plot(x, yl, ylab = 'P(sumXi=x)', xlab = 'x', type = 'h', col = 'red')
points(x, yu, type = 'h', col = 'blue')
abline(v = si, col = 'black', lty = 2)
abline(v = plower*n, col = 'red', lty = 2)
abline(v = pupper*n, col = 'blue', lty = 2)
legend('topleft', legend = c('sumXi', 'lower CI', 'upper CI'), lty = 2, col = c('black', 'red', 'blue'))
3.8.4 Beta distribution
The Beta distribution is a two-parameter distribution \[ f(y; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}y^{\alpha-1}(1-y)^{\beta-1}. \] Note its similarity, as a function of the probability, to the binomial likelihood \[ p^{\sum_i x_i}(1- p)^{n-\sum_i x_i} \]
It turns out that this duality implies that the boundaries for the Clopper-Pearson interval can be expressed in terms of quantiles of the beta distribution \[ \left(\text{qbeta}(\alpha/2, X, n-X +1), \text{qbeta}(1-\alpha/2, X+1, n-X) \right) \]
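A quick check (a sketch reusing the hypothetical \(n=50\), \(X=30\) from above) that the qbeta form agrees with R’s binom.test, which implements the Clopper-Pearson interval:
Code
n = 50; X = 30; alpha = 0.05
# Clopper-Pearson limits via beta quantiles
c(qbeta(alpha/2, X, n - X + 1), qbeta(1 - alpha/2, X + 1, n - X))
binom.test(X, n)$conf.int   # should agree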
3.9 Extra stuff
Code
# sample sizes
ns = c(5, 10, 50, 100, 200, 500)
# correlations
rhos = c(0, -0.7)
# creating blank output table
results = expand.grid(rho = rhos, n = ns)
colNames = paste('rhoHat', c('Bias', 'Variance', 'MSE'))
results[, colNames] = NA
for(simInd in 1:nrow(results)){
  rho = results[simInd, 'rho']
  n = results[simInd, 'n']
  # square root of the 2x2 correlation matrix, used to generate correlated pairs
  Sigma = matrix(c(1, rho, rho, 1), nrow = 2)
  sqrtSigma = svd(Sigma)
  sqrtSigma = sqrtSigma$u %*% diag(sqrt(sqrtSigma$d))
  # each replicate draws n (x, y) pairs and returns their 2x2 correlation matrix
  rhoHats = replicate(n = 10000, cor(tcrossprod(matrix(rnorm(n*2), ncol = 2), sqrtSigma)))
  # compute bias, variance, and MSE of the correlation estimator
  results[simInd, c('rhoHat Bias', 'rhoHat Variance')] =
    c(mean(rhoHats[1, 2, ]) - rho, var(rhoHats[1, 2, ]))
  results[simInd, 'rhoHat MSE'] = mean((rhoHats[1, 2, ] - rho)^2)
}
subres = results[results$rho == -0.7, ]
subres2 = results[results$rho == 0, ]   # rho = 0 results, for comparison
plot(subres$n, subres$`rhoHat Bias`, xlab = 'Sample size', ylab = 'Bias',
     main = 'Bias of rhoHat when rho = -0.7', type = 'b')