Common mistakes and challenges

Simon Vandekar

Objectives

These slides cover some common mistakes and challenges in analyzing neuroimaging data

Voodoo correlation

Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition (Vul et al., 2009)

Observation of high correlations

  • “Eisenberger, Lieberman, and Williams (2003), writing in Science, described a game they created to expose individuals to social rejection in the laboratory. The authors measured the brain activity in 13 individuals at the same time as the actual rejection took place, and later obtained a self-report measure of how much distress the subject had experienced. Distress was correlated at r=.88 with activity in the anterior cingulate cortex (ACC).”

  • “In another Science paper, Singer et al. (2004) found that the magnitude of differential activation within the ACC and left insula induced by an empathy-related manipulation was correlated between .52 and .72 with two scales of emotional empathy (the Empathic Concern Scale of Davis, and the Balanced Emotional Empathy Scale of Mehrabian).”

  • “Writing in NeuroImage, Sander et al. (2005) reported that a subject’s proneness to anxiety reactions (as measured by an index of the Behavioral Inhibition System; Carver and White, 1994) correlated at r=.96 with the difference in activation of the right cuneus to attended versus ignored angry speech.”

Upper bound on the correlation

  • They derive an upper bound on the observable correlation from a measurement error model (a numeric check follows below).
  • If X_{obs} = X + \epsilon_X and Y_{obs} = Y + \epsilon_Y, with errors independent of the true values, then \rho_{X_{obs}, Y_{obs}} = \rho_{X, Y} \sqrt{R_{X_{obs}} R_{Y_{obs}}}

Formula for reliability: R_{X_{obs}} = \frac{Var(X)}{ Var(X) + Var(\epsilon_X)}

  • Reliability of behavioral measures: “… therefore, a range of .7 - .8 would seem to be a somewhat optimistic estimate for the smaller and more ad hoc scales used in much of the research described below”
  • Reliability of fMRI: “fMRI measures computed at the voxel level will not often have reliabilities greater than about .7”
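Taking these reliabilities at face value, a quick numeric check of the bound (a minimal sketch; the only inputs are the reliabilities quoted above):

```python
# Attenuation upper bound implied by the reliabilities quoted above.
import math

rel_behavior = 0.8   # optimistic reliability of the behavioral scale
rel_fmri = 0.7       # optimistic voxel-level fMRI reliability
rho_true = 1.0       # even a perfect underlying correlation...

rho_obs_max = rho_true * math.sqrt(rel_behavior * rel_fmri)
print(f"maximum observable correlation: {rho_obs_max:.2f}")  # ~0.75
```

Reported correlations of .88 or .96 sit above this ceiling, which is what made the reported values puzzling.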

Observed correlations

  • They contacted many investigators to collect data across studies.
  • Collecting results across studies like this is called a meta-analysis

The explanation

  • Circularity analysis

[Figure: correlation histogram]

[Figure: illustration of circularity]


Circularity analysis

  • A consistent challenge in neuroimaging due to the large amounts of data and complex processing
  • Also called “double-dipping”
  • Today’s slides are based on Kriegeskorte et al. (2009)

Circularity definition

  • Circularity is reusing the same data for selection and for inference
  • Can occur in both mass-univariate and machine learning analyses
  • Related to selection bias, and leads to
    • Estimation bias (inflated effect sizes)
    • Inference bias (invalid p-values)

Kriegeskorte Example 1

  • Feature selection in ML (“pattern recognition”) models
  • What causes inflated prediction accuracy even when the data are pure noise?
  • How can this be avoided? See the sketch after this list.
    • What are alternative feature selection strategies?
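Here is a minimal simulation of the problem, assuming numpy and scikit-learn are available (sizes, seeds, and names are illustrative, not from the paper): with pure-noise data, selecting voxels on the full dataset before cross-validation inflates accuracy well above chance, while nesting the selection inside each training fold does not.

```python
# Sketch of circular vs. nested feature selection on pure-noise data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n, p, k = 40, 5000, 20                     # subjects, voxels, voxels kept
X = rng.standard_normal((n, p))            # random "activation" data
y = np.repeat([0, 1], n // 2)              # labels: true accuracy is 50%
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Circular: pick the k voxels most related to y using ALL the data, then
# cross-validate only the classifier. The selection step has already seen
# every test fold, so accuracy is biased upward.
top_k = np.argsort(f_classif(X, y)[0])[-k:]
acc_circular = cross_val_score(LinearSVC(), X[:, top_k], y, cv=cv).mean()

# Correct: nest the selection inside each training fold via a Pipeline, so
# the held-out fold never influences which voxels are chosen.
pipe = Pipeline([("select", SelectKBest(f_classif, k=k)),
                 ("clf", LinearSVC())])
acc_nested = cross_val_score(pipe, X, y, cv=cv).mean()

print(f"circular selection accuracy: {acc_circular:.2f}")  # well above 0.5
print(f"nested selection accuracy:   {acc_nested:.2f}")    # near chance
```

The nested version is one answer to “how can this be avoided”: any data-dependent step (feature selection, scaling, dimension reduction) must be refit within each training fold so the held-out data never informs it.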

[Figure: ML feature selection]

Kriegeskorte Example 2

  • This is the same idea as the “Voodoo correlation” discussion above
  • Even selection via valid inference procedures can lead to biased results in downstream analyses
  • It is very common to select regions to analyze based on “task-active” regions.
    • Depending on how this is defined, it could cause bias (see the sketch after this list).
    • Would be another interesting project 😉.
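A toy sketch of when this does and does not cause bias, assuming per-voxel effect estimates for two conditions with independent, equal-variance noise (numbers are illustrative): selecting voxels on the orthogonal “task-active” contrast A + B leaves the target contrast A − B unbiased, while selecting on the target contrast itself inflates it.

```python
# Toy model: per-voxel effect estimates for two conditions A and B with
# independent noise. With equal variances, A+B and A-B are uncorrelated,
# so selection on A+B is statistically independent of the target A-B.
import numpy as np

rng = np.random.default_rng(1)
n_vox = 100_000
a = 0.3 + 0.3 * rng.standard_normal(n_vox)   # condition A estimates
b = 0.3 + 0.3 * rng.standard_normal(n_vox)   # condition B estimates

target = a - b                # target contrast; its true value is 0
orthogonal = (a + b) > 1.0    # "task-active" selection, orthogonal to target
circular = target > 0.3       # selection on the target contrast itself

print(f"mean A-B in task-active voxels:         {target[orthogonal].mean():.3f}")  # ~0
print(f"mean A-B in circularly selected voxels: {target[circular].mean():.3f}")    # inflated
```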

[Figure: mass-univariate selection bias]

Conditional probability interpretation

  • Statistically, the bias comes from conditioning on the selection step: if voxels are kept because \hat{\beta}(v) is large, then \mathbb{E}[\hat{\beta}(v) \mid v \text{ selected}] > \mathbb{E}[\hat{\beta}(v)] (simulated below)
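A minimal simulation of that inequality in a toy mass-univariate setting, where every voxel has the same small true effect and selection keeps only the “significant” voxels:

```python
# Selecting voxels that pass a significance threshold inflates the mean
# effect estimate among the selected voxels. Toy numbers only.
import numpy as np

rng = np.random.default_rng(2)
n_vox, n_sub = 10_000, 20
beta_true = 0.2                               # true effect at every voxel
se = 1 / np.sqrt(n_sub)                       # standard error of each estimate

beta_hat = beta_true + se * rng.standard_normal(n_vox)
selected = beta_hat / se > 1.96               # "significant" voxels only

print(f"true effect:               {beta_true:.2f}")
print(f"mean over all voxels:      {beta_hat.mean():.2f}")            # ~0.20
print(f"mean over selected voxels: {beta_hat[selected].mean():.2f}")  # inflated
```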

Simon’s reviewing/editorial comments

  • Selection procedures are very common in fMRI
    • E.g., selecting task-active regions, as I mentioned on the previous slide
  • It can be very difficult to tell from a paper whether a selection procedure introduces bias
  • My recommendations:
    • As a reviewer, note that post hoc tests in selected regions are likely biased, and ask the authors to qualitatively summarize the results with estimates instead of p-values/tests
    • Keep an eye out for circularity, and think about whether it could introduce bias
  • Also worth looking for: meta-analyses or reanalyses showing effect-size inflation, quantitative estimates of the inflation, and connections to current reproducibility discussions (Munafò et al., 2017)

Common types of circularity analysis

  • ROI definition circularity
    • ROIs are selected based on the data in some way
    • Probably biased, though it does not have to be if the selection contrast is statistically independent of the target contrast
  • Feature selection circularity, as described above
  • Cross-validation leakage
    • Feature selection or dimension reduction applied to the full dataset before splitting
  • Post hoc region restriction
    • E.g., selecting via “task activation” or another contrast
    • Might be OK; it depends on the correlation between the contrasts

Best practices

  • Data-splitting and independent validation sets (see the sketch after this list)
  • Cross-validation done correctly (nested CV)
  • Preregistered ROI definitions, or ROIs selected from prior papers
  • Use of external templates or meta-analytic ROIs (could also be biased if using public data)
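As a sketch of the first practice, reusing the toy model from the selection-bias simulation above: define the selection on one half of the subjects and estimate the effect on the other half, and the inflation disappears because the halves are independent.

```python
# Data-splitting: selection on half A, estimation on half B. Because the
# halves are independent, the half-B estimate in the selected voxels is
# unbiased. Toy numbers only.
import numpy as np

rng = np.random.default_rng(3)
n_vox, n_sub = 10_000, 40
beta_true = 0.2
data = beta_true + rng.standard_normal((n_sub, n_vox))  # subjects x voxels

half_a, half_b = data[:n_sub // 2], data[n_sub // 2:]
se_a = half_a.std(0, ddof=1) / np.sqrt(n_sub // 2)
selected = half_a.mean(0) / se_a > 1.96       # ROI defined on half A only

print(f"circular estimate (half A): {half_a.mean(0)[selected].mean():.2f}")  # inflated
print(f"clean estimate (half B):    {half_b.mean(0)[selected].mean():.2f}")  # ~0.20
```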

Statistical approaches

We won’t go into detail here.

  • Selective inference frameworks (Taylor & Tibshirani, 2015; Lee et al., 2016)
  • Bayesian hierarchical modeling to share information without circularity
  • Resampling-based correction (bootstrap split-half)

Replicability

  • Seen as a very important topic right now
  • The “garden of forking paths” (Gelman & Loken)
  • Carp (2012) documented the enormous analytic flexibility of fMRI pipelines
  • Related to circularity, as preprocessing steps (motion correction, smoothing, etc.) may be “optimized” to improve results