Group-level analysis tutorial

Code
knitr::opts_chunk$set(echo = TRUE, eval=TRUE, cache=FALSE)

Overview

We will analyze diagnostic and age associations in individuals with autism spectrum disorder and typically developing controls. The data we will analyze are called fractional amplitude of low frequency fluctuations fALFF. They are computed from rs-fMRI data using the following steps:

  1. The data are preprocessed as usual for rs-fMRI data.
  2. fALFF is the power in the spectrum 0-0.08Hz divided by the power in the spectrum 0-0.25Hz.
  3. fALFF is computed at every voxel in the image.

The fALFF data are available already processed from the Autism Brain Imaging Data Exchange (ABIDE) dataset in MNI space at 3mm resolution.

To analyze the data, we will look at associations of fALFF with diagnosis and age, controlling for other covariates.

\[ \text{fALFF}_i \sim \text{age}_i + \text{sex}_i + \text{diagnosis}_i + \text{motion covariates}_i + \text{site}_i \]

Code
# install the latest version of the NIsim package, which includes a function to download the preprocessed ABIDE data.
# devtools::install_github('statimagcoll/PCP', ref='master')
# devtools::install_github('statimagcoll/pbj', ref='master')

### LIBRARIES ###
library(RNifti)
library(parallel)
library(splines)
library(progress)
library(table1)
library(pbj)
library(PCP)

Outline of steps

  1. Download ABIDE data
  2. Data quality checks
    • Create study level brain mask
    • Motion exclusions
    • Covariate exclusions
  3. Group-level analysis
    • FEAT and easythresh
    • Randomise
    • pbj package in R
  4. Cluster extent inference
    • easythresh
    • pbj bootstrap procedure
  5. Reporting results

Download ABIDE data

The PCP R package has access to the preprocessed ABIDE data. Here, the code downloads fALFF images, which is a measure derived from the rs-fMRI data.

Code
## DATA LOCATION ##
# SMN data folder
datadir = file.path(dirname(dirname(getwd())), 'data/abide')
derivative='falff'
# will be created by downloadABIDE
dbimagedir = file.path(datadir, 'neuroimaging/cpac/filt_global', derivative)
templatefile = file.path(datadir, 'neuroimaging/MNI152_T1_3mm.nii.gz')
# created by simSetup... Not used in this analysis
dbresimagedir = file.path(datadir, 'neuroimaging/cpac/alff_res')
# created below
maskfile = file.path(datadir, 'neuroimaging/cpac/mask.nii.gz')
overlapfile = file.path(datadir, 'neuroimaging/cpac/overlap.nii.gz')

# load in data and get directories
dat = ABIDE(datadir, derivatives=derivative)
# some data curation is done on download. Subset all pass on visual inspection and making sex and dx factors, remove rows with no file name
dat$imgname = paste(dat$file_id, paste0(derivative, '.nii.gz'), sep='_')
dat$files = file.path(dbimagedir, dat$imgname)

Data quality assurance

Code
dat$sex = factor(dat$sex, levels=1:2, labels = c('Male', 'Female'))
label(dat$sex) = "Sex"
dat$dx_group = factor(toupper(dat$dx_group))
dat$site_id = factor(dat$site_id)
label(dat$age_at_scan) = "Age"
label(dat$func_mean_fd) = "Mean displacement"
label(dat$dx_group) = "Diagnosis"
table1(~ sex + func_mean_fd + age_at_scan | dx_group, data=dat, caption = "ABIDE analysis summary table.")
ABIDE analysis summary table.
ASD
(N=505)
HC
(N=530)
Overall
(N=1035)
Sex
Male 443 (87.7%) 435 (82.1%) 878 (84.8%)
Female 62 (12.3%) 95 (17.9%) 157 (15.2%)
Mean displacement
Mean (SD) 0.157 (0.188) 0.103 (0.0986) 0.129 (0.152)
Median [Min, Max] 0.0926 [0.0180, 1.43] 0.0718 [0.0161, 0.890] 0.0834 [0.0161, 1.43]
Missing 1 (0.2%) 0 (0%) 1 (0.1%)
Age
Mean (SD) 17.1 (8.55) 16.8 (7.45) 17.0 (8.00)
Median [Min, Max] 14.4 [7.00, 64.0] 14.7 [6.47, 56.2] 14.5 [6.47, 64.0]

Motion exclusions

Code
# plot motion
plot(dat$func_mean_fd, main="Mean relative frame displacement", bty='l')
# greater than half a millimeter
abline(h=0.5, lty=2, lwd=2)

Mean framewise displacement (FD) for each participant. The dashed line indicates the 0.5mm threshold we use to exclude participants with high motion.
Code
# add number of exclusions as text in the top right corner of plot

# subset to participants less than half a milimeter mean framewise displacement
dat = dat[which(dat$func_mean_fd<0.25), ]

Covariate exclusions

This is just a check to make sure we don’t have any missing non-imaging variables. One additional participant is excluded.

Code
dat = dat[rowMeans(!is.na(dat[,c("sex", "age_at_scan", "func_mean_fd", "func_fber", "func_outlier", "dx_group", "site_id")]))==1,]

Create study level brain mask

After downloading the data, we create a study specific mask based on the subset of the data where all subjects have coverage. We make a small number of exclusions to increase the coverage above 30,000 voxels at 3mm isotropic resolution. This first chunk of code creates a study-level mask by the intersection of all participants where their data are non-zero. It the iteratively excludes the participant that will increase the study-level mask, until we reach a coverage over 30,000 voxels.

Code
imgs = simplify2array(readNifti(dat$files) )
overlap = rowMeans(imgs!=0, dims=3)
overlapImg = readNifti(dat$files[1])
overlapImg[,,] = overlap[,,]
writeNifti(overlapImg, file=overlapfile)

# choose
# number of people with zeros at that location
inds=numeric()
ids = c()
subids = dat$sub_id
# number of voxels with no zeros
nnzero = 0
# iteratively removes subjects who will increase the mask the largest
while(nnzero<30000){
  voxSums = rowSums(imgs==0, dims=3)
  tab = as.data.frame(table(voxSums))
  nnzero=tab[1,2]
  # number of unique voxels for each subject
  uniquevox = apply(imgs, 4, function(img) sum(img==0 & voxSums==1) )
 
  inds = which.max(uniquevox)
  cat('\nIteration:\nsubject removed: ', paste(subids[inds], collapse=', '), '\nmask size is now ', nnzero+sum(uniquevox[inds]), '\nNumber of voxels added:', sum(uniquevox[inds]) )
  imgs = imgs[,,,-inds]
  subids = subids[-inds]
  nnzero=nnzero+sum(uniquevox[inds])
}

Iteration:
subject removed:  51469 
mask size is now  8746 
Number of voxels added: 5402
Iteration:
subject removed:  51478 
mask size is now  23062 
Number of voxels added: 14316
Iteration:
subject removed:  50209 
mask size is now  25587 
Number of voxels added: 2525
Iteration:
subject removed:  50727 
mask size is now  27681 
Number of voxels added: 2094
Iteration:
subject removed:  50195 
mask size is now  28677 
Number of voxels added: 996
Iteration:
subject removed:  50216 
mask size is now  29426 
Number of voxels added: 749
Iteration:
subject removed:  51466 
mask size is now  30010 
Number of voxels added: 584

Create the mask from the overlap.

Code
badDat = dat[! dat$sub_id %in% subids,]
dat = dat[ dat$sub_id %in% subids,]

# now create the mask
mask = readNifti(dat$files[1])
imgs = simplify2array(readNifti(dat$files) )
# mask actually still has pretty low coverage
mask[,,] = 0
mask[,,] = apply(imgs>0, 1:3, all)
writeNifti(mask, maskfile)
nvox = sum(mask)
ulimit = quantile(apply(imgs, 4, function(x) x[mask!=0]), 0.9)
rm(imgs)

Subjects with good coverage

Visualize a slice of the data for all participants included in the analysis.

Subjects with bad coverage

There are issues with the orientation of the images for a small number of participants. We can visualize them here to see what the problem is.

Code
niftis = readNifti(badDat$files)
# visualize the ones that had bad coverage
par(mfrow=c(2,5), mar=c(0,0,2,0))
for(i in 1:length(niftis)){
  image(niftis[[i]], index=45, lo=FALSE, limits=c(0,ulimit), crop=FALSE)
  mtext(badDat$sub_id[i], outer=FALSE)
}

Axial slice (z=45) of fALFF data for all participants excluded from the analysis due to poor coverage.

Model estimation

Code
analysisDir = "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results"
# file.path(getwd(), "/4group-level_analysis/group-level_tutorial/abide_results")
# need age_at_scan in both models for testing nonlinear functions
form = paste0(" ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + dx_group + site_id" )
formred = paste0(" ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + site_id")
formredAge = paste0(" ~ sex + dx_group + func_mean_fd + func_fber + func_outlier + site_id")
mask = maskfile
ncores = 4

fsl_glm: OLS regression

We can use FSL’s FEAT to do a group-level analysis using a simple GLM (least squares approach). Here we create a text file with the list of images, and text files with the design matrix and contrasts for diagnosis and age.

Code
# feat setup files directory
featDir = file.path(analysisDir, 'feat')
dir.create(featDir, recursive=TRUE, showWarnings = FALSE)

# write text file with all images
write.table(dat$files, file = file.path(featDir, 'images.txt'), row.names=FALSE, col.names=FALSE, quote=FALSE)

# create design file for feat
# save contrast names
# create design matrix without the intercept for feat and save it as a text file
design = model.matrix(as.formula(form), data=dat)
write.table(design, file=file.path(featDir, 'design.txt'), row.names=FALSE, col.names=FALSE, quote=FALSE)

# create contrasts for dx and age
contrasts = matrix(0, nrow=2, ncol=ncol(design))
colnames(contrasts) = colnames(design)
rownames(contrasts) = c('dx', 'age')
contrasts[1, which(colnames(contrasts)=='dx_groupHC')] = 1
contrasts[2, which(colnames(contrasts)=='age_at_scan')] = 1
# save contrasts
write.table(contrasts, file=file.path(featDir, 'contrasts.txt'), row.names=FALSE, col.names=FALSE, quote=FALSE)
getwd()
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial"
Code
# output directory
featDir=$(pwd)/abide_results/feat
# merge images to 4d file
fslmerge -t $featDir/falff_outcome.nii.gz $(cat $featDir/images.txt)
# convert text file to .mat format for fsl
Text2Vest $featDir/design.txt $featDir/design.mat
# convert contrast text file to fsl format
Text2Vest $featDir/contrasts.txt $featDir/design.con
# mask file
mask=$(pwd)/../../data/abide/neuroimaging/cpac/mask.nii.gz
# run glm command
fsl_glm -i $featDir/falff_outcome.nii.gz -d $featDir/design.mat -c $featDir/design.con -m $mask \
-o $featDir/pe.nii.gz --out_z=$featDir/zstat.nii.gz  --out_t=$featDir/tstat.nii.gz --out_p=$featDir/pstat.nii.gz --out_cope=$featDir/cope.nii.gz

flameo: random effects model with subject-specific error variance

FSL’s FLAME model accounts for differences in variance among participants. To do this, it needs an estimate of the variances of the outcome images. We don’t have this for fALFF data, so we can’t fit the model here. It also can account for repeated measurements among participants. Might be helpful to review slides 12-16 in the group-level analysis notes.

randomise: permutation approach

Not included as part of this tutorial yet.

pbj package in R: robust variance estimation

First, replicate the GLM results from fsl. This fits OLS at each voxel, and computes the same Z-statistics as FSL’s FEAT.

  • HC3 is an option for the heteroskedasticity consistent covariance estimator that improves small sample performance. It defaults to TRUE.
  • transform=t indicates that we want to transform the test statistics from t to Z-statistics.
  • robust=FALSE indicates that we want to use the standard OLS covariance estimator, which assumes homoskedasticity. It defaults to TRUE.
Code
# DX analysis
abideDXglm = lmPBJ(dat$files, form=form, formred=formred, mask=mask, data=dat, template = templatefile, robust=FALSE, HC3 = FALSE, transform='t')
abideDXglm
Full formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + dx_group + site_id
Reduced formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + site_id  
30307  voxels in mask
StatMap quantiles (0, 0.01, 0.05, 0.95, 0.99, 1):
 [ 0, 0, 0.01, 6.01, 9.61, 18.77 ]
sqrtSigma: 
  [n =  914 ; df =  1 ; rdf =  888 ]
id variable is:
 NULL
lmPBJ inference settings:
  robust =  FALSE ; HC3 =  FALSE ; transform =  t
pbjInference not run yet.
Code
# effect size for diagnosis effect
write.statMap(abideDXglm, outdir=file.path(analysisDir, 'dxGLM'), statMethod = 'Z')
Writing stat and coef images.
Writing sqrtSigma object.
$stat
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/dxGLM/stat_z.nii.gz"

$sqrtSigma
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/dxGLM/sqrtSigma.rds"

This analysis uses the default settings, which includes a robust covariance estimator, which accommodates heteroskedasticity. Using transform=t converts the test statistics using the inverse CDF of the T-distribution and CDF of the standard normal. For hypothesis testing, this helps to make them more normal, especially for extreme values.

Code
# DX analysis
abideDX = lmPBJ(dat$files, form=form, formred=formred, mask=mask, data=dat, template = templatefile, transform='t')
abideDX
Full formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + dx_group + site_id
Reduced formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + site_id  
30307  voxels in mask
StatMap quantiles (0, 0.01, 0.05, 0.95, 0.99, 1):
 [ 0, 0, 0.01, 5.71, 9.13, 18.86 ]
sqrtSigma: 
  [n =  914 ; df =  1 ; rdf =  888 ]
id variable is:
 NULL
lmPBJ inference settings:
  robust =  TRUE ; HC3 =  TRUE ; transform =  t
pbjInference not run yet.
Code
image(abideDX, cft_p=0.01, crop=TRUE, index = 18:47)

Code
write.statMap(abideDX, outdir=file.path(analysisDir, 'dx'), statMethod = 'Z')
Writing stat and coef images.
Writing sqrtSigma object.
$stat
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/dx/stat_z.nii.gz"

$sqrtSigma
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/dx/sqrtSigma.rds"
Code
write.statMap(abideDX, outdir=file.path(analysisDir, 'dx'), statMethod = 'S')
Writing stat and coef images.
Writing sqrtSigma object.
$stat
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/dx/stat_s.nii.gz"

$sqrtSigma
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/dx/sqrtSigma.rds"

Load interactively with fsleyes. Open also with MNI 1mm template for better resolution viewing. fsleyes can overlay them correctly.

Code
fsleyes /Users/vandeks/Library/CloudStorage/Box-Box/SMN/data/abide/neuroimaging/MNI152_T1_3mm.nii.gz /Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/dx/stat_z.nii.gz

The same approach but testing the age effect.

Code
# Age analysis
abideAge = lmPBJ(dat$files, form=form, formred=formredAge, mask=mask, data=dat, template = templatefile, transform='t')
Code
image(abideAge, cft_p=0.05, crop=TRUE, index = 18:47)

Code
write.statMap(abideAge, outdir=file.path(analysisDir, 'age'), statMethod = 'Z')
Writing stat and coef images.
Writing sqrtSigma object.
$stat
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/age/stat_z.nii.gz"

$sqrtSigma
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/age/sqrtSigma.rds"
Code
write.statMap(abideAge, outdir=file.path(analysisDir, 'age'), statMethod = 'S')
Writing stat and coef images.
Writing sqrtSigma object.
$stat
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/age/stat_s.nii.gz"

$sqrtSigma
[1] "/Users/vandeks/Library/CloudStorage/Box-Box/SMN/4group-level_analysis/group-level_tutorial/abide_results/age/sqrtSigma.rds"

Last model fitting the intercept after subtracting the mean out of each image. This will show regions that tend to have higher in fALFF (relative to the whole brain average) that are consistent across all participants.

Code
# intercept test
formInt = " ~ 1"
formredInt = " ~ 0"
abideInt = lmPBJ(dat$files, form=formInt, formred=formredInt, mask=mask, data=dat, template = templatefile, transform='t')
Code
image(abideInt, cft_p=0.05, crop=TRUE, index = 18:47)
write.statMap(abideAge, outdir=file.path(analysisDir, 'age'), statMethod = 'Z')
write.statMap(abideAge, outdir=file.path(analysisDir, 'age'), statMethod = 'S')

Cluster extent inference

Using easythresh

easythresh is a command line tool from FSL to perform GRF-based cluster extent inference. It takes as input a Z-statistic image, a brain mask, a cluster-forming threshold (CFT), a cluster extent threshold (CET), and a template image for MNI space. It outputs thresholded images and a text file with cluster extent inference results. Here are two code chunks applying the easythresh command to the FSL FEAT results and the pbj OLS results.

‘easythresh’ applied to FSL output

Code
# output directory
featDir=$(pwd)/abide_results/feat
# cluster forming threshold p<0.01. Note: NOT recommended based on prior literature showing it won't control the type 1 error rate
cft=3.1
# cluster extent threshold p<0.5. I set this anti-conservatively to include extra results for us to look at. easythresh will only include significant clusters in the output
cet=0.5
# split the zstat image into dx and age
fslsplit $featDir/zstat.nii.gz $featDir/zstat_ -t
mask=/Users/vandeks/Library/CloudStorage/Box-Box/SMN/data/abide/neuroimaging/cpac/mask.nii.gz
template=/Users/vandeks/Library/CloudStorage/Box-Box/SMN/data/abide/neuroimaging/MNI152_T1_3mm.nii.gz
# output directory
easyOutDir=$featDir/easythresh_cft${cft}_cet${cet}
mkdir $easyOutDir 2>/dev/null

# run easythresh command
# cd to the output directory -- easythresh writes output to current directory
curDir=$(pwd)
cd $easyOutDir
easythresh $featDir/zstat_0000.nii.gz $mask $cft $cet $template 
cd $curDur

‘easythresh’ applied to pbj output

Code
pbjDir=$(pwd)/abide_results/dxGLM

# other variables are defined above

# output directory
easyOutDir=$pbjDir/easythresh_cft${cft}_cet${cet}
mkdir $easyOutDir 2>/dev/null

# run easythresh command
# cd to the output directory -- easythresh writes output to current directory
curDir=$(pwd)
cd $easyOutDir
# run the command. Last argument is suffix for output images
easythresh $pbjDir/stat_z.nii.gz $mask $cft $cet $template easythresh
cd $curDur

Two tables of the output from easythresh using FSL analysis and pbj analysis as input. Results are for the diagnostic effect. They should be identical because they are performing the exact same analysis.

Code
# read in text file cluster_easythresh.txt and print as table using knitr
easythreshFile = file.path(featDir, 'easythresh_cft3.1_cet0.5', 'cluster_easythresh.txt')
easythreshTab = read.table(easythreshFile, header=TRUE, sep='\t', check.names = FALSE)
knitr::kable(easythreshTab, caption = "Cluster extent inference results from FSL's easythresh command.")
Cluster extent inference results from FSL’s easythresh command.
Cluster Index Voxels P -log10(P) MAX MAX X (vox) MAX Y (vox) MAX Z (vox) COG X (vox) COG Y (vox) COG Z (vox)
8 53 0.0717 1.140 4.19 22 22 36 23.7 20.8 36.5
7 33 0.1820 0.741 4.10 32 36 28 32.8 37.2 27.6
6 30 0.2100 0.677 3.79 22 32 42 20.2 34.0 40.5
5 16 0.4260 0.370 4.20 38 30 43 38.3 29.9 42.7
4 16 0.4260 0.370 3.93 36 46 27 36.3 46.2 28.2
3 15 0.4490 0.348 3.80 22 22 40 22.6 22.5 40.6
2 13 0.4970 0.303 3.56 29 32 43 30.7 32.0 42.6
1 13 0.4970 0.303 4.33 23 14 30 23.2 14.4 29.9
Code
pbjDir = file.path(analysisDir, 'dxGLM')
# read in text file cluster_easythresh.txt and print as table using knitr
easythreshFile = file.path(pbjDir, 'easythresh_cft3.1_cet0.5', 'cluster_easythresh.txt')
easythreshTab = read.table(easythreshFile, header=TRUE, sep='\t', check.names = FALSE)
knitr::kable(easythreshTab, caption = "Cluster extent inference results from FSL's easythresh command applied to the PBJ OLS results.")
Cluster extent inference results from FSL’s easythresh command applied to the PBJ OLS results.
Cluster Index Voxels P -log10(P) MAX MAX X (vox) MAX Y (vox) MAX Z (vox) COG X (vox) COG Y (vox) COG Z (vox)
8 53 0.0717 1.140 4.19 22 22 36 23.7 20.8 36.5
7 33 0.1820 0.741 4.10 32 36 28 32.8 37.2 27.6
6 30 0.2100 0.677 3.79 22 32 42 20.2 34.0 40.5
5 16 0.4260 0.370 4.20 38 30 43 38.3 29.9 42.7
4 16 0.4260 0.370 3.93 36 46 27 36.3 46.2 28.2
3 15 0.4490 0.348 3.80 22 22 40 22.6 22.5 40.6
2 13 0.4970 0.303 3.56 29 32 43 30.7 32.0 42.6
1 13 0.4970 0.303 4.33 23 14 30 23.2 14.4 29.9

Using the pbj bootstrap procedure

This uses a parametric bootstrap procedure to estimate the distribution of the maximum cluster size under the null hypothesis. It is semiparametric, in the sense that the robust covariance estimator allows misspecification of the spatial covariance structure. I’ve used a small number of bootstraps, here (500). In practice, I recommend 2000-5000. I’ve used a \(p\)-value cluster forming threshold (CTF) of 0.01. The code also has the option to specify the CFT in terms of the effect size or the chi-square statistic. The rdata_rds argument saves the bootstrap results to an RDS file, so you can load them later without re-running the bootstrap. When you pass this argument, it runs the analyses in the background, so you can go about other tasks in your Rstudio terminal. The status object will let you know when the bootstrap is done.

The default method for pbjInference is to do CEI, but you can also compute \(p\)-values for local maxima, do FWER control, or specify a custom statistic, so it’s really flexible, but you have to know what you’re doing.

Parametric analysis with bootstrap inference

This is using the parametric bootstrap to do inference for the diagnostic effect in the OLS model.

Code
rds = file.path(analysisDir, 'dxGLM', 'pbjBoot.rds')

Using 500 bootstraps with an equivalent cluster-forming threshold of Z>3.1 (p<0.001) on the chi-square scale.

Code
# bootstrap with 500 resamples
set.seed(123)
nboot=500
# same as z>3.1 or p<0.001
status = pbjInference(abideDXglm, nboot=nboot, rdata_rds=rds, cft_chisq = 3.1^2)

This uses the standard visualizations from the pbj package to show the results. The cluster extents and centroids should be the same as in the easythresh results above, since they are performing the same analysis, but the FWER p-value will be different because pbj uses the bootstrap procedure to compute p-values. Instead of reporting a max Z-statistic, the pbj package outputs the maximum robust effect size index (RESI), which is a standardized effect size index that is more comparable across different studies Vandekar and Stephens, 2021. The \(p\)-values are pretty similar to the results in easythresh, but not identical because of the different ways of estimating the null distribution of the maximum cluster size. Likely, they are similar because the assumptions of easythresh are valid at this high cluster forming threshold.

Code
# load in results
abideDXglm = readRDS(rds)
abideDXglm
Full formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + dx_group + site_id
Reduced formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + site_id  
30010  voxels in mask
StatMap quantiles (0, 0.01, 0.05, 0.95, 0.99, 1):
 [ 0, 0, 0.01, 6.05, 9.62, 18.77 ]
sqrtSigma: 
  [n =  914 ; df =  1 ; rdf =  888 ]
id variable is:
 NULL
lmPBJ inference settings:
  robust =  FALSE ; HC3 =  FALSE ; transform =  t
pbjInference results:
  CEI run with cft =  9.61
Code
image(abideDXglm, cft_chisq = 3.1^2)
tabout = table.statMap(abideDXglm, cft_chisq=3.1^2)[1:10,]
tabout[, 'easythresh p-val'] = NA
# hard-coded. Only got the 8 largest clusters from easythresh
tabout[1:8, 'easythresh p-val'] = easythreshTab$P
# move Max RESI column to the back of tabout
tabout = tabout[, c(setdiff(colnames(tabout), 'Max RESI'), 'Max RESI')]
knitr::kable(tabout, caption = "Ten largest clusters from the parametric bootstrap procedure applied to OLS model in the pbj package compared to the easythresh p-values.")
Ten largest clusters from the parametric bootstrap procedure applied to OLS model in the pbj package compared to the easythresh p-values.
Cluster ID Cluster Extent Centroid (vox) Unadjusted p-value FWER p-value easythresh p-val Max RESI
8 8 53 25, 22, 37 0.0042087 0.072 0.0717 0.1345862
24 24 33 34, 38, 29 0.0100153 0.128 0.1820 0.1313660
2 2 30 21, 35, 41 0.0120096 0.146 0.2100 0.1210006
4 4 16 39, 31, 44 0.0377485 0.282 0.4260 0.1347982
21 21 16 37, 47, 29 0.0377485 0.282 0.4260 0.1255713
5 5 15 24, 24, 42 0.0422093 0.304 0.4490 0.1212052
3 3 13 32, 33, 44 0.0544400 0.366 0.4970 0.1130873
23 23 13 24, 15, 31 0.0544400 0.366 0.4970 0.1394304
10 10 12 44, 36, 38 0.0608202 0.394 NA 0.1121541
11 11 11 37, 54, 37 0.0695590 0.424 NA 0.1292112

Cluster extent inference results from the parametric bootstrap procedure in the pbj package.

Code
#inferenceIndex(abideDX$pbj$obsStat, method='cei')

Cluster extent inference results from the parametric bootstrap procedure in the pbj package.

Semiparametric bootstrap inference with pbj

This uses the “robust” covariance approach. Now I’m running it at z>3.1, which is required by easythresh, but the bootstrap procedure is more flexible, so we can use z>2.32 as a threshold as well.

Code
rds = file.path(analysisDir, 'dx', 'pbjBoot.rds')
Code
# bootstrap with 500 resamples
set.seed(123)
nboot=500
status = pbjInference(abideDX, nboot=nboot, rdata_rds=rds, cft_chisq = c(3.1, 2.32)^2)
Code
# load in results
abideDX = readRDS(rds)
abideDX
Full formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + dx_group + site_id
Reduced formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + site_id  
30010  voxels in mask
StatMap quantiles (0, 0.01, 0.05, 0.95, 0.99, 1):
 [ 0, 0, 0.01, 5.75, 9.19, 18.86 ]
sqrtSigma: 
  [n =  914 ; df =  1 ; rdf =  888 ]
id variable is:
 NULL
lmPBJ inference settings:
  robust =  TRUE ; HC3 =  TRUE ; transform =  t
pbjInference results:
  CEI run with cft =  9.61, 5.38
Code
image(abideDX, cft_chisq=2.32^2)
knitr::kable(table.statMap(abideDX, cft_chisq=2.32^2)[1:10,], caption= "Ten largest clusters from the semiparametric bootstrap procedure applied to robust model in the pbj package.")
Ten largest clusters from the semiparametric bootstrap procedure applied to robust model in the pbj package.
Cluster ID Cluster Extent Centroid (vox) Unadjusted p-value FWER p-value Max RESI
1 1 710 29, 30, 36 0.0006987 0.036 0.1336704
6 6 187 42, 33, 40 0.0047842 0.182 0.1292265
35 35 145 38, 49, 30 0.0066836 0.228 0.1232188
2 2 123 20, 35, 41 0.0083315 0.266 0.1190937
55 55 92 25, 20, 23 0.0125374 0.348 0.0975146
15 15 84 37, 17, 33 0.0143496 0.376 0.1148660
48 48 83 40, 20, 24 0.0147334 0.384 0.1088026
16 16 42 37, 54, 37 0.0365157 0.618 0.1262520
21 21 38 50, 41, 35 0.0421439 0.660 0.1038552
40 40 33 21, 39, 29 0.0513619 0.708 0.0947552

Cluster extent inference results from the semiparametric bootstrap procedure in the pbj package.

Code
#inferenceIndex(abideDX$pbj$obsStat, method='cei')

Cluster extent inference results from the semiparametric bootstrap procedure in the pbj package.
Code
# load in results
abideDX = readRDS(rds)
abideDX
Full formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + dx_group + site_id
Reduced formula:   ~ sex + age_at_scan + func_mean_fd + func_fber + func_outlier + site_id  
30010  voxels in mask
StatMap quantiles (0, 0.01, 0.05, 0.95, 0.99, 1):
 [ 0, 0, 0.01, 5.75, 9.19, 18.86 ]
sqrtSigma: 
  [n =  914 ; df =  1 ; rdf =  888 ]
id variable is:
 NULL
lmPBJ inference settings:
  robust =  TRUE ; HC3 =  TRUE ; transform =  t
pbjInference results:
  CEI run with cft =  9.61, 5.38
Code
image(abideDX, cft_chisq=2.32^2)
knitr::kable(table.statMap(abideDX, cft_chisq=2.32^2)[1:10,], caption= "Ten largest clusters from the semiparametric bootstrap procedure applied to robust model in the pbj package using a CFT of Z>2.32.")
Ten largest clusters from the semiparametric bootstrap procedure applied to robust model in the pbj package using a CFT of Z>2.32.
Cluster ID Cluster Extent Centroid (vox) Unadjusted p-value FWER p-value Max RESI
1 1 710 29, 30, 36 0.0006987 0.036 0.1336704
6 6 187 42, 33, 40 0.0047842 0.182 0.1292265
35 35 145 38, 49, 30 0.0066836 0.228 0.1232188
2 2 123 20, 35, 41 0.0083315 0.266 0.1190937
55 55 92 25, 20, 23 0.0125374 0.348 0.0975146
15 15 84 37, 17, 33 0.0143496 0.376 0.1148660
48 48 83 40, 20, 24 0.0147334 0.384 0.1088026
16 16 42 37, 54, 37 0.0365157 0.618 0.1262520
21 21 38 50, 41, 35 0.0421439 0.660 0.1038552
40 40 33 21, 39, 29 0.0513619 0.708 0.0947552

Cluster extent inference results from the semiparametric bootstrap procedure in the pbj package.

Code
#inferenceIndex(abideDX$pbj$obsStat, method='cei')

Cluster extent inference results from the semiparametric bootstrap procedure in the pbj package.

Reporting results

Methods

We used open-access data from the Autism Brain Imaging Data Exchange (ABIDE) to analyze associations of fractional amplitude of low frequency fluctuations (fALFF) with diagnosis and age. fALFF quantifies the strength of spontaneous brain activity during rest. resting state functional magnetic resonance imaging (rs-fMRI) data were preprocessed using the CPAC pipeline, and fALFF images were computed as the ratio of power in the 0-0.08Hz band to the power in the 0-0.25Hz band at each voxel. Further details of preprocessing are available on the preprocessed connectomes project website (http://preprocessed-connectomes-project.org/abide/).

fALFF images were analyzed using voxel-wise linear models to assess associations with diagnosis (autism spectrum disorder vs. typically developing controls) and age, controlling for sex, mean relative framewise displacement, foreground-to-background energy ratio, mean number of outliers found in each volume, and site. Group-level analyses were performed using the pbj package in R, which implements robust variance estimation to accommodate unknown heteroskedasticity in the imaging data Vandekar et. al. 2019. Cluster extent inference (CEI) was performed using a cluster-forming threshold of \(Z>2.32\). A parametric bootstrap procedure to estimate the distribution of the maximum cluster size using 500 bootstrap samples.

Results

Classic visualization of the results includes:

  • Z-statistic images thresholded at the cluster forming threshold \(p<0.01\) uncorrected
  • Tables of the largest clusters from cluster extent inference, including their FWER \(p\)-values.
  • Plot of the effect size within the significant clusters.
Code
clusterTable = table.statMap(abideDX, cft_chisq=2.32^2)
knitr::kable(clusterTable[1:10,c(-1, -4)], caption= "Cluster-extent inference results using a CFT of Z>2.32.", row.names = FALSE)
Cluster-extent inference results using a CFT of Z>2.32.
Cluster Extent Centroid (vox) FWER p-value Max RESI
710 29, 30, 36 0.036 0.1336704
187 42, 33, 40 0.182 0.1292265
145 38, 49, 30 0.228 0.1232188
123 20, 35, 41 0.266 0.1190937
92 25, 20, 23 0.348 0.0975146
84 37, 17, 33 0.376 0.1148660
83 40, 20, 24 0.384 0.1088026
42 37, 54, 37 0.618 0.1262520
38 50, 41, 35 0.660 0.1038552
33 21, 39, 29 0.708 0.0947552

This is an example of the brain visualization. You would typically include colorbars, hemisphere label (L/R), and slice labels in a publication figure, but the package doesn’t have those features easily included at the moment. I would recommend saving the Z-statistic out and making the final figure in FSL or ITK-Snap.

Code
# get the Z-statistic map
zstat = stat.statMap(abideDX, method = 'Z')
# make it zero except in the significant ROI
# CEI 2 is the second threshold used to compute CEI inference. Here, it was 2.32^2
clusterMask = abideDX$pbj$ROIs$`CEI 2`
clusterMask[ ! clusterMask %in% clusterTable$`Cluster ID`[clusterTable$`FWER p-value`<0.05] ] = 0
# multiply their image arrays
zstat[,,] = zstat[,,] * as.logical(clusterMask[,,])
# z-index chosen within range of first (largest) cluster
image(zstat, BGimg = templatefile, crop=TRUE, index=33:41)

Z-statistic map for a significant cluster found in the posterior cingulate cortex and precuneus.
Code
# plot effect size in cluster 1
out = plotData.statMap(abideDX, emForm = "~ dx_group", cft_chisq=2.32^2, roiInds=1, statMethod = 'Z')

# plot fALFF by diagnosis as a scatter plot and overlay with emmeans estimates and confidence intervals
EM = as.data.frame(out$emmeans[[1]])
plot(jitter(as.numeric(out$data$dx_group), factor=1.5),
     EM[ match(out$data$dx_group, EM$dx_group), 'emmean'] + resid(out$models[[1]]),
     xlab='Diagnosis', ylab='fALFF', bty='l', xaxt='n', pch=16, col='lightgray', main='Cluster 1: fALFF by diagnosis')
axis(1, at=1:2, labels=levels(out$data$dx_group))
points(EM$emmean, pch=19)
arrows(x0=1:2, y0=EM$emmean, x1=1:2, y1=EM$lower.CL, angle=90, length=0.1)
arrows(x0=1:2, y0=EM$emmean, x1=1:2, y1=EM$upper.CL, angle=90, length=0.1)

fALFF in Cluster 1 comparing ASD and HC.