## An introduction to functional alignment for fMRI analysis

Previously, we talked about task-based fMRI analysis. In this post, we introduce one of the long-standing challenges in task-based fMRI analysis: aligning neural activities across subjects.

## Introduction

Brain decoding, which lies at the intersection of neuroscience and machine learning, extracts meaningful patterns (signatures) from the neural activities of the human brain [1]. Most brain decoding approaches employ functional Magnetic Resonance Imaging (fMRI) for visualizing brain activity because it provides better spatial resolution than other imaging techniques [1-4].

fMRI can be used as a proxy for neural activity by analyzing Blood Oxygen Level Dependent (BOLD) signals [1]. As one of the most popular supervised techniques in fMRI analysis, Multivariate Pattern (MVP) classification can map neural activities to distinct cognitive tasks. MVP generates a classification (cognitive) model, i.e., decision surfaces, in order to predict patterns associated with different cognitive states. This model can help us understand how the human brain works. MVP analysis has an extensive range of applications, including the search for novel treatments for mental diseases [1-6].

As a fundamental challenge in supervised fMRI studies, the generated MVP models must be generalized and validated across subjects. However, the neural activities in a multi-subject fMRI dataset must first be aligned to improve the performance of the final results. Technically, there are two kinds of alignment techniques that can be used in harmony, i.e., anatomical alignment and functional alignment. Anatomical alignment, a general preprocessing step in fMRI analysis, aligns brain patterns by using anatomical features extracted from structural MRI in a standard space such as Talairach or Montreal Neurological Institute (MNI) space. Nevertheless, the performance of anatomical alignment techniques is limited because the shape, size, and spatial location of functional loci differ across subjects. In contrast, functional alignment directly aligns the neural activities across subjects and has been widely used in fMRI studies.

## Hyperalignment

Most of the recent studies in functional alignment [1-6] have used Hyperalignment (HA) [1]. As Figure 1 depicts, HA refers to the functional alignment of multi-subject data, where a shared space is generated from neural activities across subjects. The mapped features can then be utilized by MVP techniques in order to boost the performance of the classification analysis. In practice, HA applies a Generalized Canonical Correlation Analysis (GCCA) approach (aka multi-set CCA) to temporally aligned neural activities across subjects, where each time point must represent the same stimulus for all subjects [6].

Let S be the number of subjects, V be the number of voxels (viewed as a 1D vector, even though it corresponds to a 3D volume), and T be the number of time points in units of repetition times (TRs). The preprocessed brain image (neural responses) for the \ell\text{-}th subject is defined as \mathbf{X}^{(\ell)}\in\mathbb{R}^{T \times V}\text{, }\ell = 1\text{:}S, where rows index time points and columns index voxels. We assume each \mathbf{X}^{(i)}\text{, } i=1\text{:}S is normalized to zero mean and unit variance in the preprocessing step, i.e., \mathbf{X}^{(i)} \sim \mathcal{N}(0,1). Here, \mathbf{X}^{(i)} is also time-synchronized to provide temporal alignment, i.e., each time point represents the same stimulus for all subjects [6]. The columns (voxel dimensions) of \mathbf{X}^{(i)} are then aligned across subjects by utilizing HA methods. The original HA can be defined as follows, where \text{tr}() is the trace function [6]:

\underset{\mathbf{R}^{(i)}, \mathbf{G}}{\min} \sum_{i=1}^{S}\| \mathbf{X}^{(i)}\mathbf{R}^{(i)} - \mathbf{G} \|_F^2, \quad \text{subject to } \Big(\mathbf{X}^{(\ell)}\mathbf{R}^{(\ell)} \Big)^\top \mathbf{X}^{(\ell)}\mathbf{R}^{(\ell)} = \mathbf{I},

where \ell = 1\dots S, and \mathbf{G} denotes the common space such that:

\mathbf{G} = \sum_{j=1}^{S} \mathbf{X}^{(j)}\mathbf{R}^{(j)}.
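In practice, this objective is usually solved by alternating updates. The sketch below is a minimal NumPy illustration, not the authors' implementation: it assumes a time-by-voxels (T × V) layout, replaces the whitening constraint with an orthogonal \mathbf{R}^{(i)} (the Procrustes simplification common in iterative HA implementations), and refreshes the template as the average of the aligned subjects (the sum above, up to a constant factor). All function names are mine.

```python
import numpy as np

def procrustes(X, G):
    """Orthogonal R minimizing ||X R - G||_F, via SVD of X^T G."""
    U, _, Vt = np.linalg.svd(X.T @ G, full_matrices=False)
    return U @ Vt

def hyperalign(Xs, n_iter=10):
    """Iterative HA sketch. Xs: list of time-synchronized (T, V) arrays,
    one per subject. Alternates between updating each R^(i) (Procrustes)
    and refreshing the template G from the aligned subjects."""
    G = np.mean(Xs, axis=0)                # initialize template from raw data
    for _ in range(n_iter):
        Rs = [procrustes(X, G) for X in Xs]
        G = np.mean([X @ R for X, R in zip(Xs, Rs)], axis=0)
    return Rs, G
```

Each `R` stays orthogonal by construction, so the aligned features `X @ R` preserve the geometry of each subject's response space while rotating it toward the template.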

In [6], Xu et al. proposed a regularized iterative approach for learning the common space (\mathbf{G}) and the transformation matrices \mathbf{R}^{(i)}. Further, Lorbert et al. developed Kernel Hyperalignment for nonlinear fMRI analysis [5]. Recently, we also developed Deep Hyperalignment, which scales alignment techniques to large-scale analysis [4].

So, let’s look at the training and testing procedures. Based on the definition, we have a training set \mathbf{X}^{(\ell)}\in\mathbb{R}^{T \times V}\text{, }\ell = 1\text{:}S and a testing set \mathbf{\hat{X}}^{(\ell)}\in\mathbb{R}^{T \times V}\text{, }\ell = 1\text{:}\hat{S}, where \hat{S} is the number of subjects in the testing set. In the training procedure, we first learn a common space \mathbf{G} and a set of transformation matrices \mathbf{R}^{(\ell)}\text{ for }\ell=1:S. Then, we generate a classification model by using the aligned features \mathbf{X}^{(\ell)}\mathbf{R}^{(\ell)}\text{ for }\ell=1:S. In the testing stage, we learn only the transformation matrices; the shared space is not updated anymore. We use the following objective function:

\underset{\mathbf{\hat{R}}^{(i)}}{\min} \sum_{i=1}^{\hat{S}}\| \mathbf{\hat{X}}^{(i)}\mathbf{\hat{R}}^{(i)} - \mathbf{G} \|_F^2.

Finally, the performance of the trained model can be evaluated by using the transformed testing features \mathbf{\hat{X}}^{(\ell)}\mathbf{\hat{R}}^{(\ell)}\text{ for }\ell=1:\hat{S}. This learning procedure is almost the same in most alignment techniques.
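Under the same orthogonal (Procrustes) simplification used above, the testing stage reduces to one independent problem per held-out subject, since \mathbf{G} stays frozen. A minimal sketch, with a hypothetical function name and a (T × V) layout assumed:

```python
import numpy as np

def align_test_subject(X_test, G):
    """Testing stage: the shared space G is frozen; the held-out subject
    only learns its own mapping R-hat minimizing ||X-hat R-hat - G||_F
    (an orthogonal Procrustes problem).
    X_test: (T, V) array; G: (T, V) shared template."""
    U, _, Vt = np.linalg.svd(X_test.T @ G, full_matrices=False)
    R_hat = U @ Vt                 # orthogonal solution
    return X_test @ R_hat          # aligned features fed to the trained classifier
```

The returned features live in the same space as the training features, so the classifier trained on \mathbf{X}^{(\ell)}\mathbf{R}^{(\ell)} can be applied to them directly.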

## Shared Response Model (SRM)

SRM is a probabilistic extension of Hyperalignment [7], i.e., SRM uses probabilistic CCA for generating the shared space (aka the common space). Let m be the number of subjects in a preprocessed fMRI dataset, d the number of time points in TRs, and v the number of voxels. The fMRI time series for the i-th subject is denoted by \mathbf{X}_{i} \in \mathbb{R}^{v \times d} for i=1:m. SRM models each subject’s neural responses as \mathbf{X}_{i} = \mathbf{W}_{i}\mathbf{S} + \mathbf{E}_{i}, where \mathbf{W}_{i} \in \mathbb{R}^{v \times k} denotes a basis of topographies for subject i, k is the number of selected features, \mathbf{S} \in \mathbb{R}^{k \times d} is the shared space, and \mathbf{E}_{i} is an error (noise) term. SRM’s objective function for all subjects can also be written as follows [7]:

\underset{\mathbf{S,}\mathbf{W}_{i}}{\min} \sum_{i=1}^{m} \| \mathbf{X}_{i} - \mathbf{W}_{i}\mathbf{S} \|_F^2, \text{subject to } \mathbf{W}_{i}^{\top}\mathbf{W}_{i}=\mathbf{I}_k,

where \| \cdot \|_F denotes the Frobenius norm, and \mathbf{I}_k is the k \times k identity matrix. Further, we calculate the shared space as follows:

\mathbf{S} = \frac{1}{m}\sum_{i=1}^{m} \mathbf{W}_{i}^{\top} \mathbf{X}_{i}.
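A deterministic sketch of these two updates (not the full probabilistic inference of [7], which brainiak implements) alternates the orthogonal update of each \mathbf{W}_{i} with the shared-response average. This is a NumPy illustration with invented function names:

```python
import numpy as np

def srm_fit(Xs, k, n_iter=10, seed=0):
    """Deterministic SRM sketch. Xs: list of (v, d) arrays; returns a
    list of (v, k) bases Ws with orthonormal columns and the shared
    response S of shape (k, d)."""
    rng = np.random.default_rng(seed)
    v, d = Xs[0].shape
    # random orthonormal initialization of each basis W_i
    Ws = [np.linalg.qr(rng.standard_normal((v, k)))[0] for _ in Xs]
    for _ in range(n_iter):
        # shared space: average of projected subjects (the formula above)
        S = np.mean([W.T @ X for W, X in zip(Ws, Xs)], axis=0)
        # W_i = argmin ||X_i - W S||_F s.t. W^T W = I_k  (Procrustes)
        Ws = []
        for X in Xs:
            U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
            Ws.append(U @ Vt)
    S = np.mean([W.T @ X for W, X in zip(Ws, Xs)], axis=0)
    return Ws, S
```

Note how the orthonormality constraint \mathbf{W}_{i}^{\top}\mathbf{W}_{i}=\mathbf{I}_k turns each basis update into an SVD, just as in the HA sketch earlier.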

Figure 2 shows the graphical model for SRM, where a probabilistic optimization approach is used to learn a shared space \mathbf{S} and the bases of topographies \mathbf{W}_{i}. In this figure, \mathbf{s}_{t} \in \mathbb{R}^{k}, with covariance \mathbf{\Sigma}_{s}, models the shared response at time t=1:d; \mathbf{x}_{it}\in \mathbb{R}^{v} denotes the observed pattern of voxel responses for the i-th subject at time t; \mathbf{\rho}^{2}_{i} is a subject-specific (noise) hyperparameter; and \mathbf{\mu}_{i} denotes the subject-specific mean. The final optimization procedure is explained in Section 3.1 of [7].
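The generative model of Figure 2 is easy to simulate, which is a good way to build intuition for what SRM assumes about the data. A minimal sketch (the function name is mine; the noise is simplified to a scalar standard deviation per subject):

```python
import numpy as np

def sample_srm_subject(W, S, mu, rho, rng):
    """Draw one subject's observations from the SRM generative model:
    x_it ~ N(W_i s_t + mu_i, rho_i^2 * I).
    W: (v, k) basis; S: (k, d) shared responses; mu: (v,) subject mean;
    rho: noise standard deviation (scalar here for simplicity)."""
    v = W.shape[0]
    d = S.shape[1]
    noise = rho * rng.standard_normal((v, d))   # isotropic Gaussian noise
    return W @ S + mu[:, None] + noise
```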

## Supervised Hyperalignment (SHA)

We recently illustrated that the performance of HA methods might not be optimal for supervised fMRI analysis (i.e., MVP problems) because they mostly employ unsupervised GCCA techniques for aligning the neural activities across subjects [2, 3]. Therefore, we developed Local Discriminant Hyperalignment (LDHA) [3] and then Supervised Hyperalignment (SHA) [2] to improve the alignment accuracy in MVP problems. Although LDHA can improve the performance of both functional alignment and MVP analysis, its objective function cannot directly calculate a supervised shared space and still uses the classical unsupervised shared space [3]. Thus, it cannot provide stable performance and acceptable runtime for large datasets in real-world applications [2].

Figure 3 illustrates the main differences among unsupervised HA, LDHA, and SHA. As depicted in this figure, two subjects watch two photos of houses as well as two photos of bottles, where [\mathbf{H1}, \mathbf{B1}, \mathbf{H2}, \mathbf{B2}] shows the sequence of stimuli (after temporal alignment). Here, the shared spaces can be calculated by employing different correlations between neural activities. Figure 3.a demonstrates that unsupervised HA just maximizes the correlation between voxels at the same position in the time series because it does not use the supervision information.

Figure 3.b illustrates the LDHA approach, which utilizes the unsupervised shared space for the alignment problem. Indeed, the main issue with the LDHA objective function is that its covariance matrices cannot be decomposed into the product of a symmetric matrix [2]. In order to calculate the shared space in LDHA, each pair of stimuli must be compared separately, and the shared space is gradually updated after each comparison (see Algorithm 2 in [3]). Therefore, LDHA cannot use a generalized optimization approach (such as GCCA), and its time complexity is not efficient for large datasets.

As shown in Figure 3.c, SHA consists of two main steps:

1. Generating a supervised shared space, where each stimulus is only compared with the shared space to align the neural activities;
2. Calculating the mapped features in a single iteration.

The neural activities belonging to the \ell\text{-}th subject can be denoted by \mathbf{X}^{(\ell)} \in \mathbb{R}^{T\times V}\text{, } \ell=1\text{:}S (defined as in the Hyperalignment section), and the class labels by \mathbf{Y}^{(\ell)}=\{{y}_{mn}^{(\ell)} \}\text{, }\mathbf{Y}^{(\ell)}\in\{0, 1\}^{L\times T}\text{, } m=1\text{:}L\text{, }n=1\text{:}T\text{, }L>1. Here, L is the number of classes (stimulus categories). In order to infuse supervision information into the HA problem, this method defines a supervised term as follows:

\mathbf{K}^{(\ell)} = \mathbf{Y}^{(\ell)}\mathbf{H} \in \mathbb{R}^{L \times T},

where the normalization matrix \mathbf{H}\in\mathbb{R}^{T\times T} is defined as follows:

\mathbf{H} = \mathbf{I}_{T} - \gamma{\mathbf{1}}_{T},

where \mathbf{1}_{T} \in \mathbb{R}^{T\times T} denotes the all-ones matrix of size T, and \gamma makes a trade-off between within-class and between-class samples. The objective function of SHA is defined as follows:

\underset{\mathbf{W}, \mathbf{R}^{(i)}}{\min}\{\sum_{i = 1}^{S} \| \mathbf{K}^{(i)}\mathbf{X}^{(i)}\mathbf{R}^{(i)} - \mathbf{W}\|^2_F\}, (\mathbf{R}^{(\ell)})^\top((\mathbf{K}^{(\ell)}\mathbf{X}^{(\ell)})^\top\mathbf{K}^{(\ell)}\mathbf{X}^{(\ell)} + \epsilon \mathbf{I}_V) \mathbf{R}^{(\ell)}=\mathbf{I}_{V},

where \ell=1\text{:}S, \epsilon is a regularization term, \mathbf{R}^{(\ell)} denotes the mapping matrices, and \mathbf{W} \in \mathbb{R}^{L\times V} is the supervised shared space, such that:

\mathbf{W} = \frac{1}{S} \sum_{j=1}^{S} \mathbf{K}^{(j)}\mathbf{X}^{(j)}\mathbf{R}^{(j)}.
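The supervised term \mathbf{K}^{(\ell)} and the shared-space average above translate directly to code. A minimal NumPy sketch, assuming one-hot label matrices and the (T × V) layout; function names are mine:

```python
import numpy as np

def supervised_term(Y, gamma):
    """K^(l) = Y^(l) H with H = I_T - gamma * 1_T,
    where Y is the (L, T) one-hot label matrix."""
    T = Y.shape[1]
    H = np.eye(T) - gamma * np.ones((T, T))    # normalization matrix
    return Y @ H

def supervised_shared_space(Ks, Xs, Rs):
    """W = (1/S) * sum_l K^(l) X^(l) R^(l), per the formula above.
    Ks: (L, T) terms; Xs: (T, V) data; Rs: (V, V) mappings."""
    return np.mean([K @ X @ R for K, X, R in zip(Ks, Xs, Rs)], axis=0)
```

Note how \mathbf{H} recenters each class indicator: with \gamma > 0, same-class time points reinforce each other while between-class time points are pushed apart.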

We then show that the supervised shared space can be calculated directly as follows [2]:

\underset{\mathbf{W}, \mathbf{R}^{(i)}}{\min}\Big\{\sum_{i = 1}^{S} \|\mathbf{K}^{(i)}\mathbf{X}^{(i)}\mathbf{R}^{(i)} - \mathbf{W} \|^2_F\Big\} \equiv \underset{\mathbf{W}}{\min}\big\{\text{tr}(\mathbf{W}^\top\mathbf{U}\mathbf{W})\big\}, \quad \text{subject to } \mathbf{R}^{(\ell)}=\Big((\mathbf{K}^{(\ell)}\mathbf{X}^{(\ell)})^\top\mathbf{K}^{(\ell)}\mathbf{X}^{(\ell)} + \epsilon\mathbf{I}_V\Big)^{-1}(\mathbf{K}^{(\ell)}\mathbf{X}^{(\ell)})^\top\mathbf{W}, \quad \mathbf{U} = \sum_{\ell=1}^{S}\Big(\mathbf{I}_{L} - \mathbf{K}^{(\ell)}\mathbf{X}^{(\ell)}\big((\mathbf{K}^{(\ell)}\mathbf{X}^{(\ell)})^\top\mathbf{K}^{(\ell)}\mathbf{X}^{(\ell)}+\epsilon\mathbf{I}_{V}\big)^{-1}(\mathbf{K}^{(\ell)}\mathbf{X}^{(\ell)})^\top\Big).

Here, \mathbf{W} consists of the right eigenvectors of \mathbf{U} [2, 4]. Further, the unsupervised shared space for the testing stage is calculated as follows [2]:

\mathbf{G} = \frac{1}{S}\Big(\sum_{\ell=1}^{S}\mathbf{W}^\top\mathbf{K}^{(\ell)}\Big)^\top.

Indeed, the testing phase for SHA is the same as unsupervised HA. The only difference between SHA and unsupervised HAs lies in the procedure of generating the shared space in the training phase.

## Conclusion

One of the main challenges in fMRI studies, especially Multivariate Pattern (MVP) analysis, is using multi-subject datasets. On the one hand, multi-subject analysis is necessary to estimate the validity of the generated results across subjects. On the other hand, analyzing multi-subject fMRI data requires accurate functional alignment between the neural activities of different subjects to improve the performance of the final results. Hyperalignment (HA) is one of the most significant functional alignment methods, which can be formulated as a CCA problem for aligning the neural activities of different subjects to a common/shared space. HA techniques can use different optimization solutions for generating an adequate shared space: classic CCA (in HA), probabilistic CCA (in SRM), and supervised approaches (in LDHA and SHA). In the future, we will describe the related math background for alignment techniques and explain some challenging issues that may happen during the analysis.

#### References

1. Decoding Neural Representational Spaces Using Multivariate Pattern Analysis. DOI: 10.1146/annurev-neuro-062012-170325
2. Supervised Hyperalignment for multi-subject fMRI data alignment. DOI: 10.1109/TCDS.2020.2965981