Examples
========

This is a first example of a real application of the differentiable programming paradigm to mapping the brain's functional connectivity. This example replicates the third experiment from our first preprint on this subject ("Covariance modelling"). The goal of this experiment is to create a simple, differentiable map of the dynamics of community structure in the brain's functional connectome.

The connectome is a graph of the brain's functional connectivity, where each node represents a brain region and each edge represents the strength of the connection between two regions. The community structure of the connectome is a partition of the nodes into groups that are more strongly connected to each other than to nodes in other groups. The dynamics of community structure describe how this partition evolves over time. Here, we use a simple operationalisation of the dynamics that tracks only whether a community is present or absent at each time point. This tutorial steps through the code for this experiment.

Loading the dataset
-------------------

In this tutorial, we'll perform the experiment using a subset of the Midnight Scan Club (MSC) dataset. We select a subset of the dataset to expedite the processing steps, and because we can find a rich community structure even using only this subset. The MSC dataset is a collection of fMRI scans of 10 subjects, each performing a number of in-scanner tasks across 10 scanning sessions. Here, we use the first 3 resting-state scans from each subject. The MSC dataset is available from the OpenNeuro website.

Below, we're using a utility function to retrieve a version of the dataset that has already been preprocessed. The preprocessing includes, among other standard steps, dimensionality reduction using a 400-region parcellation (brain atlas) and denoising using a 36-parameter model of motion estimates and nuisance signals. Because scans can differ in length, we zero-pad each time series to the length of the longest scan before stacking them into a single array of shape (n_scans, n_regions, n_time_points); here, (30, 400, 814).

.. code-block:: python

    import pathlib

    import jax.numpy as jnp
    import pandas as pd

    from hypercoil.engine.axisutil import extend_to_max_size
    from hypercoil.neuro.data_msc import minimal_msc_download

    dataset_root = f"{minimal_msc_download()}/data/ts/"
    paths = pathlib.Path(dataset_root).glob("*ses-func0[1-3]*task-rest*ts.1D")

    # Load each scan's parcellated time series as a (n_regions, n_time_points) array.
    time_series = tuple(
        pd.read_csv(path, sep=" ", header=None) for path in paths)
    time_series = tuple(jnp.array(t.values.T) for t in time_series)

    # Zero-pad all scans to the length of the longest scan and stack them.
    time_series = jnp.stack(extend_to_max_size(time_series, 0.0))
    assert time_series.shape == (30, 400, 814)

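As a quick, optional sanity check (not required for the rest of the tutorial), we can compute the static functional connectome of a single scan with ``hypercoil``'s ``corr`` function, the same function the model's forward pass uses below. Note that this naive check ignores the zero-padding, so it is only a quick look at the data.

.. code-block:: python

    from hypercoil.functional import corr

    # Static (time-averaged) connectome of the first scan: a correlation
    # matrix over the 400 parcellated regional time series.
    static_connectome = corr(time_series[0])
    print(static_connectome.shape)  # expected: (400, 400)
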
Defining the model
------------------

Next, let's define the model that will learn the community structure. We need to define the model's parameters, its forward pass, and its loss function. We begin here with the parameters.

The model has two parameter tensors: one for the community structure and one for the dynamics of community structure. The community structure is a matrix of shape (n_nodes, n_communities), where each row represents a node and each column represents a community. The dynamics of community structure are a (notionally binary) matrix of shape (n_communities, n_time_points), where each row represents a community and each column represents a time point. We want to learn a community structure that is common to all subjects and all scans in the dataset, but we also want the dynamics to be specific to each scan. To achieve this, we define the community structure as a parameter tensor that is shared across all scans, and the dynamics as a parameter tensor that is specific to each scan: the community structure is a parameter tensor of shape (n_nodes, n_communities), and the dynamics are a parameter tensor of shape (n_scans, n_communities, n_time_points), whose first dimension indexes the dynamics of each scan.

We also want to impose some constraints on the values that the parameter tensors can take. For the community structure tensor, we'd ideally want each node to belong to exactly one community. But this would give us a combinatorial optimisation problem instead of a differentiable one, so we'll relax this constraint and instead allow each node's community assignment to be a categorical probability distribution over communities. We can achieve this by projecting the community structure tensor onto the probability simplex, which is the set of vectors whose elements are nonnegative and sum to 1.0. Behind the scenes, ``hypercoil``'s ``ProbabilitySimplexParameter`` implements this constraint using a softmax mapping.

For the dynamics tensor, we'd ideally want to impose the constraint that each community is either present or absent at each time point -- but again, we'll relax this constraint for the sake of differentiability. We'll instead allow the presence of each community at each time point to vary continuously in (0, 1) using a ``MappedLogits`` parameter. Later, we'll introduce some regularisations that encourage the dynamics to be close to binary.

The last model parameter is a scalar that sets the resolution of the community detection algorithm. This scalar, gamma, promotes discovery of more, smaller communities as it is increased. We won't learn this parameter; instead, we'll set it to a reasonable value for the purposes of this tutorial. In practice, we've found that the "default" value of 1 for gamma results in an unbalanced community structure dominated by a few large communities, whereas a value of 5 results in a more balanced community structure.

With that said, let's implement the model:

.. code-block:: python

    import jax
    import equinox as eqx

    from hypercoil.engine import Tensor, PyTree
    from hypercoil.init.mapparam import (
        MappedLogits,
        ProbabilitySimplexParameter,
    )


    class DynamicCommunityModel(eqx.Module):
        n_nodes: int
        n_communities: int
        n_scans: int
        n_time_points: int
        gamma: float
        affiliation: Tensor
        dynamics: Tensor

        def __init__(
            self,
            n_nodes: int,
            n_scans: int,
            n_communities: int,
            n_time_points: int,
            gamma: float = 1.0,
            init_scale_affiliation: float = 0.01,
            init_scale_dynamics: float = 0.001,
            *,
            key: 'jax.random.PRNGKey',
        ):
            super().__init__()
            self.n_nodes = n_nodes
            self.n_communities = n_communities
            self.n_scans = n_scans
            self.n_time_points = n_time_points
            self.gamma = gamma
            # Initialise the raw parameters with small random jitter around
            # constant values (1.0 for the affiliation, 0.5 for the dynamics).
            self.affiliation = init_scale_affiliation * jax.random.normal(
                key, shape=(n_nodes, n_communities)) + 1.0
            self.dynamics = init_scale_dynamics * jax.random.normal(
                key, shape=(n_scans, n_communities, n_time_points)) + 0.5

        def __call__(self, time_series: Tensor) -> Tensor:
            return model_forward(
                time_series,
                self.affiliation,
                self.dynamics,
                self.gamma,
            )


    def parameterise_model(model):
        # Map the raw parameters onto their constrained domains: rows of the
        # affiliation matrix onto the probability simplex, and the dynamics
        # into the open interval (0, 1).
        model = ProbabilitySimplexParameter.map(
            model, where="affiliation", axis=-1)
        model = MappedLogits.map(
            model, where="dynamics")
        return model

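To see what the parameter maps do, here is a minimal sketch with toy dimensions (the small sizes and the ``toy`` and ``key_toy`` names are illustrative only, not part of the experiment). It assumes, as the forward pass below does, that a mapped parameter's constrained value is recovered with ``_to_jax_array``: after mapping, each row of the affiliation matrix should lie on the probability simplex, and every entry of the dynamics tensor should lie in (0, 1).

.. code-block:: python

    from hypercoil.engine import _to_jax_array

    key_toy = jax.random.PRNGKey(17)
    toy = DynamicCommunityModel(
        n_nodes=8, n_scans=2, n_communities=3, n_time_points=16, key=key_toy,
    )
    toy = parameterise_model(toy)

    affiliation = _to_jax_array(toy.affiliation)  # shape (8, 3)
    dynamics = _to_jax_array(toy.dynamics)        # shape (2, 3, 16)

    # Each node's community assignment is a categorical distribution ...
    print(affiliation.sum(-1))           # expected: approximately 1.0 everywhere
    # ... and each community's presence varies continuously in (0, 1).
    print(dynamics.min(), dynamics.max())
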
Defining the loss function
--------------------------

Next, we define the loss function. It combines a modularity loss applied to the static connectome and the affiliation matrix, a dynamic community term that couples the co-affiliation matrix with the modularity matrices of the dynamics-weighted connectomes, and two regularisers on the dynamics: a smoothness penalty that discourages abrupt changes over time, and a bimodal symmetric penalty (the regularisation promised above) that encourages values close to 0 or 1.

.. code-block:: python

    from hypercoil.loss import (
        LossScheme,
        LossApply,
        Loss,
        LossArgument,
        UnpackingLossArgument,
        ModularityLoss,
        SmoothnessLoss,
        BimodalSymmetricLoss,
        identity,
        sum_scalarise,
        mean_scalarise,
        vnorm_scalarise,
    )


    def dynamic_community_loss(
        modularity_nu: float,
        smoothness_nu: float,
        dynamic_community_nu: float,
        bimodal_symmetric_nu: float,
        gamma: float,
    ) -> LossScheme:
        loss = LossScheme([
            # Modularity of the affiliation with respect to the static
            # (unweighted) connectome.
            LossApply(
                ModularityLoss(nu=modularity_nu, name='Modularity', gamma=gamma),
                apply=lambda arg: UnpackingLossArgument(
                    A=arg.corr_unparam,
                    Q=arg.affiliation,
                )),
            # Reward alignment between the co-affiliation matrix and the
            # modularity matrices of the dynamics-weighted connectomes.
            LossApply(
                Loss(
                    nu=dynamic_community_nu,
                    name='DynamicCommunities',
                    score=identity,
                    scalarisation=mean_scalarise(
                        axis=None,
                        inner=sum_scalarise(axis=(-1, -2), keepdims=True)
                    ),
                ),
                apply=lambda arg: -(arg.coaffiliation * arg.modularity)
            ),
            # Regularise the dynamics: smooth over time and close to binary.
            LossScheme([
                SmoothnessLoss(
                    nu=smoothness_nu,
                    scalarisation=mean_scalarise(
                        inner=vnorm_scalarise(axis=-1))
                ),
                BimodalSymmetricLoss(nu=bimodal_symmetric_nu, modes=(0, 1))
            ], apply=lambda arg: arg.dynamics)
        ])
        return loss

Defining the forward pass
-------------------------

The forward pass maps the time series and the model parameters to the quantities that the loss function consumes: the unweighted and dynamics-weighted connectomes, the modularity matrices, and the community co-affiliation matrix.

.. code-block:: python

    from hypercoil.engine import _to_jax_array
    from hypercoil.functional import corr, modularity_matrix, coaffiliation


    def model_forward(
        time_series: Tensor,
        affiliation: Tensor,
        dynamics: Tensor,
        gamma: float,
    ) -> LossArgument:
        # Ensure that all data tensors and parameters are JAX arrays.
        time_series = _to_jax_array(time_series)
        affiliation = _to_jax_array(affiliation)
        dynamics = _to_jax_array(dynamics)

        # Compute the unweighted correlation matrix for each scan, and the
        # dynamics-weighted correlation matrices (one per community per scan).
        corr_unparam = corr(time_series)
        corr_param = corr(time_series[:, None, ...], weight=dynamics)

        # Compute the modularity matrix for each weighted connectome.
        B = modularity_matrix(
            corr_param,
            normalise_modularity=True,
            gamma=gamma,
        )

        # Compute the community co-affiliation matrix.
        H = coaffiliation(
            affiliation.T[..., None],
            normalise_coaffiliation=True,
        )

        # Bundle arguments for the loss function.
        args = LossArgument(
            corr_unparam=corr_unparam,
            corr_param=corr_param,
            affiliation=affiliation,
            dynamics=dynamics,
            modularity=B,
            coaffiliation=H,
        )
        return args

Defining the optimisation loop
------------------------------

We optimise the parameters with Optax's Adam optimiser. Because the model mixes trainable arrays with static fields, we use Equinox's filtering utilities to compute gradients for, and apply updates to, only the floating-point array leaves.

.. code-block:: python

    from typing import Any, Callable, Tuple

    import optax


    def init_optimiser(
        lr: float,
        model: PyTree,
    ) -> Tuple[optax.GradientTransformation, optax.OptState]:
        # Track optimiser state only for the trainable (floating-point) leaves.
        optim = optax.adam(lr)
        optim_state = optim.init(eqx.filter(model, eqx.is_inexact_array))
        return optim, optim_state


    def update(
        model: PyTree,
        input: Tensor,
        loss_scheme: Callable,
        optim: optax.GradientTransformation,
        optim_state: PyTree,
        *,
        key: 'jax.random.PRNGKey',
    ) -> Tuple[PyTree, optax.OptState, Tensor, Any]:
        def loss_fn(model, input, key):
            args = model_forward(
                input,
                model.affiliation,
                model.dynamics,
                model.gamma
            )
            return loss_scheme(args, key=key)

        # Differentiate the loss with respect to the trainable leaves only;
        # the auxiliary output carries the per-term loss values.
        (loss, meta), grads = eqx.filter_value_and_grad(
            loss_fn, has_aux=True)(model, input, key=key)
        updates, optim_state = optim.update(
            eqx.filter(grads, eqx.is_inexact_array),
            optim_state,
            eqx.filter(model, eqx.is_inexact_array),
        )
        model = eqx.apply_updates(model, updates)
        return model, optim_state, loss, meta

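A quick aside on the filtering idiom used above (the ``Toy`` module here is purely illustrative and not part of the experiment): ``eqx.filter`` with ``eqx.is_inexact_array`` keeps only the floating-point array leaves of a module and replaces everything else with ``None``, so the optimiser never sees static fields such as ``n_nodes`` or ``gamma``.

.. code-block:: python

    class Toy(eqx.Module):
        w: Tensor  # a floating-point array: trainable
        n: int     # a static Python int: not trainable

    toy_module = Toy(w=jnp.ones(3), n=3)
    trainable = eqx.filter(toy_module, eqx.is_inexact_array)
    print(trainable.w)  # the array, kept
    print(trainable.n)  # None: filtered out, so Adam never updates it
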
Train the model
---------------

.. code-block:: python

    # Configure the hyperparameters.
    n_communities = 10
    n_time_points = 814
    n_nodes = 400
    n_scans = 30
    lr = 0.05
    modularity_nu = 10
    dynamic_community_nu = 2e-3
    smoothness_nu = .2
    bimodal_symmetric_nu = 2
    max_epoch = 500
    gamma = 5
    key = jax.random.PRNGKey(0)
    key_model, key_train = jax.random.split(key)

    # Initialise the model.
    model = DynamicCommunityModel(
        n_nodes=n_nodes,
        n_scans=n_scans,
        n_communities=n_communities,
        n_time_points=n_time_points,
        gamma=gamma,
        key=key_model,
    )
    model = parameterise_model(model)

    # Initialise the loss function.
    loss_scheme = dynamic_community_loss(
        modularity_nu=modularity_nu,
        smoothness_nu=smoothness_nu,
        dynamic_community_nu=dynamic_community_nu,
        bimodal_symmetric_nu=bimodal_symmetric_nu,
        gamma=gamma,
    )

    # Initialise the optimiser.
    optim, optim_state = init_optimiser(lr, model)

    # Train the model.
    for epoch in range(max_epoch):
        key_epoch = jax.random.fold_in(key_train, epoch)
        model, optim_state, loss, meta = eqx.filter_jit(update)(
            model,
            time_series,
            loss_scheme,
            optim,
            optim_state,
            key=key_epoch,
        )
        if epoch % 10 == 0:
            print(f'Epoch: {epoch}, Loss: {loss}')
            for k, v in meta.items():
                print(f'{k}: {v.value:.4f}')

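Once training has converged, one way to read out the (relaxed) solution is to take the most probable community for each node and to threshold the dynamics. This is a sketch, assuming (as in the forward pass above) that ``_to_jax_array`` recovers a mapped parameter's constrained value; the 0.5 threshold is an arbitrary choice for illustration.

.. code-block:: python

    # Recover the constrained parameter values.
    affiliation = _to_jax_array(model.affiliation)  # (n_nodes, n_communities)
    dynamics = _to_jax_array(model.dynamics)        # (n_scans, n_communities, n_time_points)

    # Hard community assignment: the most probable community for each node.
    hard_assignment = jnp.argmax(affiliation, axis=-1)

    # Binarised dynamics: a community is "present" when its weight exceeds 0.5.
    presence = dynamics > 0.5

    print('Nodes per community:',
          jnp.bincount(hard_assignment, length=n_communities))
    print('Fraction of time each community is present in scan 0:',
          presence[0].mean(axis=-1))
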