Linear Mixed-effects Models#

Welcome for the first practical session of the day !

Objectives :#

  • Get a better idea of medical data, especially longitudinal ones

  • Understand mixed-effects models

  • Get a taste of state-of-the-art techniques

The set-up#

If you have followed the installation details carefully, you should

  • be running this notebook in the leaspy_tutorial conda environment (be sure that the kernel you are using is leaspy_tutorial => check Kernel above)

  • having all the needed packages already install

πŸ’¬ Question 1 πŸ’¬ Run the following command lines

import os
import sys

import pandas as pd
import numpy as np
from scipy import stats

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Part I: The data#

We import a functional medical imaging dataset. We have extracted, for each timepoint, the average value of the metabolic activity of the putamen. This brain region is commonly damaged by Parkinson’s disease.

IMPORTANT: The values have been normalized such that a normal value is zero and a very abrnomal value is one.

πŸ’¬ Question 2 πŸ’¬ Run the following cell and look at the head of the dataframe to better understand what the data are.

from leaspy.datasets import Loader
df = Loader.load_dataset('parkinson-putamen-train_and_test')

# To complete
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

df.head()
PUTAMEN
ID TIME SPLIT
GS-001 71.354607 train 0.728492
71.554604 train 0.735620
72.054604 train 0.757409
73.054604 train 0.800754
73.554604 train 0.870756

ℹ️ Information ℹ️ The SPLIT column already distinguishes the train and test data.

πŸ’¬ Question 3 πŸ’¬ Describe the target variable PUTAMEN and the explicative variable TIME. You can plot:

  • Sample size

  • Mean, std

  • Min & max values

  • Quantiles

# To complete
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

df.reset_index().describe().round(2).T
count mean std min 25% 50% 75% max
TIME 1997.0 65.37 10.05 33.14 58.66 66.48 72.06 91.24
PUTAMEN 1997.0 0.71 0.10 0.35 0.64 0.71 0.77 0.96

πŸ’¬ Question 4 πŸ’¬ From this value, what can you say about the disease stage of the population?

Your answer: …

Answer: The median and mean value is 0.71, so the average disease stage is high for these subjects.

πŸ’¬ Question 5 πŸ’¬ Display the data, where the Putamen (y-axis) is plot with respect to the Time (x-axis)

# To complete
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

sns.set_style('whitegrid')
plt.figure(figsize=(14,6))
sns.scatterplot(data=df.xs('train', level='SPLIT').reset_index(), 
                x='TIME', y='PUTAMEN', alpha=.5, s=60)
plt.title('PUTAMEN - Raw data')
plt.show()
../../../_images/03175409af01347e57315854a40a30e8d0e31c976ab442d15678b6000d93331a.png

⚑ Remark ⚑ At first look, the PUTAMEN values do not seem highly correlated to TIME.

Part II: Linear Regression#

As we are some pro ML players, let’s make some predictions : let’s try to predict the putamen value based on the time alone.

πŸ’¬ Question 6 πŸ’¬ Store the train and test data in df_train and df_test

# To complete
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

pds = pd.IndexSlice
df_train = df.loc[pds[:, :, 'train']].copy()  # one possibility
df_test = df.xs('test', level='SPLIT').copy()  # an other one

πŸ’¬ Question 7 πŸ’¬ Run the linear reagression that is in scipy.

Be carefull, you have to train it only with the train set!

x = # Complete with the appropriate data
y = # Complete with the appropriate data

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

x = df_train.index.get_level_values('TIME').values
y = df_train['PUTAMEN'].values

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

ℹ️ Information ℹ️ To run the notebook smoothly, you must comply to the following rules: we are going to try different models to predict the putamen values based on each observation.

We will store the results in the dataframe df such that :

ID

TIME

SPLIT

PUTAMEN

Model 1

Model 2

…

GS-001

74.4

train

0.78

0.93

0.75

…

GS-003

75.4

train

0.44

0.84

0.46

…

GS-018

51.8

test

0.71

0.73

0.78

…

GS-056

89.2

train

0.76

0.56

0.61

…

This will ease the comparison of the models.

⚑ Remark ⚑ No need to add these predictions to df_train and df_test. You should be able to easily run the notebook by keeping df_train the way it is while appending the results in df.

πŸ’¬ Question 8 πŸ’¬ Add the predictions done by the linear regression in the column Linear Regression

df['Linear Regression'] = # Your code here
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

df['Linear Regression'] = intercept + slope * df.index.get_level_values('TIME')

ℹ️ Information ℹ️ Let’s introduce an object and a fonction that will be used to compare the models:

  • overall_results will be the dataframe that stores the root mean square error on the train and test set for the different models

  • compute_rmse_train_test is the function that given the dataframe df and a model_name (Linear Regression for instance), compute the mean absolute error on the train and test set and stores it in overall_results

πŸ’¬ Question 9 πŸ’¬ Run the following cell to see the results

overall_results = pd.DataFrame({'train': [], 'test': []})  


def compute_rmse(df, model_name):
    """Compute RMSE between PUTAMEN column and the <model_name> column of df"""
    y = df['PUTAMEN']
    y_hat = df[model_name]
    diff = y - y_hat
    return np.sqrt(np.mean(diff * diff))


def compute_rmse_train_test(df, overall_results, model_name):
    """Inplace modification of <overall_results>"""
    overall_results.loc[model_name, 'train'] = compute_rmse(df.xs('train', level='SPLIT'), model_name)
    overall_results.loc[model_name, 'test'] = compute_rmse(df.xs('test', level='SPLIT'), model_name)

compute_rmse_train_test(df, overall_results, 'Linear Regression')
overall_results
train test
Linear Regression 0.091403 0.10213

⚑ Remark ⚑ The RMSE is higher on the test set as on the train set

Let’s look at what we are doing by plotting the data and the linear regression. Throughout the notebook, we will use the function plot_individuals that, given a subset of IDs and a model name (as stored in the df dataframe) plots the individual data and their prediction

πŸ’¬ Question 10 πŸ’¬ Use the following cell.

def get_title(overall_results, model_name):
    """Precise model's name and its RMSE train & test"""
    rmse_train = overall_results.loc[model_name, 'train']
    rmse_test = overall_results.loc[model_name, 'test']
    title = f'PUTAMEN - Raw data vs {model_name:s}\n'
    title += r'$RMSE_{train}$ = %.3f     $RMSE_{test}$ = %.3f' % (rmse_train, rmse_test)
    return title

    
def plot_individuals(df, overall_results, model_name, **kwargs):
    # ---- Input manager
    kind = kwargs.get('kind', 'lines')
    sublist = kwargs.get('sublist', None)
    highlight_test = kwargs.get('highlight_test', True)
    ax = kwargs.get('ax', None)
    
    if ax is None:
        fig, ax = plt.subplots(figsize=(14, 6))
    else:
        plt.figure(figsize=(14, 6))
    
    # ---- Select subjects
    if sublist is None:
        sublist = df.index.unique('ID')
        
    # ---- If too many subject, do not display them in the legend
    display_id_legend = len(sublist) <= 10
    
    # ---- Plot
    if kind == 'scatter':
        # -- Used for Linear Regression
        sns.scatterplot(data=df.reset_index(), x='TIME', y='PUTAMEN', alpha=.5, 
                        s=60, label='Observations', ax=ax)
        ax.plot(df.index.get_level_values('TIME').values,
                df[model_name].values, 
                label=model_name, c='C1')
        ax.legend(title='LABEL')
        if highlight_test:
            test = df.xs('test', level='SPLIT').loc[sublist].reset_index()
            sns.scatterplot(data=test, x='TIME', y='PUTAMEN', legend=None, ax=ax)
        
    elif kind == 'lines':
        # -- Used for the other models
        # - Stack observations & reconstructions by the model
        df_stacked = df[['PUTAMEN', model_name]].copy()
        df_stacked.rename(columns={'PUTAMEN': 'Observations'}, inplace=True)
        df_stacked = df_stacked.stack().reset_index().set_index(['ID', 'SPLIT'])
        df_stacked.columns = ['TIME', 'LABEL', 'PUTAMEN']
        
        # - Plot
        sns.lineplot(data=df_stacked.loc[sublist], 
                     x='TIME', y='PUTAMEN', hue='ID', style='LABEL', 
                     legend=display_id_legend, ax=ax)
        if highlight_test:
            test = df.xs('test', level='SPLIT').loc[sublist].reset_index()
            sns.scatterplot(data=test, x='TIME', y='PUTAMEN', hue='ID', legend=None, ax=ax)
        if display_id_legend:
            ax.legend(title='LABEL', bbox_to_anchor=(1.05, 1), loc='upper left')
    else:
        raise ValueError('<kind> input accept only "scatter" and "lines".'
                         f' You gave {kind}')
        
    ax.set_title(get_title(overall_results, model_name))
    return ax
    
plot_individuals(df, overall_results, 'Linear Regression', 
                 kind='scatter', highlight_test=False)
plt.show()
../../../_images/66f37a9115fc729ba9fe2815733b944158131467afdc9b323c4e6c81231ac094.png

Is the previous plot relevant to assess the quality of our model?

We will answer this question in the following cells:

Part III: The longitudinal aspect#

πŸ’¬ Question 11 πŸ’¬ Run the cell to have a better understanding of your data:

plot_individuals(df, overall_results, 'Linear Regression', 
                 kind='lines', highlight_test=True)
plt.title('Longitudinal aspect')
plt.show()
../../../_images/f1685ff59390af1173e243eb59add389f66280f8659d291b8cfeed8b0ce1b3eb.png

The test data are highlited with dots. πŸ’¬ Question 12 πŸ’¬ What are actually the test data ?

Your answer: …

πŸ’¬ Question 13 πŸ’¬Why does the global linear model not describe the temporal evolution of the variable?

Your answer: …

PART IV: Indivual Linear Regressions#

ℹ️ Information ℹ️ In fact, this is not the best idea to have one general linear regression. Because we do not benefit from indiviudal information. Therefore, let’s do one linear regression per individual

πŸ’¬ Question 14 πŸ’¬ Look at what this function is doing and at the result

individual_parameters = pd.DataFrame({'INTERCEPT': [], 'SLOPE': []})

subject_idx = 'GS-194'

def compute_individual_parameters(df, subject_idx):
    df_patient = df.loc[subject_idx]
    
    x = df_patient.index.get_level_values('TIME').values
    y = df_patient['PUTAMEN'].values
    
    # -- Linear regression
    slope, intercept, _, _, _ = stats.linregress(x, y)
    
    
    return intercept, slope


individual_parameters.loc[subject_idx] = compute_individual_parameters(df_train, subject_idx)

individual_parameters
INTERCEPT SLOPE
GS-194 0.025188 -0.807941

πŸ’¬ Question 15 πŸ’¬ Apply the function to everyone

# Your answer
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #


for subject_idx in df_train.index.unique('ID'):
   
    slope, intercept = compute_individual_parameters(df_train, subject_idx)
    individual_parameters.loc[subject_idx] = (intercept, slope)
    
individual_parameters.head()
INTERCEPT SLOPE
GS-194 -0.807941 0.025188
GS-001 -2.946305 0.051464
GS-002 -0.252949 0.018643
GS-003 0.517972 0.003816
GS-004 -1.076280 0.025680

πŸ’¬ Question 16 πŸ’¬ Now append the result of the model in df using the function below.

⚑ Remark ⚑ Take a close look at what we are doing cause we will use the same syntax if other questions.

def compute_individual_reconstruction(x, parameters):
    
    subject_idx = x.name[0]
    
    slope = parameters.loc[subject_idx]['SLOPE']
    intercept = parameters.loc[subject_idx]['INTERCEPT']
    
    time = x.name[1]
    
    return intercept + slope * time
    
    

df['Individual Linear Regression'] = df.apply(
    lambda x: compute_individual_reconstruction(x, individual_parameters), axis=1)

πŸ’¬ Question 17 πŸ’¬ Use the compute_train_test_mean_absolute_error function to get the train and test errors and compare the two models.

# Your code here
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

compute_rmse_train_test(df, overall_results, 'Individual Linear Regression')
overall_results
train test
Linear Regression 0.091403 0.102130
Individual Linear Regression 0.017825 0.032197

We clearly see that the RMSE is much better! πŸ’¬ Question 18 πŸ’¬ Create a list with the five patients having the more visits and the five patients having the less visits, then use plot_individuals function to display their observations and reconstructions by the model Hint : use the keyword sublist

# sublist = # TODO
# plot_individuals(df, overall_results, model_name, sublist=sublist)
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

visits_per_subjects = df.groupby(df.index.get_level_values('ID')).count().sort_values('PUTAMEN')
sublist = visits_per_subjects.tail(5).index.tolist()
sublist += visits_per_subjects.head(5).index.tolist()

plot_individuals(df, overall_results, 'Individual Linear Regression', sublist=sublist)
plt.show()
../../../_images/3aa9626124d05f8d540c44e136b5bc0b7fd6b0f0054d679e6e08f934b207933c.png

πŸ’¬ Question 19 πŸ’¬ Explain why \(RMSE_{test} >> RMSE_{train}\):

Your answer: …

Answer: the LM overfit for patients with only few data

Part V : Linear Mixed effects Model with statsmodels#

With the previous method, we made a significant improvement. However, we suffer fro an overfitting problem. Let’s see what a mixed effect model can do for us!

Run a LMM with statsmodels#

ℹ️ Information ℹ️ We will use the statsmodel package to run a Linear Mixed Effect Model (LMM or LMEM in the literature).

πŸ’¬ Question 20 πŸ’¬ Load the following lines to import the packages

import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.regression.mixed_linear_model import MixedLMParams

Statsmodels contains several API to create a model. For the ones familiar with R, you will be here in a familiar ground with the formula API.

  • formula='PUTAMEN ~ TIME + 1' means that you want to explain PUTAMEN with TIME and an intercept

  • groups="ID" means that you want random effect for all levels of ID

  • re_formula="~TIME + 1" means that you want a random intercept and a random slope for TIME

If you go back to the equation you get : \( PUTAMEN_{id,time} = \underbrace{\alpha*TIME_{id,time} + \beta}_\text{formula} + \underbrace{\alpha_{id}*TIME_{id,time} + \beta_{id}}_\text{re_formula}\)

πŸ’¬ Question 21 πŸ’¬ Let’s try a very naive run:

lmm = smf.mixedlm(formula='PUTAMEN ~ 1 + TIME', 
                  data=df_train.reset_index(), 
                  groups="ID", re_formula="~ 1 + TIME").fit()
lmm.summary()
/home/juliette.ortholand/miniconda3/envs/leaspype/lib/python3.8/site-packages/statsmodels/base/model.py:566: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "
/home/juliette.ortholand/miniconda3/envs/leaspype/lib/python3.8/site-packages/statsmodels/regression/mixed_linear_model.py:2200: ConvergenceWarning: Retrying MixedLM optimization with lbfgs
  warnings.warn(
/home/juliette.ortholand/miniconda3/envs/leaspype/lib/python3.8/site-packages/statsmodels/regression/mixed_linear_model.py:2237: ConvergenceWarning: The MLE may be on the boundary of the parameter space.
  warnings.warn(msg, ConvergenceWarning)
Model: MixedLM Dependent Variable: PUTAMEN
No. Observations: 1415 Method: REML
No. Groups: 200 Scale: 0.0007
Min. group size: 2 Log-Likelihood: 2497.0458
Max. group size: 13 Converged: Yes
Mean group size: 7.1
Coef. Std.Err. z P>|z| [0.025 0.975]
Intercept -0.685 0.044 -15.644 0.000 -0.771 -0.599
TIME 0.022 0.001 29.528 0.000 0.020 0.023
ID Var 0.003 0.259
ID x TIME Cov 0.000 0.005
TIME Var 0.000 0.000

⚑ Remark ⚑ Let’s skip the different warning for now and see what happens if we ignore it

Let’s try and see.

πŸ’¬ Question 22 πŸ’¬ Run the following commands to get the intercept and slope

print(lmm.fe_params.loc['Intercept'])
print(lmm.fe_params.loc['TIME'])
-0.6848661069361888
0.021679337332944623

πŸ’¬ Question 23 πŸ’¬ Run the following commands to get the variation to the mean slope and intercept Example on few subject

{key: val for key, val in lmm.random_effects.items() if key in 
 ['GS-00'+str(i) for i in range(1, 4)]}
{'GS-001': ID     -0.019149
 TIME   -0.001092
 dtype: float64,
 'GS-002': ID      0.058495
 TIME    0.004182
 dtype: float64,
 'GS-003': ID     -0.011974
 TIME   -0.001002
 dtype: float64}

πŸ’¬ Question 24 πŸ’¬ From the fixed and random effects, compute for each subject its INTERCEPT and SLOPE:

# df_random_effects['INTERCEPT'] = # TODO
# df_random_effects['SLOPE'] = # TODO

# df_random_effects.head()
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

df_random_effects = pd.DataFrame.from_dict(lmm.random_effects, orient='index')
df_random_effects = df_random_effects.rename({'ID': 'Random intercept', 'TIME': 'Random slope'}, axis=1)

df_random_effects['INTERCEPT'] = df_random_effects['Random intercept'] + lmm.fe_params.loc['Intercept']
df_random_effects['SLOPE'] = df_random_effects['Random slope'] + lmm.fe_params.loc['TIME']

df_random_effects.head()
Random intercept Random slope INTERCEPT SLOPE
GS-001 -0.019149 -0.001092 -0.704015 0.020588
GS-002 0.058495 0.004182 -0.626371 0.025861
GS-003 -0.011974 -0.001002 -0.696840 0.020677
GS-004 -0.020968 -0.001529 -0.705834 0.020150
GS-005 0.006285 0.000363 -0.678581 0.022043

πŸ’¬ Question 25 πŸ’¬ Use the compute_individual_reconstruction function but with df_random_effects to compute the prediction with the new individual effects

df['Linear Mixed Effect Model'] = # Your code here
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

df['Linear Mixed Effect Model'] = df.apply(
    lambda x: compute_individual_reconstruction(x, df_random_effects), axis=1)

πŸ’¬ Question 26 πŸ’¬ Store the results in overall_results (thanks to compute_rmse_train_test function) and compare the models

# Your code here
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

compute_rmse_train_test(df, overall_results, 'Linear Mixed Effect Model')

overall_results
train test
Linear Regression 0.091403 0.102130
Individual Linear Regression 0.017825 0.032197
Linear Mixed Effect Model 0.024533 0.039367

⚑ Remark ⚑ The result is worse than with the previous model.

πŸ’¬ Question 27 πŸ’¬ What do you think happened? Let’s check it visually

# Your code here
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

plot_individuals(df, overall_results, 'Linear Mixed Effect Model', sublist=sublist)
plt.show()
../../../_images/e16e7ae252343fc2399853fb1c9a05806a9af70575e559c0bf3120cc5fd340c4.png

⚑ Remark ⚑ All the slopes are the same. This is related to the warning above : these are warnings are a way of alerting you that you may be in a non-standard situation. Most likely, one of your variance parameters is converging to zero. Which is the case if you have a look to time variance.

πŸ’¬ Question 28 πŸ’¬ Let’s rerun it by normalizing the time first. Add to df_train and df_test a renormalizing function. Be careful to normalize only with the known ages from train. Hint : Watchout to data leakage!

# Your code

df['TIME_NORMALIZED'] = # Your code

# ---- Split again train & test
df_train = df.xs('train',level='SPLIT')
df_test = df.xs('test',level='SPLIT')
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

# ---- We use only train data to compute mean & std
ages = df_train.index.get_level_values(1).values
age_mean = ages.mean()
age_std = ages.std()

df['TIME_NORMALIZED'] = (df.index.get_level_values('TIME') -  age_mean) / age_std

# ---- Split again train & test
df_train = df.xs('train',level='SPLIT')
df_test = df.xs('test',level='SPLIT')

πŸ’¬ Question 29 πŸ’¬ Rerun the previous mixed lm (some cells above) but with the TIME_NORMALIZED instead of TIME in the formula and re_formula.

# YOUR CODE
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #


lmm = smf.mixedlm(formula='PUTAMEN ~ TIME_NORMALIZED + 1', data=df_train.reset_index(), 
                  groups="ID", re_formula="~TIME_NORMALIZED + 1").fit()

lmm.summary()
Model: MixedLM Dependent Variable: PUTAMEN
No. Observations: 1415 Method: REML
No. Groups: 200 Scale: 0.0005
Min. group size: 2 Log-Likelihood: 2629.4084
Max. group size: 13 Converged: Yes
Mean group size: 7.1
Coef. Std.Err. z P>|z| [0.025 0.975]
Intercept 0.679 0.016 43.031 0.000 0.648 0.710
TIME_NORMALIZED 0.231 0.011 20.134 0.000 0.209 0.254
ID Var 0.045 0.288
ID x TIME_NORMALIZED Cov -0.006 0.111
TIME_NORMALIZED Var 0.017 0.126

Ahaaaaah! No warnings!

πŸ’¬ Question 30 πŸ’¬ Get the parameters as previously in df_random_effects_2 and store the INTERCEPT_NORMALIZED and SLOPE_NORMALIZED

# TODO 
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #


df_random_effects_2 = pd.DataFrame.from_dict(lmm.random_effects, orient='index')
df_random_effects_2 = df_random_effects_2.rename({'ID': 'Random intercept', 'TIME_NORMALIZED': 'Random slope'},
                                                 axis=1)


df_random_effects_2['INTERCEPT_NORMALIZED'] = df_random_effects_2['Random intercept'] + \
                                              lmm.fe_params.loc['Intercept']
df_random_effects_2['SLOPE_NORMALIZED'] = df_random_effects_2['Random slope'] + \
                                          lmm.fe_params.loc['TIME_NORMALIZED']

df_random_effects_2.head()
Random intercept Random slope INTERCEPT_NORMALIZED SLOPE_NORMALIZED
GS-001 -0.222614 0.185999 0.456428 0.417479
GS-002 0.233375 -0.075922 0.912417 0.155558
GS-003 0.009845 -0.091712 0.688887 0.139767
GS-004 -0.093862 0.017902 0.585179 0.249382
GS-005 0.055686 -0.050411 0.734728 0.181068

Here, we computed

\[ y = SLOPE_{normalized} * TIME_{normalized} + INTERCEPT_{normalized} \]

which corresponds to

\[ y = SLOPE_{normalized} * \frac{(TIME - \mu(ages))}{ \sigma_{ages}} + INTERCEPT_{normalized} \]

which is :

\[ y = SLOPE * TIME + INTERCEPT \]

where $\(SLOPE = \frac{SLOPE_{normalized}}{ std_{ages}}\)\( and \)\(INTERCEPT = INTERCEPT_{normalized} - \frac{SLOPE_{normalized} * mean_{ages}}{std_{ages}}\)$

πŸ’¬ Question 31 πŸ’¬ From INTERCEPT_NORMALIZED & SLOPE_NORMALIZED, compute for each subject its INTERCEPT and SLOPE:

parameters['GS-001'] 
>>> {'INTERCEPT': ..., 'SLOPE': ...}
# YOUR CODE
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

df_random_effects_2['SLOPE'] = df_random_effects_2['SLOPE_NORMALIZED'] / age_std
df_random_effects_2['INTERCEPT'] = df_random_effects_2['INTERCEPT_NORMALIZED'] -\
                                   (df_random_effects_2['SLOPE_NORMALIZED'] * age_mean)/age_std

πŸ’¬ Question 32 πŸ’¬ Use the compute_individual_reconstruction function but with df_random_effects_2 to compute the prediction with the new individual effects

# Your code
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

df['Linear Mixed Effect Model - V2'] = df.apply(
    lambda x: compute_individual_reconstruction(x, df_random_effects_2), axis=1)

πŸ’¬ Question 33 πŸ’¬ Store the results in overall_results (thanks to compute_rmse_train_test function) and compare the models

# Your code
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

compute_rmse_train_test(df, overall_results, 'Linear Mixed Effect Model - V2')

overall_results
train test
Linear Regression 0.091403 0.102130
Individual Linear Regression 0.017825 0.032197
Linear Mixed Effect Model 0.024533 0.039367
Linear Mixed Effect Model - V2 0.019081 0.028939

The RMSE is much better!

πŸ’¬ Question 34 πŸ’¬ Display the subjects of sublist :

# Your code
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

plot_individuals(df, overall_results, 'Linear Mixed Effect Model - V2', sublist=sublist)
plt.show()
../../../_images/fe7f48facc29b87b846e7f80ba4d5f8de528360240605d290560f8cf11b89b88.png

πŸ’¬ Question 35 πŸ’¬ What is the main advantage of a Linear Mixed-effects Model compared to multiple Linear Models ?

Your answer…

πŸ’¬ Question 36 πŸ’¬ Let’s check the average RMSE per subject depending of the number of timepoints per subjects:

models = ['Individual Linear Regression', 'Linear Mixed Effect Model - V2']


def plot_rmse_by_number_of_visit(models, df=df):
    rmse = df[models].copy()
    rmse.columns.names = ['MODEL']
    rmse = rmse.stack()
    rmse = (rmse - df['PUTAMEN']) ** 2

    nbr_of_visits = rmse.xs('train',level='SPLIT').groupby(['MODEL','ID']).size()

    rmse = rmse.xs('test',level='SPLIT')
    rmse = rmse.reset_index().rename(columns={0: 'RMSE'})
    rmse = rmse.reset_index()[['ID', 'MODEL', 'RMSE']]
    rmse = rmse.groupby(['MODEL','ID']).mean() ** .5
    rmse['Visits number in train'] = nbr_of_visits

    plt.figure(figsize=(12, 5))
    ax = sns.boxplot(data=rmse.reset_index(), y='RMSE', 
                     x='Visits number in train', hue='MODEL', 
                     showmeans=True, whis=[5,95], showfliers=False)
    ax = sns.stripplot(data=rmse.reset_index(), y='RMSE', 
                       x='Visits number in train', hue='MODEL', 
                       dodge=True, alpha=.5, linewidth=1, ax=ax)

    plt.grid(True, axis='both')
    plt.ylim(0,None)
    plt.legend()
    return ax


plot_rmse_by_number_of_visit(models, df)
plt.show()
../../../_images/619fbc0980957917580273dc3cd29ae5144a9056b5b45c2092f449f3a1175e19.png

PART VI: A taste of the future - Linear mixed-effect model with Leaspy#

In the next practical sessions you will learn to use the package developped by the Aramis team. Now, just to be able to compare performances you will run a few methods of leaspy in advance…

πŸ’¬ Question 37 πŸ’¬ Run the following cell to make the import of leaspy methods, format the data and fit a model :

# --- Import methods
from leaspy import Leaspy, Data, AlgorithmSettings

# --- Format the data
data = Data.from_dataframe(df_train[['PUTAMEN']])

# --- Fit a model
leaspy_univariate = Leaspy('univariate_linear')
settings_fit = AlgorithmSettings('mcmc_saem', progress_bar=True, seed=0)
leaspy_univariate.fit(data, settings_fit)
 ==> Setting seed to 0
|##################################################|   10000/10000 iterations
The standard deviation of the noise at the end of the calibration is:
0.0215

Calibration took: 26s

Well it’s a bit slow, but here is a joke to wait:

  • What did the triangle say to the circle? β€œYou’re pointless.”

…

Okay, it was short. Here is another one:

  • I had an argument with a 90Β° angle. It turns out it was right.

πŸ’¬ Question 38 πŸ’¬ Run the following two cells to make the predictions:

settings_personalize = AlgorithmSettings('scipy_minimize', progress_bar=True, use_jacobian=True)

individual_parameters = leaspy_univariate.personalize(data, settings_personalize)
|##################################################|   200/200 subjects
The standard deviation of the noise at the end of the personalization is:
0.0184

Personalization scipy_minimize took: 20s
timepoints = {idx: df.loc[idx].index.get_level_values('TIME').values for idx in 
              df.index.get_level_values('ID').unique()}
estimates = leaspy_univariate.estimate(timepoints, individual_parameters)

πŸ’¬ Question 39 πŸ’¬ Add the predictions to df

df['Leaspy linear'] = float('nan')

for idx in df.index.unique('ID'):
    df.loc[idx, 'Leaspy linear'] = estimates[idx]

πŸ’¬ Question 40 πŸ’¬ Compute and add the new rmse

compute_rmse_train_test(df, overall_results, 'Leaspy linear')

overall_results
train test
Linear Regression 0.091403 0.102130
Individual Linear Regression 0.017825 0.032197
Linear Mixed Effect Model 0.024533 0.039367
Leaspy linear 0.018394 0.027894

πŸ’¬ Question 41 πŸ’¬ Display the subjects of sublist :

# Your code
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

plot_individuals(df, overall_results, 'Leaspy linear', sublist=sublist)
plt.show()
../../../_images/9bec8b61e54051d86231bc3215cb0dc03df6524f21f83ccd63b8fb8ffb8bc262.png

PART VII: A taste of the future - Non-linear mixed-effect model with Leaspy#

Again you will use an other model of leapsy…

πŸ’¬ Question 42 πŸ’¬ Fit the model with the data.

leaspy_univariate = Leaspy('univariate_logistic')
settings_fit = AlgorithmSettings('mcmc_saem', progress_bar=True, seed=0)
leaspy_univariate.fit(data, settings_fit)
 ==> Setting seed to 0
|##################################################|   10000/10000 iterations
The standard deviation of the noise at the end of the calibration is:
0.0214

Calibration took: 27s

πŸ’¬ Question 43 πŸ’¬ Run the following two cells to make the predictions:

settings_personalize = AlgorithmSettings('scipy_minimize', progress_bar=True, use_jacobian=True)

individual_parameters = leaspy_univariate.personalize(data, settings_personalize)
|##################################################|   200/200 subjects
The standard deviation of the noise at the end of the personalization is:
0.0183

Personalization scipy_minimize took: 4s
timepoints = {idx: df.loc[idx].index.get_level_values('TIME').values for idx in 
              df.index.get_level_values('ID').unique()}
estimates = leaspy_univariate.estimate(timepoints, individual_parameters)

πŸ’¬ Question 44 πŸ’¬ Add the predictions to df

df['Leaspy logistic'] = float('nan')

for idx in df.index.unique('ID'):
    df.loc[idx, 'Leaspy logistic'] = estimates[idx]

πŸ’¬ Question 45 πŸ’¬ Compute and add the new rmse

compute_rmse_train_test(df, overall_results, 'Leaspy logistic')

overall_results
train test
Linear Regression 0.091403 0.102130
Individual Linear Regression 0.017825 0.032197
Linear Mixed Effect Model 0.024533 0.039367
Leaspy linear 0.018394 0.027894
Leaspy logistic 0.018331 0.026822

πŸ’¬ Question 46 πŸ’¬ Display the subjects of sublist :

# Your code
# –––––––––––––––– #
# –––– Answer –––– #
# –––––––––––––––– #

plot_individuals(df, overall_results, 'Leaspy logistic', sublist=sublist)
plt.show()
../../../_images/b2b85402315718bc67cd41c62b6f1312f0d0ce8671ebbe4a50a2c219f04e2dc1.png

πŸ’¬ Question 47 πŸ’¬ Check the average RMSE per subject depending of the number of timepoints per subjects

models = ['Individual Linear Regression', 
          'Linear Mixed Effect Model - V2',
          'Leaspy linear',
          'Leaspy logistic']


plot_rmse_by_number_of_visit(models, df)
plt.show()
../../../_images/8b3e8bc32c332a8844a332b4b3c8adc802a64e22d1e6aed6ab7c5c5b9332180e.png

Here we clearly see that the few subjects who have less than 6 timepoints are better reconstructed with a mixed effect model!

For the fast-running Zebras: all-in-one with a more realistic split of data#

Let’s try to split differently data so to mimic real-world applications.

We don’t want to fit again a new model every time we predict the future of a new patient. To this aim, we want to calibrate (fit) a model on a large and representative dataset once and then personalize this model to totally new individuals. We can then make forecasts about these new individuals.

The previous split was not taking into account this constraint and this extra part will go through it.

  • Train part: all data from some individuals (used for calibration of model)

  • Test part: new individuals

    • Present data: partial data of these new individuals that will be known (used for personalization of model)

    • Future data: hidden data from these individuals (not known during personalization, used for prediction)

πŸ’¬ Question 48 πŸ’¬ Split data differently so to respect real-world constraint

# be sure that ages are increasing
df.sort_index(inplace=True)

individuals = df.index.unique('ID')
n_individuals = len(individuals)

# split on individuals
fraction = .75
individuals_train, individuals_test = individuals[:int(fraction*n_individuals)], individuals[int(fraction*n_individuals):]

s_train = df.droplevel(-1).loc[individuals_train][['PUTAMEN']]
s_test = df.droplevel(-1).loc[individuals_test][['PUTAMEN']]

# we split again test set in 2 parts: present (known) / future (to predict)
s_test_future = s_test.groupby('ID').tail(2) # 2 last tpts
s_test_present = s_test.loc[s_test.index.difference(s_test_future.index)]

s_test.loc[s_test_future.index, 'PART'] = 'future'
s_test.loc[s_test_present.index, 'PART'] = 'present'

s_test.set_index('PART',append=True,inplace=True)

# check no intersection in individuals between train/test
assert len(s_train.index.unique('ID').intersection(s_test.index.unique('ID'))) == 0

# check no intersection of (individual, timepoints) between test present/future
assert len(s_test_present.index.intersection(s_test_future.index)) == 0

# Leaspy Data objects creation
data_train = Data.from_dataframe(s_train)
data_test = {
    'all': Data.from_dataframe(s_test.droplevel('PART')), # all test data [present+future pooled]
    'present': Data.from_dataframe(s_test_present),
    'future': Data.from_dataframe(s_test_future)
}

Let’s check that distributions of ages and putamen values between train & test are similar… unlike previous split!

πŸ’¬ Question 49 πŸ’¬ Check consistence of new train/test split and compare to previous split

df_new_split = pd.concat({
    'Train': s_train,
    'Test': s_test.droplevel(-1)
}, names=['SPLIT'])

for which_split, df_split in {'Previous split': df, 'New split': df_new_split}.items():
    fig, axs = plt.subplots(1, 2, figsize=(14,6))
    fig.suptitle(which_split, fontsize=20, fontweight='bold')
    
    for (var, title), ax in zip({'TIME': 'Ages', 'PUTAMEN': 'Putamen value'}.items(), axs):

        sns.histplot(data=df_split.reset_index(), x=var, 
                     hue='SPLIT', stat='density', common_norm=False, 
                     kde=True, kde_kws=dict(bw_adjust=2.), ax=ax)
        ax.set_title(f'{title} - comparison of distributions')
        ax.set_xlabel(title)

    plt.show()

#sns.scatterplot(data=s_train.reset_index(), 
#                x='TIME', y='PUTAMEN', alpha=.5, s=60)
#sns.scatterplot(data=s_test.reset_index(), 
#                x='TIME', y='PUTAMEN', alpha=.5, s=60)
../../../_images/e71b719e0d9c85b4a26a30d508134625572c1ddee885abdaa59aac2b2311da23.png ../../../_images/60e6389bac214018ede3c9ba70998b019314b79c21d1064df34b15b0d3ff9935.png

πŸ’¬ Question 50 πŸ’¬ Double check number of individuals and visits in each split of our dataset

# Some descriptive stats on number of visits in test set (present part)

pd.options.display.float_format = '{:.1f}'.format

print('Visits in each split')
print({'Train': len(s_train), 
       'All test': len(s_test), 
       'Test - known part': len(s_test_present),
       'Test - to predict part': len(s_test_future)
      })
print()

print('Visits per individual in each split')
pd.concat({
    'Calibration': s_train.groupby('ID').size().describe(percentiles=[]), # min: 3 tpts known
    'Personalization': s_test_present.groupby('ID').size().describe(percentiles=[]), # min: 1 tpt known
    'Prediction': s_test_future.groupby('ID').size().describe(percentiles=[]), # min: 2 tpts to predict
}).unstack(0).rename({'count':'nb of individuals'}, axis=0)
Visits in each split
{'Train': 1499, 'All test': 498, 'Test - known part': 398, 'Test - to predict part': 100}

Visits per individual in each split
Calibration Personalization Prediction
nb of individuals 150.0 50.0 50.0
mean 10.0 8.0 2.0
std 2.5 2.5 0.0
min 3.0 1.0 2.0
50% 10.0 8.0 2.0
max 18.0 13.0 2.0

πŸ’¬ Question 51 πŸ’¬ Write a all-in-one personalization & prediction function thanks to Leaspy api

def personalize_model(leaspy_model, settings_perso):
    """
    all-in-one function that:
    1. takes a calibrated leaspy model, 
    2. personalizes it with different splits of data
    3. and estimates prediction errors compared to real data (including the hidden parts)
    """

    ips_test = {}
    rmse_test = {}

    # personalize on train part just to have a comparison (baseline)
    ips_test['train'], rmse_test['train'] = leaspy_model.personalize(data_train, settings_perso, 
                                                                     return_noise=True)
    print(f'RMSE on train: {100*rmse_test["train"]:.2f}%')

    # personalize using different test parts
    for test_part, data_test_part in data_test.items():
        ips_test[test_part], rmse_test[test_part] = leaspy_model.personalize(data_test_part, settings_perso, 
                                                                             return_noise=True)
        print(f'RMSE on test {test_part}: {100*rmse_test[test_part]:.2f}%')
    
    # reconstruct using different personalizations made
    s_train_ix = s_train.assign(PART='train').set_index('PART',append=True).index # with fake part added
    
    all_recons_df = pd.concat({
        test_part: leaspy_model.estimate(s_test.index if test_part != 'train' else s_train_ix, 
                                         ips_test_part)
        for test_part, ips_test_part in ips_test.items()
    }, names=['PERSO_ON']).reorder_levels([1,2,3,0])

    # compute prediction errors made
    true_vals_same_ix = all_recons_df[[]].join(df.droplevel(-1).iloc[:,0])
    pred_errs = all_recons_df - true_vals_same_ix
    
    # return everything that could be needed
    return pred_errs, all_recons_df, ips_test, rmse_test

πŸ’¬ Question 52 πŸ’¬ Calibrate, personalize & predict with LMM, univariate linear & univariate logistic thanks to Leaspy

## Leaspy LMM (using integrated exact personalization formula)

leaspy_lmm = Leaspy('lme')
settings_fit = AlgorithmSettings('lme_fit', 
                                 with_random_slope_age=True,
                                 force_independent_random_effects=True, # orthogonal rand. eff.
                                 seed=0)

# ------- Compute population parameters on train data ------- 
# (fixed effects, noise level and var-covar of random effects)
leaspy_lmm.fit(data_train, settings_fit)

# ------- Compute individual parameters (random effects) on different splits -------
pred_errs_lmm, all_recons_df_lmm, _, _  = personalize_model(leaspy_lmm, AlgorithmSettings('lme_personalize'))
 ==> Setting seed to 0
The standard deviation of the noise at the end of the calibration is: 0.0218
RMSE on train: 1.98%
RMSE on test all: 1.91%
RMSE on test present: 1.88%
RMSE on test future: 1.61%
## Leaspy univariate linear
leaspy_lin = Leaspy('univariate_linear')
settings_fit = AlgorithmSettings('mcmc_saem', n_iter=8000, progress_bar=True, seed=0)
settings_perso = AlgorithmSettings('scipy_minimize', progress_bar=True, use_jacobian=True)

# ------- Compute population parameters on train data ------- 
leaspy_lin.fit(data_train, settings_fit)

# ------- Compute individual parameters on different splits -------
pred_errs_lin, all_recons_df_lin, _, _  = personalize_model(leaspy_lin, settings_perso)
 ==> Setting seed to 0
|##################################################|   8000/8000 iterations
The standard deviation of the noise at the end of the calibration is:
0.0216

Calibration took: 22s
|##################################################|   150/150 subjects
The standard deviation of the noise at the end of the personalization is:
0.0194

Personalization scipy_minimize took: 4s
RMSE on train: 1.94%
|##################################################|   50/50 subjects
The standard deviation of the noise at the end of the personalization is:
0.0186

Personalization scipy_minimize took: 1s
RMSE on test all: 1.86%
|##################################################|   50/50 subjects
The standard deviation of the noise at the end of the personalization is:
0.0178

Personalization scipy_minimize took: 2s
RMSE on test present: 1.78%
|##################################################|   50/50 subjects
The standard deviation of the noise at the end of the personalization is:
0.0162

Personalization scipy_minimize took: 1s
RMSE on test future: 1.62%
## Leaspy univariate logistic
leaspy_log = Leaspy('univariate_logistic')

# Calibrate
leaspy_log.fit(data_train, settings_fit)

# Personalize
pred_errs_log, all_recons_df_log, _, _  = personalize_model(leaspy_log, settings_perso)
 ==> Setting seed to 0
|##################################################|   8000/8000 iterations
The standard deviation of the noise at the end of the calibration is:
0.0215

Calibration took: 30s
|##################################################|   150/150 subjects
The standard deviation of the noise at the end of the personalization is:
0.0194

Personalization scipy_minimize took: 5s
RMSE on train: 1.94%
|##################################################|   50/50 subjects
The standard deviation of the noise at the end of the personalization is:
0.0184

Personalization scipy_minimize took: 1s
RMSE on test all: 1.84%
|##################################################|   50/50 subjects
The standard deviation of the noise at the end of the personalization is:
0.0177

Personalization scipy_minimize took: 2s
RMSE on test present: 1.77%
|##################################################|   50/50 subjects
The standard deviation of the noise at the end of the personalization is:
0.0162

Personalization scipy_minimize took: 1s
RMSE on test future: 1.62%

πŸ’¬ Question 53 πŸ’¬ Display RMSE on all splits of data depending on models & part where models were personalized on

# reconstruct using different personalizations made
pd.options.display.float_format = '{:.5f}'.format

all_pred_errs = pd.concat({
    'lmm': pred_errs_lmm,
    'leaspy_linear': pred_errs_lin,
    'leaspy_logistic': pred_errs_log,
}, names=['MODEL'])

print('RMSE by:\n- model\n- part of data for personalization\n- part of data for reconstruction\n')

rmses_by_perso_split = (all_pred_errs**2).groupby(['MODEL','PERSO_ON','PART']).mean() ** .5
rmses_by_perso_split.unstack('MODEL').droplevel(0, axis=1).sort_index(ascending=[True,False]).rename_axis(index={'PART':'RECONS_ON'})
RMSE by:
- model
- part of data for personalization
- part of data for reconstruction
MODEL lmm leaspy_linear leaspy_logistic
PERSO_ON RECONS_ON
all present 0.01897 0.01841 0.01821
future 0.01944 0.01910 0.01913
future present 0.04730 0.04808 0.04923
future 0.01609 0.01618 0.01616
present present 0.01880 0.01776 0.01775
future 0.02692 0.02666 0.02538
train train 0.01979 0.01942 0.01941

πŸ’¬ Question 54 πŸ’¬ Plot distributions of absolute errors

def plot_dist_errs(pred_errs, model_name, grouping_reverse=False):
    plt.figure(figsize=(14,6))
    plt.grid(True,axis='both',zorder=-1)

    plot_opts = dict(
        x='PART', order=['train','present','future'],
        hue='PERSO_ON', hue_order=['all','present','future','train'],
    )
    x_lbl = 'Part of test set we look errors at'
    legend_lbl = 'Personalization on\nwhich test part?'
    
    if grouping_reverse:
        plot_opts = dict(
            hue='PART', hue_order=['train','present','future'],
            x='PERSO_ON', order=['all','present','future','train'],
        )
        x_lbl, legend_lbl = legend_lbl, x_lbl # swap
    
    sns.boxplot(data=pred_errs.abs().reset_index(),
                y='PUTAMEN', **plot_opts, showfliers=False)

    plt.ylabel(f'Distribution of {model_name} absolute errors', fontsize=14)
    
    plt.xlabel(x_lbl, fontsize=14)
    plt.legend().set_title(legend_lbl)
plot_dist_errs(pred_errs_lmm, 'LMM')
plot_dist_errs(pred_errs_lmm, 'LMM', True)
../../../_images/5090795fa3b6ab42f6c341d487a3b9b63070db8742380f25e8b039fc11ccc118.png ../../../_images/84c0a7698fd7be7023b3f91853c47bdc5d0bf6b4d0a68967b5d819eb2314febe.png

πŸ’¬ Question 55 πŸ’¬ Plot some individual trajectories with their associated predicted trajectories

idx_list = np.random.RandomState(5).choice(s_test.index.unique('ID'), 6).tolist()

def plot_recons(all_recons_df, model_name):

    recontruction_and_raw_data = pd.concat({'truth': s_test, 
                                            'prediction': all_recons_df.xs('present', level='PERSO_ON')
                                           }, names=['LABEL']).swaplevel(0,1).sort_index()

    plt.figure(figsize=(14,6))

    sns.lineplot(data=recontruction_and_raw_data.loc[idx_list].reset_index(), 
                 x='TIME', y='PUTAMEN', hue='ID', style='LABEL', style_order=['truth','prediction'])

    sns.scatterplot(data=s_test_future.loc[idx_list].reset_index(),  
                    x='TIME', y='PUTAMEN', hue='ID', legend=None)

    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')

    title = f'PUTAMEN - Raw data vs {model_name} (test known/to predict)'
    plt.title(title)
    plt.grid(True)
    plt.show()

plot_recons(all_recons_df_lmm, 'LMM')
#plot_recons(all_recons_df_lin, 'Leaspy linear (univ.)')
#plot_recons(all_recons_df_log, 'Leaspy logistic (univ.)')
../../../_images/460fa737105366249bc89cd17f167a02430436c5f3a7af14e9e9294311a7fb5f.png

Bonus: have a look at the LMM log-likelihood landscape#

import statsmodels.formula.api as smf
from tqdm import tqdm

import site
site.addsitedir('./utils')

from loglikelihood_landspace_lmm import plt_ll_landscape, plt_ll_landscape_

# new model (TIME_norm)
lmm_model = smf.mixedlm(formula="PUTAMEN ~ 1 + TIME_NORMALIZED", 
                        data=df.xs('train',level='SPLIT').reset_index(), 
                        groups="ID", re_formula="~ 1 + TIME_NORMALIZED")

lmm_model.cov_pen = None

view = dict(
    vmin=2200,
    vmid=2500, 
    vmax=2630, 
    levels=[2500, 2600, 2620]
)

Rs = {} # placeholder for results

πŸ’¬ Question 56 πŸ’¬ Plot log-likelihood landscapes for LMM model, depending on variance-covariance matrix of random effects

for fix_k, fix_v in tqdm([
    # corr fixed
    ('corr', -.99),
    ('corr', -.9),
    ('corr', -.75),
    ('corr', -.5),
    ('corr', -.25), # ~best
    ('corr', 0),
    ('corr', .5),
    ('corr', .75),
    ('corr', .99),
    # sx fixed
    ('sx', 1.),
    ('sx', 5),
    ('sx', 10), # ~best
    ('sx', 20),
    ('sx', 40),
    # sy fixed
    ('sy', 1),
    ('sy', 6), # ~best
    ('sy', 10),
    ('sy', 20),
]):
    if (fix_k, fix_v) in Rs:
        plt_ll_landscape_(Rs[(fix_k, fix_v)], **view)
    else:
        Rs[(fix_k, fix_v)] = plt_ll_landscape(lmm_model, N=50, **{fix_k: fix_v}, **view)
  0%|          | 0/18 [00:00<?, ?it/s]
../../../_images/18f08af69d5af8eeb6d90d603cc8613402032ecbdbee690600a1185abfb26b0b.png
  6%|β–Œ         | 1/18 [01:23<23:39, 83.51s/it]
../../../_images/cd324341d4802fcb08a361a649716e2029b03abd4e67fb9a8808a8983a16ebbb.png
 11%|β–ˆ         | 2/18 [02:58<24:06, 90.39s/it]
../../../_images/b3c9540ed66633eb617d396b0625926edb85f344b1aa23c7000927b712fbf7ab.png
 17%|β–ˆβ–‹        | 3/18 [04:43<24:16, 97.10s/it]
../../../_images/a4ad3cea0bceeac818ed3f05c86300095f2f5d9a127663f964c485ea54d19c16.png
 22%|β–ˆβ–ˆβ–       | 4/18 [06:01<20:53, 89.53s/it]
../../../_images/89cf9e6eb8e9a0bb0c7e3e67e389b20ea4752ef3ed43abc268f77a7c08260575.png
 28%|β–ˆβ–ˆβ–Š       | 5/18 [07:27<19:04, 88.05s/it]
../../../_images/d6557c639f7cb86472c4b38b57c49e807ec7b0aab295fee9a2e1a2c769ea11de.png
 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 6/18 [08:53<17:31, 87.61s/it]
../../../_images/f5b75fbc27e4978147de745bf3b752e539c6b4f267e9068aedf0a8163e0c1b4f.png
 39%|β–ˆβ–ˆβ–ˆβ–‰      | 7/18 [10:06<15:08, 82.59s/it]
../../../_images/7a34a6c300a4393435617a177a28b8b04e13efab405dceab3af123080d58e7a0.png
 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–     | 8/18 [11:19<13:17, 79.72s/it]
../../../_images/ca17d0a05a6436e0faeb0ab658a0b6acd9d4c2a50e3330b51e54476751db3f60.png
 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ     | 9/18 [12:31<11:35, 77.33s/it]/Users/etienne.maheux/Documents/repos/disease-course-mapping-solutions/TP1_LMM/utils/loglikelihood_landspace_lmm.py:130: UserWarning: No contour levels were found within the data range.
  colors='black', alpha=.8, linewidths=1)
../../../_images/24fe148a99a881e17b1d190f64b692fb9acba0835228f3a21bc6d7ec83dc5046.png
 56%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ    | 10/18 [13:44<10:07, 75.94s/it]
../../../_images/aaf0da588d749e5f54fc686d356ea143fa75259731ee1778e2b5dc6206f864d9.png
 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    | 11/18 [14:58<08:46, 75.22s/it]
../../../_images/2867cb71644dded580f5b14c7bb4d06a47bcbbf9fccbeed92cb7a435b92b3460.png
 67%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹   | 12/18 [16:11<07:27, 74.57s/it]
../../../_images/e913208f0b0efb4621e87111e6ae8f93ee6b8571b4b8c4e99cc8676878c0aea7.png
 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  | 13/18 [17:24<06:10, 74.09s/it]
../../../_images/ceaa3f618af386e108cb29eefab9219067712901c4ee374a7445fa925acfdb38.png
 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š  | 14/18 [18:38<04:56, 74.21s/it]
../../../_images/ee74ce09dcb4a027ee6814fb6765ad917f87bc5c1edc22533573875e82bd3773.png
 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 15/18 [19:56<03:45, 75.18s/it]
../../../_images/ce39d57eb4d55f3e4d3bf55e5b252cf3caa5a6952940d8a8e8e184121e640784.png
 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 16/18 [21:14<02:32, 76.14s/it]
../../../_images/fa6e10230996a8be5c097870e693be48aa4af5e885fb374c35db4f1354e37cb4.png
 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 17/18 [22:27<01:15, 75.09s/it]
../../../_images/400cfb938dc8416692911dbdcf8cf54ac9d0863f88e4530a0bb94d0c702f373b.png
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 18/18 [23:52<00:00, 79.58s/it]