As an alternative to linear regression, a Gaussian process predicts a distribution for each input, giving confidence intervals instead of a single hard prediction. Here, we use the GPyTorch library to model patient vital sign data from the 2019 PhysioNet Sepsis Challenge: https://physionet.org/content/challenge-2019/1.0.0/

Note: Much of this tutorial is from https://docs.gpytorch.ai/en/v1.1.1/examples/01_Exact_GPs/Simple_GP_Regression.html, but I wanted to document my own use case. This is not an introduction to Gaussian processes, but a tutorial on how to apply GPyTorch.

Motivation: A formidable obstacle to modeling health data is that it is very sparse. The application I have in mind for using a Gaussian process to model a patient's vital signs is imputation of missing values.

We'll be using a random patient from the sepsis data provided by https://physionet.org/content/challenge-2019/1.0.0/ .

Data

Let's check out patient 005779:

import pandas as pd

df = pd.read_csv('p005779.psv', sep='|')
df.head()
HR O2Sat Temp SBP MAP DBP Resp EtCO2 BaseExcess HCO3 ... WBC Fibrinogen Platelets Age Gender Unit1 Unit2 HospAdmTime ICULOS SepsisLabel
0 76.0 100.0 36.72 125.0 91.0 76.0 14.0 NaN NaN NaN ... NaN NaN NaN 33.11 1 NaN NaN -0.03 3 0
1 80.0 100.0 NaN 110.0 74.0 64.0 12.0 NaN NaN NaN ... NaN NaN NaN 33.11 1 NaN NaN -0.03 4 0
2 87.0 100.0 36.50 103.0 75.0 62.0 15.5 NaN -2.0 24.0 ... 13.6 245.0 156.0 33.11 1 NaN NaN -0.03 5 0
3 93.0 100.0 NaN 99.0 73.0 63.0 16.0 NaN -3.0 NaN ... NaN NaN NaN 33.11 1 NaN NaN -0.03 6 0
4 87.0 100.0 37.06 125.0 57.0 74.0 14.0 NaN NaN NaN ... NaN NaN NaN 33.11 1 NaN NaN -0.03 7 0

5 rows × 41 columns

df[['HR','O2Sat', 'Temp', 'SBP', 'MAP', 'DBP', 'Resp']].isnull().sum()
HR        1
O2Sat     1
Temp     23
SBP       1
MAP       1
DBP      24
Resp      1
dtype: int64

Temperature is a feature that has a high number of missing values, so we'll use that for modeling.

The data type for GPyTorch models is the torch tensor.

import torch

temp = torch.tensor(df.Temp.values, dtype=torch.float32)

With missing values, our training x and y points need to be chosen as:

  • x = the index values where temperature is not null
  • y = the observed (non-null) temperature values
train_x = torch.where(~torch.isnan(temp))[0]
train_x
tensor([ 0,  2,  4,  8, 12, 16, 20, 24, 28, 32])
train_y = temp[~torch.isnan(temp)]
train_y
tensor([36.7200, 36.5000, 37.0600, 37.0600, 36.8300, 37.0000, 37.6700, 37.7200,
        37.2200, 37.0600])

Model

Similarly to PyTorch, you must define a class for each model you construct. With GPyTorch, there are five objects you must specify:

  • GP model: gpytorch.models.ExactGP
  • Likelihood: gpytorch.likelihoods.GaussianLikelihood()
  • Mean: gpytorch.means.ConstantMean()
  • Kernel: gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
  • Distribution: gpytorch.distributions.MultivariateNormal()

These are all the "default" settings for a Gaussian process. If you had to choose one to tweak, choose the kernel (a small sketch follows the model definition below). A good place to learn more is the kernel cookbook: https://www.cs.toronto.edu/~duvenaud/cookbook/

import gpytorch

class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        # Default mean and covariance: constant mean, RBF kernel wrapped in a ScaleKernel
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)
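
As an illustration of the kernel choice mentioned above, here is a minimal sketch of two alternative GPyTorch kernels. They are not used in the rest of this tutorial; swapping one in would only change the covar_module line in ExactGPModel.__init__, and the variable names here are just for illustration.

# Illustrative only: alternative kernels from GPyTorch (not used below).
# Swapping one in would only change the covar_module line in ExactGPModel.__init__.
matern_covar = gpytorch.kernels.ScaleKernel(gpytorch.kernels.MaternKernel(nu=2.5))
periodic_covar = gpytorch.kernels.ScaleKernel(gpytorch.kernels.PeriodicKernel())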

Training loop

Similar to PyTorch models, you can train your model by specifying an optimizer and loss, then updating the gradients at each step.

training_iter = 10000


# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    if i % 1000 == 0:
        print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
            i, training_iter, loss.item(),
            model.covar_module.base_kernel.lengthscale.item(),
            model.likelihood.noise.item()
        ))
    optimizer.step()
Iter 0/10000 - Loss: 495.607   lengthscale: 0.693   noise: 0.693
Iter 1000/10000 - Loss: 4.124   lengthscale: 13.291   noise: 3.872
Iter 2000/10000 - Loss: 1.933   lengthscale: 14.903   noise: 2.703
Iter 3000/10000 - Loss: 0.885   lengthscale: 16.011   noise: 0.128
Iter 4000/10000 - Loss: 0.793   lengthscale: 17.954   noise: 0.070
Iter 5000/10000 - Loss: 0.744   lengthscale: 20.278   noise: 0.073
Iter 6000/10000 - Loss: 0.673   lengthscale: 22.807   noise: 0.076
Iter 7000/10000 - Loss: 0.416   lengthscale: 23.878   noise: 0.089
Iter 8000/10000 - Loss: 0.386   lengthscale: 18.951   noise: 0.089
Iter 9000/10000 - Loss: 0.318   lengthscale: 8.548   noise: 0.071

Now we get our predictions from the model:

# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# Make predictions by feeding model through likelihood
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.tensor(list(df.index))
    observed_pred = likelihood(model(test_x))

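Since the original motivation was imputation, here is a minimal sketch (not part of the GPyTorch tutorial) of filling the missing temperatures with the posterior mean; the column name Temp_imputed is just an example.

# Sketch: impute missing temperatures with the GP posterior mean.
# 'Temp_imputed' is a hypothetical column name used for illustration.
mask = df.Temp.isnull()
imputed = df.Temp.copy()
imputed[mask] = observed_pred.mean.numpy()[mask.values]
df['Temp_imputed'] = imputed
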
Plots

The GPyTorch tutorial provides matplotlib code for plotting the mean and confidence intervals of our posterior:

import matplotlib.pyplot as plt

with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(8, 6))

    # Get upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Plot predictive means as blue line
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), 'b')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.legend(['Observed Data', 'Mean', 'Confidence'])

But we can do better. PyMC3 has plotting functionality that allows us to sample from our posterior and plot the results.

from pymc3.gp.util import plot_gp_dist
gp_samples = torch.stack([observed_pred.sample() for i in range(100)])
fig, ax = plt.subplots(figsize=(8,6))
plot_gp_dist(ax, gp_samples.numpy(), test_x.numpy())
df.reset_index().plot(kind='scatter', x='index', y='Temp', c='k', s=50, ax=ax)
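If PyMC3 isn't available, a rough equivalent (a sketch, not from either tutorial) is to overplot the individual posterior samples directly with matplotlib:

# Sketch: plot each posterior sample as a faint line, plus the observed points.
fig, ax = plt.subplots(figsize=(8, 6))
for sample in gp_samples:
    ax.plot(test_x.numpy(), sample.numpy(), color='C3', alpha=0.05)
ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
ax.set_xlabel('index')
ax.set_ylabel('Temp')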

With just 100 samples we get a clear picture of where the Gaussian process mean fits through our data. Besides looking cool, graphing this way helps illustrate how the model's predictions are generated from a posterior distribution.