-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathJAGS_SIR2.Rmd
239 lines (180 loc) · 9.59 KB
/
JAGS_SIR2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
---
title: "A Bayesian SIR model in JAGS"
output: html_notebook
---
```{r, echo = FALSE, include= FALSE}
rm(list = ls())
set.seed(102)
```
### Introduction
In this notebook I am going to fit a Bayesian SIR model in JAGS. Some of this material is a simplification of the more advanced spatial SEIRS model found in this [thesis](https://ir.uiowa.edu/etd/1554/) by Grant Brown from 2015 which is well worth reading. Another very useful paper is [Lekone and Finkelstadt](https://onlinelibrary.wiley.com/doi/full/10.1111/j.1541-0420.2006.00609.x?casa_token=_3CKtdvUxAgAAAAA:74e1tIqSoCFQlE6k8WmaCjc1shBMMIe435o7uF36QLkPw-kPeJqzLwC4ZB0LCIpnh8cNDLM-O2ULZqQ) which has no spatial aspect, and is the majority of what is re-created here. I have changed the notation to what I believe is a clearer version.
The main idea is to simplify the differential equations of the SIR model into difference equations and then use probability distributions and transmission matrices to quantify uncertainties between compartments. All time is treated as discrete, which is fine for current COVID-19 data which is almost all available at a daily time step. In another document I will extend this model to SEIR, and further a spatial or stratified type model. It may be possible to fit a similar model in Stan but this would require integrating out many of the discrete parameters.
### Notation
First data:
- Let $t$ be time for $t = 1, \ldots, T$ with all time points known and equally spaced.
- Let $N$ be the population size
- Let $N_{S \rightarrow I}(t)$ be the number of susceptible people who get infected at time $t$. This, if known, is the number of cases at time $t$.
- Let $N_{I \rightarrow R}(t)$ be the number of infected people who are removed (died or recovered) at time $t$. We have a (likely reasonable) estimate of the number of people who have died at time $t$ (or at least before time $t$) but a much poorer estimate of the number who have recovered. The latter is reported by John Hopkins but likely to be a massive under-estimate, or based on quite strict assumptions of disease length.
Now latent variables:
- Let $S(t)$ be the number of susceptible individuals at time $t$ with starting value $S(1)$
- Let $I(t)$ be the number of infected individuals at time $t$ with starting value $I(1)$
- Let $R(t)$ be the number of removed individuals at time $t$ with starting value $R(1)$
Some parameters:
- $\beta$ is the base transmission rate, i.e. the rate at which susceptible individuals become infected
- $\gamma$ is a parameter that controls the rate at which infected individuals are removed. I think the inverse of this is the mean number of days before an infected individual becomes removed
Some extra parameters are derived:
- Let $R_0$ (or $R_0(t)$) be the basic reproduction number, possibly changing over time, that determines the expected number of people who turn susceptible from an infected person
- Let $\tau^*$ be the time point when the epidemic goes extinct, i.e. when there are no more infected individuals left
### Model
The model equations are given by:
$$S(t+1) = S(t) - N_{S \rightarrow I}(t)$$
$$I(t+1) = I(t) + N_{S \rightarrow I}(t) - N_{I \rightarrow R}(t)$$
$$S(t) + I(t) + R(t) = N$$
In English these equations mean:
- The number of susceptible individuals at time $t+1$ is equal to the previous number of susceptible individuals less the number of people who were susceptible but now infected at time $t$
- The number of infected people at time $t+1$ is the number of infected people at time $t$ plus the number of people who move from susceptible to infected at time $t$ less the number of removed individuals who have moved from exposed to removed.
- The total number of people in each compartment is always equal to the population size
Each of the latent variables is given a binomial distribution
$$N_{S \rightarrow I}(t) \sim Bin(S(t), p_{S \rightarrow I}(t))$$
$$N_{I \rightarrow R}(t) \sim Bin(I(t), p_{I \rightarrow R})$$
In words these equations mean:
- The number of susceptible people who get infected at time $t$ is a binomial distribution with maximum value $S(t)$ and time-dependent probability $p_{S \rightarrow I}(t)$. This means that the number of people moving from susceptible to infected at time $t$ has to be a fraction of the total number of susceptible individuals
- The number of infected individuals who get removed at time $t$ is a binomial distribution with maximum value $I(t)$ and probability $p_{I \rightarrow R}$. This means that the number of people who get removed at time $t$ is a fraction of the number of people who are infected at time $t$
These probabilities are set at
$$p_{S \rightarrow I}(t) = 1 - \exp \left( - \beta \right)$$
This equation makes the proportion of people moving from susceptible to infected dependent on the proportion of people who are infected ($I(t)/N$), i.e. the greater the proportion of people infected, the fewer can become infected.
$$p_{I \rightarrow R} = 1 - \exp( - \gamma)$$
where
- $\beta$ is the transmission rate
- $1/\gamma$ is the mean infectious period
One of the key parameters to estimate is the basic reproduction number which is defined as $R_0$ or $\beta / \gamma$ in this notation. This model has the cool feature that we can estimate a time dependent reproduction number as:
$$ R_0(t) = \frac{\beta}{\gamma}$$
Ideally, the data for this model would be $N, N_{S \rightarrow I}(t), N_{I \rightarrow R}(t)$, the latter three for all time values. In fact we often do not have access to $N_{S \rightarrow I}(t)$ (the number of susceptible individuals who become infected) so we may need to integrate out these values during the Bayesian model.
### JAGS model
Below is some code for a JAGS model:
```{r}
jags_code = '
model {
# Likelihood
for (t in 1:T) {
N_I_R[t] ~ dbinom(p_I_R, I[t])
N_S_I[t] ~ dbinom(p_S_I, S[t])
}
# These are the known time evolution steps:
for(t in 2:T) {
S[t] <- S[t-1] - N_S_I[t-1]
I[t] <- I[t-1] + N_S_I[t-1] - N_I_R[t-1]
R[t] <- N - S[t] - I[t]
}
# Need a value for S[1] and I[1]
I[1] <- I_start # Assume number of infected people at start
R[1] <- R_start # As above but for removed
S[1] <- N - I[1] - R[1] # Left over
# Probabilities and R_0
p_S_I <- 1 - exp( - beta )
p_I_R <- 1 - exp( -gamma )
R_0 <- (beta/gamma)
# Now the prior on the hyper-parameters
beta ~ dunif(0.3, 1) #= 0.6
gamma <- 1/gamma_inv
gamma_inv ~ dunif(6, 12) # 10
# Can now forecast into the future
for (t in (T+1):T_max) {
# Transitions
N_I_R[t] ~ dbinom(p_I_R, I[t])
N_S_I[t] ~ dbinom(p_S_I, S[t])
# Compartment values
S[t] <- S[t-1] - N_S_I[t-1]
I[t] <- I[t-1] + N_S_I[t-1] - N_I_R[t-1]
R[t] = N - S[t] - I[t]
}
}
'
```
### Fit to simulated data
I will first try simulating some data from this model and checking that JAGS actually works.
```{r}
# Summary values
N = 1000 # Population size
T = 30 # Maximum time steps
t = 1:T
# First the hyper-parameters
gamma_inv = 10; gamma = 1/gamma_inv
beta = 0.6
# Now the probabilities
p_I_R = 1 - exp(-gamma)
p_S_I = 1 - exp(-beta)
# Give an initial values for everthing required
N_S_I = N_I_R = S = I = R = rep(NA, T)
S[1] = N - 10; I[1] = 10; R[1] = 0
N_S_I[1] = rbinom(1, S[1], p_S_I)
N_I_R[1] = rbinom(1, I[1], p_I_R)
# Now can loop through filling in the other values
for (t in 2:T) {
S[t] <- S[t-1] - N_S_I[t-1]
I[t] <- I[t-1] + N_S_I[t-1] - N_I_R[t-1]
R[t] = N - S[t] - I[t]
# Now get the actual values
N_I_R[t] = rbinom(1, I[t], p_I_R)
# These are usually imputed:
N_S_I[t] = rbinom(1, S[t], p_S_I)
}
# Create R0
R_0 = (beta/gamma)
# Create a plot of the epidemic
library(ggplot2)
library(tidyr)
tibble(t = 1:T,
S, I, R) %>%
pivot_longer(names_to = 'Compartment', values_to = 'People', -t) %>%
ggplot(aes(x = t, y = People, colour = Compartment)) +
#scale_y_log10() +
geom_line()
```
Now we can fit this model by providing the data to JAGS
```{r, message= FALSE, results='hide', warning=FALSE}
N_future = 50 # Forecast X days into the future
T_max = T + N_future
jags_data = list(N = N,
T = T,
T_max = T_max,
I_start = I[1],
R_start = R[1],
N_S_I = c(N_S_I, rep(NA, N_future)), # You can comment out this line if you want it to not have the number of cases.
N_I_R = c(N_I_R, rep(NA, N_future)))
library(R2jags)
jags_run = jags(data = jags_data,
parameters.to.save = c("gamma_inv", "beta",
"S","I", "R", "R_0"),
model.file = textConnection(jags_code))
plot(jags_run)
```
Now plot the number of infected over time:
```{r}
S_post = jags_run$BUGSoutput$median$S
I_post = jags_run$BUGSoutput$median$I
R_post = jags_run$BUGSoutput$median$R
tibble(t = 1:T_max,
S_post, I_post, R_post) %>%
pivot_longer(names_to = 'Compartment', values_to = 'People', -t) %>%
ggplot(aes(x = t, y = People, colour = Compartment)) +
geom_line()
```
Now plot the parameter estimates against their true values
```{r}
post = jags_run$BUGSoutput$sims.list
tibble(iter = 1:length(post$beta),
beta = post$beta,
gamma_inv = post$gamma_inv,
R0 = post$R_0) %>%
pivot_longer(names_to = 'Type', values_to = 'Sample',-iter) %>%
ggplot(aes(x = Sample, fill = Type)) +
geom_histogram(bins = 30) +
facet_wrap(~ Type, scales = 'free') +
geom_vline(data = tibble(iter = rep(1, 3),
Sample = c(beta, gamma_inv, R_0),
Type = c('beta', 'gamma_inv', 'R0')),
aes(xintercept = Sample)) +
theme(legend.position = 'None')
```
All values seem to match, depending on whether you give it the number of cases ($N_{S \rightarrow I}$) or not.