As for the analysis of the prophylactic AMU, the main difficulty here consists in rearranging the data. Once the data is properly rearranged, then the analysis is pretty classical. In this section we sketch out the rationale of our analysis.
Because we are interested in therapeutic AMU, we will consider in this analysis only weeks that are from disease episodes. A first challenge will thus consist in characterizing disease episodes in terms of a set of clinical signs and also in terms of gap
(see the functions fill_gaps()
and split_by_disease_episode()
below), as explained in Figure 1 below.
Figure 1: Defining a disease episode. The arrow represents the weeks (each rectangle is a week) of a flock, from the first week on the left ot the last one on the right. The red dots represent the presence of disease, as defined by a set of clinical signs (where here we consider there is disease if at least one of the clinical signs of the set is reported). In order to account for the fact that symptoms may not be always reported, we want to allow the possibility to convert one or a few weeks without reported symptoms and surrounded by weeks with reported symptoms into one single disease episode. The parameter gap
is the number of week(s) without symptoms we allow ourself to skip when defining a disease episode. Below the arrow are 3 examples of disease episodes definitions with 3 episodes when gap = 0
, 2 episodes when gap = 1
and 1 episode only when gap = 3
.
Once this is done, the next step consists in splitting each disease episode into 2 periods: one before AMU (defined by a set of antimicrobials) and one after AMU (Figure 2). Same as for the symptoms: here we consider there is AMU if at least one of the drugs set that we consider is reported. The exact week where AMU starts is excluded from the analysis because for this particular week it’s impossible to infer any form of causality. For some disease episodes there might be no AMU at all. On the contrary, for some other disease episodes, there might be AMU from the beginning of the episode. In those cases, we will have only a before
or an after
period. Also, because we really want to ascertain that our AMU can be considered as therapeutic, will exclude from the analysis all the disease episodes that had AMU during the x
weeks before the start of the episode. This is what is done by the function exclude_if_amu_before()
defined below.
Figure 2: Separating the weeks before from the weeks after AMU in all disease episode. In this example, the arrows show the first 2 and the last flocks of our data. Each flock starts on the left end and ends on the right end of the arrow and the length of the arrow is the duration of the flock. Colored sections represent disease episodes as identified in Figure 1. The red rectangles represent the first week of AMU in the disease episodes. Sometimes there is no AMU at all during the disease episode (as on the third episode of the first flock) and some other times the firt week of AMU is the first week of the episode as on the second episode of the second flock or the first episode of the last flock. Once these first weeks of AMU are identified in all the disease episodes, we gather, from all the disease episodes of all the flocks, all the weeks that occur before (in blue) and all the weeks that occur after (in green) these first weeks of AMU, making 2 arms: “before” and “after”.
Once this is done, we end up with two arms of weeks: one before
and one after
AMU (Figure 2). We can then estimate and compare the mortality rates in these two arms with a logistic regression as detailed below. The great advantage of a logistic regression is that it allows to include covariables in order to control for potential confounding effects. For the purpose of the examples below, we will use the following sets of clinical signs to define disease episodes:
cs_set <- c("malaise", "diarrhoea")
with this value of gap
:
gap <- 1
and the following set of antimicrobials to defined AMU:
am_set <- c("colistin_use", "oxytetracycline_use")
and the following value of x
:
x <- 1
but the function prepare_data()
at the end is a pipeline that puts everything together, with cs_set
, am_set
, gap
and x
as tuning parameters. The last section shows the different ways (basically with and without correcting for potential condounding effects) to performed the statistical analysis. The user can thus prepare a number of different data sets, depending on the values of the cs_set
, am_set
, gap
and x
arguments and, on each of these data sets, perform a number of analyses depending on whether (s)he wants to correct for potential confounding effects, and which ones.
Installing the required packages:
required <- c("dplyr", "magrittr", "purrr", "readr")
to_install <- setdiff(required, row.names(installed.packages()))
if (length(to_install)) install.packages(to_install)
Loading magrittr
for interactive use:
library(magrittr)
viparc <- readr::read_csv("https://raw.githubusercontent.com/viparc/prophylactic/master/data/viparc_qualitative.csv",
col_types = paste(c("ciiiddd", rep("l", 51)), collapse = ""))
We have 4 types of variables in this data frame:
farm
, flock
and week
;nb_chicken
, nb_chicken_sold
, chicken_disease_death
and chicken_sudden_death
;respiratory
, diarrhoea
, cns
, malaise
, leg_lesions
and sudden_death
;amoxicillin_use
to unknown_use
.The function fill_gaps()
takes a vector of logicals (or zeros and ones) and replaces all the 1 0 1
sequences by 1 1 1
(when gap = 1
), or all 1 0 0 1
sequences by 1 1 1 1
(when gap = 2
), etc…:
fill_gaps_0 <- function(x, gap) {
as.integer(x) %>% # because easier to deal with 1-character encoding
paste(collapse = "") %>%
gsub(paste(c(1, rep(0, gap), 1), collapse = ""),
paste(rep(1, gap + 2), collapse = ""), .) %>%
gsub("NA", "x", .) %>% # need to replace potential NA by 1-character...
strsplit("") %>% # ... because of the strsplit() call that follows
unlist(FALSE) %>% # because strsplit() returns a list
sub("x", NA, .) %>% # back to NA
as.numeric() %>%
as.logical()
}
Let’s try it:
fill_gaps_0(c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE), 2)
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
Works fine but it doesn’t replace the gaps that are shorter than the gap
argument:
fill_gaps_0(c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE), 2)
## [1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
fill_gaps()
is a wrapper around fill_gaps_0()
that fixes this problem:
fill_gaps <- function(x, gap) {
while(gap > 0) {
x <- fill_gaps_0(x, gap)
gap <- gap - 1
}
x
}
Let’s try it:
fill_gaps(c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE), 2)
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
and:
fill_gaps(c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE), 2)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
The function episodes()
identifies the different disease episodes. It would for example transform 1 1 1 0 0 1 1 0 0 0 1 0
into 1 1 1 0 0 2 2 0 0 0 3 0
:
episodes <- function(x) {
x <- rle(x)
sel <- which(x$values > 0)
x$values[sel] <- seq_along(sel)
inverse.rle(x)
}
Let’s try it:
episodes(c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0))
## [1] 1 1 1 0 0 2 2 0 0 0 3 0
The function split_by_disease_episode()
uses the function episodes()
to generate, for a given flock, disease episodes according to a set of clinical signs cs
, and splits by row the data frame into a list of data frames, each corresponding to a disease episode:
split_by_disease_episode <- function(df, cs, gap) {
if (any(unique(diff(df$week))) != 1) return(NULL) # makes sure there is no missing week
df %>%
dplyr::mutate(disease = dplyr::select(., !! cs) %>%
rowSums() %>%
as.logical() %>%
fill_gaps(gap) %>%
as.integer() %>%
episodes()) %>%
dplyr::filter(disease > 0) %>%
split(., .$disease) %>%
lapply(dplyr::select, -disease)
}
Let’s try it:
viparc %>%
dplyr::filter(farm == "75-004", flock == 1) %>%
dplyr::arrange(week) %>%
split_by_disease_episode(cs_set, gap) %>%
lapply(dplyr::select, farm, flock, week, !! cs_set) # for simpler printing
## $`1`
## # A tibble: 1 x 5
## farm flock week malaise diarrhoea
## <chr> <int> <int> <lgl> <lgl>
## 1 75-004 1 3 TRUE FALSE
##
## $`2`
## # A tibble: 2 x 5
## farm flock week malaise diarrhoea
## <chr> <int> <int> <lgl> <lgl>
## 1 75-004 1 7 TRUE TRUE
## 2 75-004 1 8 TRUE TRUE
Works for one given cycle. Here is how we make it works for all cycles:
a <- viparc %>%
dplyr::arrange(week) %>%
dplyr::group_by(farm, flock) %>%
dplyr::group_map(~ split_by_disease_episode(.x, cs_set, gap), keep = TRUE) %>%
unlist(FALSE)
which gives:
a %>%
lapply(dplyr::select, farm, flock, week, !! cs_set) %>%
head()
## $`1`
## # A tibble: 3 x 5
## farm flock week malaise diarrhoea
## <chr> <int> <int> <lgl> <lgl>
## 1 75-001 6 1 TRUE FALSE
## 2 75-001 6 2 FALSE FALSE
## 3 75-001 6 3 TRUE FALSE
##
## $`1`
## # A tibble: 3 x 5
## farm flock week malaise diarrhoea
## <chr> <int> <int> <lgl> <lgl>
## 1 75-002 1 1 TRUE FALSE
## 2 75-002 1 2 FALSE FALSE
## 3 75-002 1 3 TRUE FALSE
##
## $`2`
## # A tibble: 1 x 5
## farm flock week malaise diarrhoea
## <chr> <int> <int> <lgl> <lgl>
## 1 75-002 1 5 TRUE FALSE
##
## $`1`
## # A tibble: 1 x 5
## farm flock week malaise diarrhoea
## <chr> <int> <int> <lgl> <lgl>
## 1 75-002 2 4 TRUE FALSE
##
## $`2`
## # A tibble: 1 x 5
## farm flock week malaise diarrhoea
## <chr> <int> <int> <lgl> <lgl>
## 1 75-002 2 12 TRUE FALSE
##
## $`1`
## # A tibble: 1 x 5
## farm flock week malaise diarrhoea
## <chr> <int> <int> <lgl> <lgl>
## 1 75-002 3 1 TRUE FALSE
Before we continue, we want to exclude from this list all the disease episodes with AMU during the \(x\) weeks before the start of the episode, AMU being, again, defined by a set of antimicrobials.
exclude_if_amu_before <- function(df, x, df_ref, ab) {
require(magrittr)
am <- ab
first_week <- min(df$week)
if (first_week == 1) return(df)
weeks <- (first_week - x):(first_week - 1)
weeks <- weeks[weeks > 0]
amu_before_episode <- df_ref %>%
dplyr::filter(farm == df$farm[1], flock == df$flock[1], week %in% weeks) %>%
dplyr::select({{ am }}) %>%
rowSums() %>%
as.logical() %>%
any()
if (amu_before_episode) return(NULL)
df
}
where df
is a sub data frame of the viparc
data frame, x
is the duration, in weeks, of the period before the start of the disease episode we consider to filter out according to AMU, df_ref
is typically the viparc
data frame, and ab
is a set of antimicrobials used to define AMU. Let’s try it:
exclude_if_amu_before(a[[1]], 2, viparc, am_set)
## # A tibble: 3 x 58
## farm flock week nb_chicken nb_chicken_sold chicken_disease… chicken_sudden_… respiratory diarrhoea cns malaise leg_lesions
## <chr> <int> <int> <int> <dbl> <dbl> <dbl> <lgl> <lgl> <lgl> <lgl> <lgl>
## 1 75-0… 6 1 195 9 9 0 FALSE FALSE FALSE TRUE FALSE
## 2 75-0… 6 2 195 0 0 0 FALSE FALSE FALSE FALSE FALSE
## 3 75-0… 6 3 190 5 5 0 FALSE FALSE FALSE TRUE FALSE
## # … with 46 more variables: sudden_death <lgl>, amoxicillin_use <lgl>, ampicillin_use <lgl>, apramycin_use <lgl>, cefadroxil_use <lgl>,
## # cefotaxime_use <lgl>, ceftiofur_use <lgl>, cephalexin_use <lgl>, chloramphenicol_use <lgl>, ciprofloxacin_use <lgl>,
## # colistin_use <lgl>, doxycycline_use <lgl>, enramycin_use <lgl>, enrofloxacin_use <lgl>, erythromycin_use <lgl>,
## # florfenicol_use <lgl>, flumequine_use <lgl>, gentamicin_use <lgl>, josamycin_use <lgl>, kitasamycin_use <lgl>, lincomycin_use <lgl>,
## # marbofloxacin_use <lgl>, methenamine_use <lgl>, neomycin_use <lgl>, norfloxacin_use <lgl>, oxytetracycline_use <lgl>,
## # spectinomycin_use <lgl>, spiramycin_use <lgl>, streptomycin_use <lgl>, sulfachloropyridazine_use <lgl>, sulfadiazine_use <lgl>,
## # sulfadimethoxine_use <lgl>, sulfadimidine_use <lgl>, sulfaguanidin_use <lgl>, sulfamethazine_use <lgl>, sulfamethoxazole_use <lgl>,
## # sulfamethoxypyridazine_use <lgl>, sulphamethoxazole_use <lgl>, sulphathiazole_use <lgl>, tetracycline_use <lgl>,
## # thiamphenicol_use <lgl>, tiamulin_use <lgl>, tilmicosin_use <lgl>, trimethoprim_use <lgl>, tylosin_use <lgl>, unknown_use <lgl>
The function before_after()
below splits the data frame of a disease episode into 2 data frames, one before the use of AMU and one after the use of AMU. The week where AMU starts is deleted. AMU is defined in terms of a set of antimicrobials.
before_after <- function(df, ab) {
require(magrittr)
am <- ab
start <- df %>%
dplyr::select({{ am }}) %>%
rowSums() %>%
as.logical() %>%
which() %>%
head(1)
if (length(start) < 1) return(list(df, NULL)) # there is never AMU
if (start < 2) return(list(NULL, df[-1, ])) # AMU starts on the first week
nr <- nrow(df)
if (start == nr) return(list(df[-nr, ], NULL)) # AMU starts on the last week
list(df[1:(start - 1), ], df[(start + 1):nr, ])
}
Let’s try it:
b <- lapply(a, before_after, am_set)
A pipeline that assembles the above defined split_by_disease_episode()
, exclude_if_amu_before()
and before_after()
functions in order to prepare the data for analysis:
prepare_data <- function(df, cs = NULL, am = NULL, gap = 1, x = 1) {
require(magrittr)
var_names <- setdiff(names(df), c("arm", "farm", "flock", "week", "nb_chicken",
"nb_chicken_sold", "chicken_disease_death",
"chicekn_sudden_death"))
if (is.null(cs)) cs <- grep("_use$", var_names, value = TRUE, invert = TRUE)
if (is.null(am)) am <- grep("_use$", var_names, value = TRUE)
df %>%
dplyr::arrange(week) %>%
dplyr::group_by(farm, flock) %>%
dplyr::group_map(~ split_by_disease_episode(.x, cs, gap), keep = TRUE) %>%
unlist(FALSE) %>%
lapply(exclude_if_amu_before, x, viparc, am) %>%
purrr::compact() %>% # because exclude_if_amu_before() potentially generates NULL elements
lapply(before_after, am_set) %>%
purrr::transpose() %>%
lapply(purrr::compact) %>% # because before_after() potentially generates NULL elements
setNames(c("before", "after")) %>%
lapply(dplyr::bind_rows) %>%
dplyr::bind_rows(.id = "arm") %>%
dplyr::mutate(arm = factor(arm, c("before", "after")),
suc = chicken_disease_death + chicken_sudden_death,
fai = nb_chicken + nb_chicken_sold)
}
Let’s try it (takes a few seconds to run):
viparc2 <- prepare_data(viparc,
c("malaise", "diarrhoea"),
c("colistin_use", "oxytetracycline_use"), 1, 2)
In this particular case, it produces the following numbers of weeks in the two arms:
table(viparc2$arm)
##
## before after
## 491 353
Once the data is properly put in shape, the actual statistical analysis is quite classical. Here, we consider a logistic regression:
(mod <- glm(cbind(suc, fai) ~ arm, binomial, viparc2))
##
## Call: glm(formula = cbind(suc, fai) ~ arm, family = binomial, data = viparc2)
##
## Coefficients:
## (Intercept) armafter
## -2.5677 -0.9455
##
## Degrees of Freedom: 843 Total (i.e. Null); 842 Residual
## Null Deviance: 38000
## Residual Deviance: 34870 AIC: 37870
Note that, in this model, if you wish to correct for potential counfounding effects such as chicken age or the use of other antimicrobials or the presence of other clinical signs or AMU during the first stages of the flock, you will have to include the corresponding variables in the formula of the model, making sure that the arm
variable is the last one. You can explore the model summary:
summary(mod)
##
## Call:
## glm(formula = cbind(suc, fai) ~ arm, family = binomial, data = viparc2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -15.213 -4.800 -2.531 0.519 38.756
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.567715 0.008977 -286.0 <2e-16 ***
## armafter -0.945539 0.018010 -52.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 37999 on 843 degrees of freedom
## Residual deviance: 34871 on 842 degrees of freedom
## AIC: 37875
##
## Number of Fisher Scoring iterations: 5
After AMU, the odds of dying is multiplied by the following factor:
unname(exp(tail(coef(mod), 1)))
## [1] 0.3884701
with the following confidence interval:
mod %>%
confint() %>%
`[`(., nrow(.), ) %>%
exp()
## 2.5 % 97.5 %
## 0.3749583 0.4023886
The significativity of the therapeutic AMU is:
anova(mod, test = "LRT")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: cbind(suc, fai)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 843 37999
## arm 1 3128.7 842 34871 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In absence of any covariable (age of the flock, etc…), you can also compute the death rates for the two arms (before
and after
) straightforwardly thanks to the following function:
death_rate <- function(m) {
require(magrittr)
m %>%
predict.glm(se.fit = TRUE) %>%
lapply(unique) %>%
head(2) %>%
purrr::transpose() %>%
lapply(function(x) c(x$fit, x$fit + c(-1, 1) * 1.96 * x$se.fit)) %>%
lapply(m$family$linkinv) %>%
setNames(c("before", "after")) %>%
lapply(function(x) setNames(as.data.frame(t(x)), c("estimate", "lower", "upper"))) %>%
dplyr::bind_rows(.id = "arm")
}
Let’s try it:
death_rate(mod)
## arm estimate lower upper
## 1 before 0.07124534 0.07008984 0.07241841
## 2 after 0.02893745 0.02808981 0.02980989
These “rates” are the individual probability of death per week. There may be the possibility to compute similar things accounting for covariates but I need to do some research on that. We can however verify that:
0.02989321 / (1 - 0.02989321) / (0.07742014 / (1 - 0.07742014))
## [1] 0.3672003
And here the probability ratio is
0.02989321 / 0.07742014
## [1] 0.3861167
The only covariable that seems to make sense to include is the age of the chicken:
(mod <- glm(cbind(suc, fai) ~ week + arm, binomial, viparc2))
##
## Call: glm(formula = cbind(suc, fai) ~ week + arm, family = binomial,
## data = viparc2)
##
## Coefficients:
## (Intercept) week armafter
## -3.09268 0.06406 -0.83878
##
## Degrees of Freedom: 843 Total (i.e. Null); 841 Residual
## Null Deviance: 38000
## Residual Deviance: 33390 AIC: 36400
anova(mod, test = "LRT")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: cbind(suc, fai)
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 843 37999
## week 1 2271.4 842 35728 < 2.2e-16 ***
## arm 1 2338.0 841 33390 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Alternatively, this can be modeled with a GAM:
(mod <- mgcv::gam(cbind(suc, fai) ~ s(week) + arm, binomial, viparc2, method = "REML"))
##
## Family: binomial
## Link function: logit
##
## Formula:
## cbind(suc, fai) ~ s(week) + arm
##
## Estimated degrees of freedom:
## 8.9 total = 10.9
##
## REML score: 17747.96
summary(mod)
##
## Family: binomial
## Link function: logit
##
## Formula:
## cbind(suc, fai) ~ s(week) + arm
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.639218 0.009666 -273.05 <2e-16 ***
## armafter -0.849133 0.018603 -45.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(week) 8.898 8.995 2373 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0882 Deviance explained = 14.7%
## -REML = 17748 Scale est. = 1 n = 844
As done for the prophylactic study, let’s define the generic and methods get_outputs()
:
get_outputs <- function (x, ...) {
UseMethod("get_outputs", x)
}
for the generic and
get_outputs.glm <- function(x) {
conf_int <- confint(x)
c(n = nrow(x$model),
or = tail(coef(x), 1),
p = tail(anova(x, test = "LRT")$`Pr(>Chi)`, 1),
ci = conf_int[nrow(conf_int), ])
}
for the GLM method and
get_outputs.gam <- function(x, level = .95) {
summ <- summary(x)
conf_int <- mgcv.helper::confint.gam(x, level = level)
c(n = nrow(x$model),
or = unname(tail(summ$p.coef, 1)),
p = unname(tail(summ$p.pv, 1)),
ci = unlist(conf_int[nrow(conf_int), 5:6]))
}
for the GAM method. Let’s try them:
viparc2 %>%
glm(cbind(suc, fai) ~ week + arm, binomial, .) %>%
get_outputs()
## n or.armafter p ci.2.5 % ci.97.5 %
## 844.0000000 -0.8387795 0.0000000 -0.8747153 -0.8030341
and
viparc2 %>%
mgcv::gam(cbind(suc, fai) ~ s(week) + arm, binomial, ., method = "REML") %>%
get_outputs()
## n or p ci.2.5% ci.97.5%
## 844.0000000 -0.8491334 0.0000000 -0.8856469 -0.8126198
Let’s make a batch:
(batch <- expand.grid(cs = list(c("malaise", "diarrhoea"),
c("respiratory", "cns")),
am = list(c("colistin_use", "oxytetracycline_use"),
c("apramycin_use", "cefadroxil_use"))))
## cs am
## 1 malaise, diarrhoea colistin_use, oxytetracycline_use
## 2 respiratory, cns colistin_use, oxytetracycline_use
## 3 malaise, diarrhoea apramycin_use, cefadroxil_use
## 4 respiratory, cns apramycin_use, cefadroxil_use
The corresponding data sets:
datasets <- purrr::pmap(batch, prepare_data, df = viparc)
The models:
glms <- t(sapply(datasets, function(x) get_outputs(glm(cbind(suc, fai) ~ week + arm, binomial, x))))
Or with a GAM:
gams <- t(sapply(datasets, function(x) get_outputs(mgcv::gam(cbind(suc, fai) ~ s(week) + arm, binomial, x, method = "REML"))))
Let’s write pipelines:
pipeline_glm <- function(df, cs = NULL, am = NULL, gap = 1, x = 1) {
require(magrittr)
prepare_data(df, cs, am, gap, x) %$%
glm(cbind(suc, fai) ~ week + arm, binomial) %>%
get_outputs()
}
and:
pipeline_gam <- function(df, cs = NULL, am = NULL, gap = 1, x = 1) {
require(magrittr)
prepare_data(df, cs, am, gap, x) %$%
mgcv::gam(cbind(suc, fai) ~ s(week) + arm, binomial, method = "REML") %>%
get_outputs()
}
Let’s try it:
pipeline_glm(viparc, cs_set, am_set)
## n or.armafter p ci.2.5 % ci.97.5 %
## 928.0000000 -0.8640434 0.0000000 -0.8984372 -0.8298357
and
pipeline_gam(viparc, cs_set, am_set)
## n or p ci.2.5% ci.97.5%
## 928.0000000 -0.8538708 0.0000000 -0.8886307 -0.8191110
a <- batch %>%
tibble::as_tibble() %>%
assign("plan", ., 1) %>% # saving intermediate value
c(FUN = pipeline_gam, ., MoreArgs = list(list(df = viparc)), SIMPLIFY = FALSE, mc.cores = 3) %>%
do.call(parallel::mcmapply, .) %>%
do.call(dplyr::bind_rows, .) %>%
dplyr::bind_cols(plan, .)
Here is a full pipeline around GAM:
run_plan <- function(df, cs = NULL, am = NULL, gap = 1, x = 1) {
require(magrittr)
var_names <- setdiff(names(df), c("arm", "farm", "flock", "week", "nb_chicken", "nb_chicken_sold", "chicken_disease_death", "chicken_sudden_death"))
if (is.null(cs)) cs <- list(grep("_use$", var_names, value = TRUE, invert = TRUE))
else if (! is.list(cs)) cs <- list(cs)
if (is.null(am)) am <- list(grep("_use$", var_names, value = TRUE))
else if (! is.list(am)) am <- list(am)
expand.grid(cs = cs, am = am, gap = gap, x = x) %>%
tibble::as_tibble() %>%
assign("plan", ., 1) %>% # saving intermediate value
c(FUN = purrr::safely(pipeline_gam), ., MoreArgs = list(list(df = df)), SIMPLIFY = FALSE, mc.cores = 4) %>%
do.call(parallel::mcmapply, .) %>%
lapply(function(x) x$result) %>%
assign("results", ., 1) %>%
sapply(is.null) %>%
`[<-`(results, ., list(c("n" = NA, "or" = NA, "p" = NA, "ci.2.5%" = NA, "ci.97.5%" = NA))) %>%
do.call(dplyr::bind_rows, .) %>%
dplyr::bind_cols(plan, .)
}
Let’s try it:
a <- run_plan(viparc, list(c("malaise", "diarrhoea"),
c("respiratory", "cns")),
list(c("colistin_use", "oxytetracycline_use"),
c("apramycin_use", "cefadroxil_use")))
The following function develops the list of clinical signs and / or antimicrobials so that they can easily be read in the data frame:
develop <- function(df, var = NULL) {
names_df <- names(df)
var_ref <- c("cs", "am")
if (is.null(var)) var <- intersect(names_df, var_ref)
stopifnot(var %in% names_df)
stopifnot(var %in% var_ref)
dplyr::mutate_at(df, var, ~ sapply(., paste, collapse = ", "))
}
The second one removes the columns that contain no variability:
simplify <- function(df) {
df[, sapply(df, function(x) length(unique(x))) > 1]
}
Let’s try them:
develop(a)
## # A tibble: 4 x 9
## cs am gap x n or p `ci.2.5%` `ci.97.5%`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 malaise, diarrhoea colistin_use, oxytetracycline_use 1 1 928 -0.854 0. -0.889 -0.819
## 2 respiratory, cns colistin_use, oxytetracycline_use 1 1 196 0.215 1.10e- 6 0.128 0.302
## 3 malaise, diarrhoea apramycin_use, cefadroxil_use 1 1 1030 -0.771 0. -0.804 -0.737
## 4 respiratory, cns apramycin_use, cefadroxil_use 1 1 229 0.318 5.28e-24 0.256 0.380
and
simplify(a)
## # A tibble: 4 x 7
## cs am n or p `ci.2.5%` `ci.97.5%`
## <list> <list> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 <chr [2]> <chr [2]> 928 -0.854 0. -0.889 -0.819
## 2 <chr [2]> <chr [2]> 196 0.215 1.10e- 6 0.128 0.302
## 3 <chr [2]> <chr [2]> 1030 -0.771 0. -0.804 -0.737
## 4 <chr [2]> <chr [2]> 229 0.318 5.28e-24 0.256 0.380
and
simplify(develop(a))
## # A tibble: 4 x 7
## cs am n or p `ci.2.5%` `ci.97.5%`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 malaise, diarrhoea colistin_use, oxytetracycline_use 928 -0.854 0. -0.889 -0.819
## 2 respiratory, cns colistin_use, oxytetracycline_use 196 0.215 1.10e- 6 0.128 0.302
## 3 malaise, diarrhoea apramycin_use, cefadroxil_use 1030 -0.771 0. -0.804 -0.737
## 4 respiratory, cns apramycin_use, cefadroxil_use 229 0.318 5.28e-24 0.256 0.380
Let’s download the data on the antimicrobials classes:
ab_classes <- readr::read_csv("https://raw.githubusercontent.com/viparc/clires_data/master/data/antimicrobial_classes.csv", col_types = "cc")
ab_classes$antimicrobial <- paste0(ab_classes$antimicrobial, "_use")
It looks like this:
ab_classes
## # A tibble: 44 x 2
## antimicrobial class
## <chr> <chr>
## 1 amoxicillin_use penecillins
## 2 ampicillin_use penecillins
## 3 cefadroxil_use cephalosporins
## 4 cefotaxime_use cephalosporins
## 5 cephalexin_use cephalosporins
## 6 ceftiofur_use cephalosporins
## 7 colistin_use polypeptides
## 8 enramycin_use polypeptides
## 9 enrofloxacin_use quinolones
## 10 flumequine_use quinolones
## # … with 34 more rows
Takes 15’:
tmp <- run_plan(df = viparc,
cs = as.list(setdiff(grep("_use$", names(viparc), value = TRUE, invert = TRUE),
c("farm", "flock", "week", "nb_chicken", "nb_chicken_sold",
"chicken_disease_death", "chicken_sudden_death"))),
am = split(ab_classes$antimicrobial, ab_classes$class),
gap = 0:2,
x = 0:2) %>%
dplyr::mutate(cs = unlist(cs),
am = names(am)) %>%
dplyr::rename(clinical_signs = cs,
class = am)