Sketch out of the analysis

As for the analysis of the prophylactic AMU, the main difficulty here consists in rearranging the data. Once the data is properly rearranged, then the analysis is pretty classical. In this section we sketch out the rationale of our analysis.

Because we are interested in therapeutic AMU, we will consider in this analysis only weeks that are from disease episodes. A first challenge will thus consist in characterizing disease episodes in terms of a set of clinical signs and also in terms of gap (see the functions fill_gaps() and split_by_disease_episode() below), as explained in Figure 1 below.

Figure 1: Defining a disease episode. The arrow represents the weeks (each rectangle is a week) of a flock, from the first week on the left ot the last one on the right. The red dots represent the presence of disease, as defined by a set of clinical signs (where here we consider there is disease if at least one of the clinical signs of the set is reported). In order to account for the fact that symptoms may not be always reported, we want to allow the possibility to convert one or a few weeks without reported symptoms and surrounded by weeks with reported symptoms into one single disease episode. The parameter gap is the number of week(s) without symptoms we allow ourself to skip when defining a disease episode. Below the arrow are 3 examples of disease episodes definitions with 3 episodes when gap = 0, 2 episodes when gap = 1 and 1 episode only when gap = 3.

Once this is done, the next step consists in splitting each disease episode into 2 periods: one before AMU (defined by a set of antimicrobials) and one after AMU (Figure 2). Same as for the symptoms: here we consider there is AMU if at least one of the drugs set that we consider is reported. The exact week where AMU starts is excluded from the analysis because for this particular week it’s impossible to infer any form of causality. For some disease episodes there might be no AMU at all. On the contrary, for some other disease episodes, there might be AMU from the beginning of the episode. In those cases, we will have only a before or an after period. Also, because we really want to ascertain that our AMU can be considered as therapeutic, will exclude from the analysis all the disease episodes that had AMU during the x weeks before the start of the episode. This is what is done by the function exclude_if_amu_before() defined below.

Figure 2: Separating the weeks before from the weeks after AMU in all disease episode. In this example, the arrows show the first 2 and the last flocks of our data. Each flock starts on the left end and ends on the right end of the arrow and the length of the arrow is the duration of the flock. Colored sections represent disease episodes as identified in Figure 1. The red rectangles represent the first week of AMU in the disease episodes. Sometimes there is no AMU at all during the disease episode (as on the third episode of the first flock) and some other times the firt week of AMU is the first week of the episode as on the second episode of the second flock or the first episode of the last flock. Once these first weeks of AMU are identified in all the disease episodes, we gather, from all the disease episodes of all the flocks, all the weeks that occur before (in blue) and all the weeks that occur after (in green) these first weeks of AMU, making 2 arms: “before” and “after”.

Once this is done, we end up with two arms of weeks: one before and one after AMU (Figure 2). We can then estimate and compare the mortality rates in these two arms with a logistic regression as detailed below. The great advantage of a logistic regression is that it allows to include covariables in order to control for potential confounding effects. For the purpose of the examples below, we will use the following sets of clinical signs to define disease episodes:

cs_set <- c("malaise", "diarrhoea")

with this value of gap:

gap <- 1

and the following set of antimicrobials to defined AMU:

am_set <- c("colistin_use", "oxytetracycline_use")

and the following value of x:

x <- 1

but the function prepare_data() at the end is a pipeline that puts everything together, with cs_set, am_set, gap and x as tuning parameters. The last section shows the different ways (basically with and without correcting for potential condounding effects) to performed the statistical analysis. The user can thus prepare a number of different data sets, depending on the values of the cs_set, am_set, gap and x arguments and, on each of these data sets, perform a number of analyses depending on whether (s)he wants to correct for potential confounding effects, and which ones.

Packages

Installing the required packages:

required <- c("dplyr", "magrittr", "purrr", "readr")
to_install <- setdiff(required, row.names(installed.packages()))
if (length(to_install)) install.packages(to_install)

Loading magrittr for interactive use:

library(magrittr)

Loading the data

viparc <- readr::read_csv("https://raw.githubusercontent.com/viparc/prophylactic/master/data/viparc_qualitative.csv",
                          col_types = paste(c("ciiiddd", rep("l", 51)), collapse = ""))

We have 4 types of variables in this data frame:

  • week ID with the variables farm, flock and week;
  • chicken demography data with the variables nb_chicken, nb_chicken_sold, chicken_disease_death and chicken_sudden_death;
  • presence of clinical signs with the variables respiratory, diarrhoea, cns, malaise, leg_lesions and sudden_death;
  • AMU (presence / absence too) with the 46 variables amoxicillin_use to unknown_use.

Preparing the data

The function fill_gaps() takes a vector of logicals (or zeros and ones) and replaces all the 1 0 1 sequences by 1 1 1 (when gap = 1), or all 1 0 0 1 sequences by 1 1 1 1 (when gap = 2), etc…:

fill_gaps_0 <- function(x, gap) {
  as.integer(x) %>%         # because easier to deal with 1-character encoding
    paste(collapse = "") %>% 
    gsub(paste(c(1, rep(0, gap), 1), collapse = ""),
         paste(rep(1, gap + 2), collapse = ""), .) %>% 
    gsub("NA", "x", .) %>%  # need to replace potential NA by 1-character...
    strsplit("") %>%        # ... because of the strsplit() call that follows
    unlist(FALSE) %>%       # because strsplit() returns a list
    sub("x", NA, .) %>%     # back to NA
    as.numeric() %>% 
    as.logical()
}

Let’s try it:

fill_gaps_0(c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE), 2)
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE

Works fine but it doesn’t replace the gaps that are shorter than the gap argument:

fill_gaps_0(c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE), 2)
##  [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

fill_gaps() is a wrapper around fill_gaps_0() that fixes this problem:

fill_gaps <- function(x, gap) {
  while(gap > 0) {
    x <- fill_gaps_0(x, gap)
    gap <- gap - 1
  }
  x
}

Let’s try it:

fill_gaps(c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE), 2)
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE

and:

fill_gaps(c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE), 2)
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

The function episodes() identifies the different disease episodes. It would for example transform 1 1 1 0 0 1 1 0 0 0 1 0 into 1 1 1 0 0 2 2 0 0 0 3 0:

episodes <- function(x) {
  x <- rle(x)
  sel <- which(x$values > 0)
  x$values[sel] <- seq_along(sel)
  inverse.rle(x)
}

Let’s try it:

episodes(c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0))
##  [1] 1 1 1 0 0 2 2 0 0 0 3 0

The function split_by_disease_episode() uses the function episodes() to generate, for a given flock, disease episodes according to a set of clinical signs cs, and splits by row the data frame into a list of data frames, each corresponding to a disease episode:

split_by_disease_episode <- function(df, cs, gap) {
  if (any(unique(diff(df$week))) != 1) return(NULL) # makes sure there is no missing week
  df %>% 
    dplyr::mutate(disease = dplyr::select(., !! cs) %>%
                              rowSums() %>%
                              as.logical() %>%
                              fill_gaps(gap) %>% 
                              as.integer() %>%
                              episodes()) %>% 
    dplyr::filter(disease > 0) %>% 
    split(., .$disease) %>% 
    lapply(dplyr::select, -disease)
}

Let’s try it:

viparc %>% 
  dplyr::filter(farm == "75-004", flock == 1) %>% 
  dplyr::arrange(week) %>% 
  split_by_disease_episode(cs_set, gap) %>% 
  lapply(dplyr::select, farm, flock, week, !! cs_set) # for simpler printing
## $`1`
## # A tibble: 1 x 5
##   farm   flock  week malaise diarrhoea
##   <chr>  <int> <int> <lgl>   <lgl>    
## 1 75-004     1     3 TRUE    FALSE    
## 
## $`2`
## # A tibble: 2 x 5
##   farm   flock  week malaise diarrhoea
##   <chr>  <int> <int> <lgl>   <lgl>    
## 1 75-004     1     7 TRUE    TRUE     
## 2 75-004     1     8 TRUE    TRUE

Works for one given cycle. Here is how we make it works for all cycles:

a <- viparc %>% 
  dplyr::arrange(week) %>% 
  dplyr::group_by(farm, flock) %>% 
  dplyr::group_map(~ split_by_disease_episode(.x, cs_set, gap), keep = TRUE) %>% 
  unlist(FALSE)

which gives:

a %>% 
  lapply(dplyr::select, farm, flock, week, !! cs_set) %>%
  head()
## $`1`
## # A tibble: 3 x 5
##   farm   flock  week malaise diarrhoea
##   <chr>  <int> <int> <lgl>   <lgl>    
## 1 75-001     6     1 TRUE    FALSE    
## 2 75-001     6     2 FALSE   FALSE    
## 3 75-001     6     3 TRUE    FALSE    
## 
## $`1`
## # A tibble: 3 x 5
##   farm   flock  week malaise diarrhoea
##   <chr>  <int> <int> <lgl>   <lgl>    
## 1 75-002     1     1 TRUE    FALSE    
## 2 75-002     1     2 FALSE   FALSE    
## 3 75-002     1     3 TRUE    FALSE    
## 
## $`2`
## # A tibble: 1 x 5
##   farm   flock  week malaise diarrhoea
##   <chr>  <int> <int> <lgl>   <lgl>    
## 1 75-002     1     5 TRUE    FALSE    
## 
## $`1`
## # A tibble: 1 x 5
##   farm   flock  week malaise diarrhoea
##   <chr>  <int> <int> <lgl>   <lgl>    
## 1 75-002     2     4 TRUE    FALSE    
## 
## $`2`
## # A tibble: 1 x 5
##   farm   flock  week malaise diarrhoea
##   <chr>  <int> <int> <lgl>   <lgl>    
## 1 75-002     2    12 TRUE    FALSE    
## 
## $`1`
## # A tibble: 1 x 5
##   farm   flock  week malaise diarrhoea
##   <chr>  <int> <int> <lgl>   <lgl>    
## 1 75-002     3     1 TRUE    FALSE

Before we continue, we want to exclude from this list all the disease episodes with AMU during the \(x\) weeks before the start of the episode, AMU being, again, defined by a set of antimicrobials.

exclude_if_amu_before <- function(df, x, df_ref, ab) {
  require(magrittr)
  am <- ab
  first_week <- min(df$week)
  if (first_week == 1) return(df)
  weeks <- (first_week - x):(first_week - 1)
  weeks <- weeks[weeks > 0]
  amu_before_episode <- df_ref %>% 
    dplyr::filter(farm == df$farm[1], flock == df$flock[1], week %in% weeks) %>% 
    dplyr::select({{ am }}) %>% 
    rowSums() %>% 
    as.logical() %>% 
    any()
  if (amu_before_episode) return(NULL)
  df
}

where df is a sub data frame of the viparc data frame, x is the duration, in weeks, of the period before the start of the disease episode we consider to filter out according to AMU, df_ref is typically the viparc data frame, and ab is a set of antimicrobials used to define AMU. Let’s try it:

exclude_if_amu_before(a[[1]], 2, viparc, am_set)
## # A tibble: 3 x 58
##   farm  flock  week nb_chicken nb_chicken_sold chicken_disease… chicken_sudden_… respiratory diarrhoea cns   malaise leg_lesions
##   <chr> <int> <int>      <int>           <dbl>            <dbl>            <dbl> <lgl>       <lgl>     <lgl> <lgl>   <lgl>      
## 1 75-0…     6     1        195               9                9                0 FALSE       FALSE     FALSE TRUE    FALSE      
## 2 75-0…     6     2        195               0                0                0 FALSE       FALSE     FALSE FALSE   FALSE      
## 3 75-0…     6     3        190               5                5                0 FALSE       FALSE     FALSE TRUE    FALSE      
## # … with 46 more variables: sudden_death <lgl>, amoxicillin_use <lgl>, ampicillin_use <lgl>, apramycin_use <lgl>, cefadroxil_use <lgl>,
## #   cefotaxime_use <lgl>, ceftiofur_use <lgl>, cephalexin_use <lgl>, chloramphenicol_use <lgl>, ciprofloxacin_use <lgl>,
## #   colistin_use <lgl>, doxycycline_use <lgl>, enramycin_use <lgl>, enrofloxacin_use <lgl>, erythromycin_use <lgl>,
## #   florfenicol_use <lgl>, flumequine_use <lgl>, gentamicin_use <lgl>, josamycin_use <lgl>, kitasamycin_use <lgl>, lincomycin_use <lgl>,
## #   marbofloxacin_use <lgl>, methenamine_use <lgl>, neomycin_use <lgl>, norfloxacin_use <lgl>, oxytetracycline_use <lgl>,
## #   spectinomycin_use <lgl>, spiramycin_use <lgl>, streptomycin_use <lgl>, sulfachloropyridazine_use <lgl>, sulfadiazine_use <lgl>,
## #   sulfadimethoxine_use <lgl>, sulfadimidine_use <lgl>, sulfaguanidin_use <lgl>, sulfamethazine_use <lgl>, sulfamethoxazole_use <lgl>,
## #   sulfamethoxypyridazine_use <lgl>, sulphamethoxazole_use <lgl>, sulphathiazole_use <lgl>, tetracycline_use <lgl>,
## #   thiamphenicol_use <lgl>, tiamulin_use <lgl>, tilmicosin_use <lgl>, trimethoprim_use <lgl>, tylosin_use <lgl>, unknown_use <lgl>

The function before_after() below splits the data frame of a disease episode into 2 data frames, one before the use of AMU and one after the use of AMU. The week where AMU starts is deleted. AMU is defined in terms of a set of antimicrobials.

before_after <- function(df, ab) {
  require(magrittr)
  am <- ab
  start <- df %>%
    dplyr::select({{ am }}) %>%
    rowSums() %>%
    as.logical() %>%
    which() %>%
    head(1)
  if (length(start) < 1) return(list(df, NULL)) # there is never AMU
  if (start < 2) return(list(NULL, df[-1, ])) # AMU starts on the first week
  nr <- nrow(df)
  if (start == nr) return(list(df[-nr, ], NULL)) # AMU starts on the last week
  list(df[1:(start - 1), ], df[(start + 1):nr, ])
}

Let’s try it:

b <- lapply(a, before_after, am_set)

A pipeline that assembles the above defined split_by_disease_episode(), exclude_if_amu_before() and before_after() functions in order to prepare the data for analysis:

prepare_data <- function(df, cs = NULL, am = NULL, gap = 1, x = 1) {
  require(magrittr)
  var_names <- setdiff(names(df), c("arm", "farm", "flock", "week", "nb_chicken",
                                    "nb_chicken_sold", "chicken_disease_death",
                                    "chicekn_sudden_death"))
  if (is.null(cs)) cs <- grep("_use$", var_names, value = TRUE, invert = TRUE)
  if (is.null(am)) am <- grep("_use$", var_names, value = TRUE)
  df %>% 
    dplyr::arrange(week) %>% 
    dplyr::group_by(farm, flock) %>% 
    dplyr::group_map(~ split_by_disease_episode(.x, cs, gap), keep = TRUE) %>% 
    unlist(FALSE) %>%
    lapply(exclude_if_amu_before, x, viparc, am) %>% 
    purrr::compact() %>% # because exclude_if_amu_before() potentially generates NULL elements
    lapply(before_after, am_set) %>% 
    purrr::transpose() %>%
    lapply(purrr::compact) %>% # because before_after() potentially generates NULL elements
    setNames(c("before", "after")) %>% 
    lapply(dplyr::bind_rows) %>% 
    dplyr::bind_rows(.id = "arm") %>% 
    dplyr::mutate(arm = factor(arm, c("before", "after")),
                  suc = chicken_disease_death + chicken_sudden_death,
                  fai = nb_chicken + nb_chicken_sold)
}

Let’s try it (takes a few seconds to run):

viparc2 <- prepare_data(viparc,
                        c("malaise", "diarrhoea"),
                        c("colistin_use", "oxytetracycline_use"), 1, 2)

In this particular case, it produces the following numbers of weeks in the two arms:

table(viparc2$arm)
## 
## before  after 
##    491    353

Comparing mortality rates

Once the data is properly put in shape, the actual statistical analysis is quite classical. Here, we consider a logistic regression:

(mod <- glm(cbind(suc, fai) ~ arm, binomial, viparc2))
## 
## Call:  glm(formula = cbind(suc, fai) ~ arm, family = binomial, data = viparc2)
## 
## Coefficients:
## (Intercept)     armafter  
##     -2.5677      -0.9455  
## 
## Degrees of Freedom: 843 Total (i.e. Null);  842 Residual
## Null Deviance:       38000 
## Residual Deviance: 34870     AIC: 37870

Note that, in this model, if you wish to correct for potential counfounding effects such as chicken age or the use of other antimicrobials or the presence of other clinical signs or AMU during the first stages of the flock, you will have to include the corresponding variables in the formula of the model, making sure that the arm variable is the last one. You can explore the model summary:

summary(mod)
## 
## Call:
## glm(formula = cbind(suc, fai) ~ arm, family = binomial, data = viparc2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -15.213   -4.800   -2.531    0.519   38.756  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.567715   0.008977  -286.0   <2e-16 ***
## armafter    -0.945539   0.018010   -52.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 37999  on 843  degrees of freedom
## Residual deviance: 34871  on 842  degrees of freedom
## AIC: 37875
## 
## Number of Fisher Scoring iterations: 5

After AMU, the odds of dying is multiplied by the following factor:

unname(exp(tail(coef(mod), 1)))
## [1] 0.3884701

with the following confidence interval:

mod %>% 
  confint() %>% 
  `[`(., nrow(.), ) %>% 
  exp()
##     2.5 %    97.5 % 
## 0.3749583 0.4023886

The significativity of the therapeutic AMU is:

anova(mod, test = "LRT")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: cbind(suc, fai)
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                   843      37999              
## arm   1   3128.7       842      34871 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimating the mortality rate (in absence of covariables)

In absence of any covariable (age of the flock, etc…), you can also compute the death rates for the two arms (before and after) straightforwardly thanks to the following function:

death_rate <- function(m) {
  require(magrittr)
  m %>% 
    predict.glm(se.fit = TRUE) %>% 
    lapply(unique) %>%
    head(2) %>% 
    purrr::transpose() %>% 
    lapply(function(x) c(x$fit, x$fit + c(-1, 1) * 1.96 * x$se.fit)) %>% 
    lapply(m$family$linkinv) %>% 
    setNames(c("before", "after")) %>% 
    lapply(function(x) setNames(as.data.frame(t(x)), c("estimate", "lower", "upper"))) %>% 
    dplyr::bind_rows(.id = "arm")
}

Let’s try it:

death_rate(mod)
##      arm   estimate      lower      upper
## 1 before 0.07124534 0.07008984 0.07241841
## 2  after 0.02893745 0.02808981 0.02980989

These “rates” are the individual probability of death per week. There may be the possibility to compute similar things accounting for covariates but I need to do some research on that. We can however verify that:

0.02989321 / (1 - 0.02989321) / (0.07742014 / (1 - 0.07742014))
## [1] 0.3672003

And here the probability ratio is

0.02989321 / 0.07742014
## [1] 0.3861167

Adding covariables

The only covariable that seems to make sense to include is the age of the chicken:

(mod <- glm(cbind(suc, fai) ~ week + arm, binomial, viparc2))
## 
## Call:  glm(formula = cbind(suc, fai) ~ week + arm, family = binomial, 
##     data = viparc2)
## 
## Coefficients:
## (Intercept)         week     armafter  
##    -3.09268      0.06406     -0.83878  
## 
## Degrees of Freedom: 843 Total (i.e. Null);  841 Residual
## Null Deviance:       38000 
## Residual Deviance: 33390     AIC: 36400
anova(mod, test = "LRT")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: cbind(suc, fai)
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                   843      37999              
## week  1   2271.4       842      35728 < 2.2e-16 ***
## arm   1   2338.0       841      33390 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Alternatively, this can be modeled with a GAM:

(mod <- mgcv::gam(cbind(suc, fai) ~ s(week) + arm, binomial, viparc2, method = "REML"))
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## cbind(suc, fai) ~ s(week) + arm
## 
## Estimated degrees of freedom:
## 8.9  total = 10.9 
## 
## REML score: 17747.96
summary(mod)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## cbind(suc, fai) ~ s(week) + arm
## 
## Parametric coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.639218   0.009666 -273.05   <2e-16 ***
## armafter    -0.849133   0.018603  -45.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df Chi.sq p-value    
## s(week) 8.898  8.995   2373  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0882   Deviance explained = 14.7%
## -REML =  17748  Scale est. = 1         n = 844

A pipeline

As done for the prophylactic study, let’s define the generic and methods get_outputs():

get_outputs <- function (x, ...) {
  UseMethod("get_outputs", x)
}

for the generic and

get_outputs.glm <- function(x) {
  conf_int <- confint(x)
  c(n    = nrow(x$model),
    or   = tail(coef(x), 1),
    p    = tail(anova(x, test = "LRT")$`Pr(>Chi)`, 1),
    ci   = conf_int[nrow(conf_int), ])
}

for the GLM method and

get_outputs.gam <- function(x, level = .95) {
  summ <- summary(x)
  conf_int <- mgcv.helper::confint.gam(x, level = level)
  c(n    = nrow(x$model),
    or   = unname(tail(summ$p.coef, 1)),
    p    = unname(tail(summ$p.pv, 1)),
    ci   = unlist(conf_int[nrow(conf_int), 5:6]))
}

for the GAM method. Let’s try them:

viparc2 %>% 
  glm(cbind(suc, fai) ~ week + arm, binomial, .) %>% 
  get_outputs()
##           n or.armafter           p    ci.2.5 %   ci.97.5 % 
## 844.0000000  -0.8387795   0.0000000  -0.8747153  -0.8030341

and

viparc2 %>% 
  mgcv::gam(cbind(suc, fai) ~ s(week) + arm, binomial, ., method = "REML") %>% 
  get_outputs()
##           n          or           p     ci.2.5%    ci.97.5% 
## 844.0000000  -0.8491334   0.0000000  -0.8856469  -0.8126198

Let’s make a batch:

(batch <- expand.grid(cs = list(c("malaise", "diarrhoea"),
                                c("respiratory", "cns")),
                      am = list(c("colistin_use", "oxytetracycline_use"),
                                c("apramycin_use", "cefadroxil_use"))))
##                   cs                                am
## 1 malaise, diarrhoea colistin_use, oxytetracycline_use
## 2   respiratory, cns colistin_use, oxytetracycline_use
## 3 malaise, diarrhoea     apramycin_use, cefadroxil_use
## 4   respiratory, cns     apramycin_use, cefadroxil_use

The corresponding data sets:

datasets <- purrr::pmap(batch, prepare_data, df = viparc)

The models:

glms <- t(sapply(datasets, function(x) get_outputs(glm(cbind(suc, fai) ~ week + arm, binomial, x))))

Or with a GAM:

gams <- t(sapply(datasets, function(x) get_outputs(mgcv::gam(cbind(suc, fai) ~ s(week) + arm, binomial, x, method = "REML"))))

Let’s write pipelines:

pipeline_glm <- function(df, cs = NULL, am = NULL, gap = 1, x = 1) {
  require(magrittr)
  prepare_data(df, cs, am, gap, x) %$% 
    glm(cbind(suc, fai) ~ week + arm, binomial) %>% 
    get_outputs()
}

and:

pipeline_gam <- function(df, cs = NULL, am = NULL, gap = 1, x = 1) {
  require(magrittr)
  prepare_data(df, cs, am, gap, x) %$% 
    mgcv::gam(cbind(suc, fai) ~ s(week) + arm, binomial, method = "REML") %>% 
    get_outputs()
}

Let’s try it:

pipeline_glm(viparc, cs_set, am_set)
##           n or.armafter           p    ci.2.5 %   ci.97.5 % 
## 928.0000000  -0.8640434   0.0000000  -0.8984372  -0.8298357

and

pipeline_gam(viparc, cs_set, am_set)
##           n          or           p     ci.2.5%    ci.97.5% 
## 928.0000000  -0.8538708   0.0000000  -0.8886307  -0.8191110
a <- batch %>% 
  tibble::as_tibble() %>%
  assign("plan", ., 1) %>% # saving intermediate value
  c(FUN = pipeline_gam, ., MoreArgs = list(list(df = viparc)), SIMPLIFY = FALSE, mc.cores = 3) %>%
  do.call(parallel::mcmapply, .) %>% 
  do.call(dplyr::bind_rows, .) %>% 
  dplyr::bind_cols(plan, .)

Here is a full pipeline around GAM:

run_plan <- function(df, cs = NULL, am = NULL, gap = 1, x = 1) {
  require(magrittr)
  
  var_names <- setdiff(names(df), c("arm", "farm", "flock", "week", "nb_chicken", "nb_chicken_sold", "chicken_disease_death", "chicken_sudden_death"))
  if (is.null(cs)) cs <- list(grep("_use$", var_names, value = TRUE, invert = TRUE))
  else if (! is.list(cs)) cs <- list(cs)
  if (is.null(am)) am <- list(grep("_use$", var_names, value = TRUE))
  else if (! is.list(am)) am <- list(am)
  
  expand.grid(cs = cs, am = am, gap = gap, x = x) %>%
    tibble::as_tibble() %>%
    assign("plan", ., 1) %>% # saving intermediate value
    c(FUN = purrr::safely(pipeline_gam), ., MoreArgs = list(list(df = df)), SIMPLIFY = FALSE, mc.cores = 4) %>%
    do.call(parallel::mcmapply, .) %>% 
    lapply(function(x) x$result) %>%
    assign("results", ., 1) %>% 
    sapply(is.null) %>% 
    `[<-`(results, ., list(c("n" = NA, "or" = NA, "p" = NA, "ci.2.5%" = NA, "ci.97.5%" = NA))) %>% 
    do.call(dplyr::bind_rows, .) %>% 
    dplyr::bind_cols(plan, .)
}

Let’s try it:

a <- run_plan(viparc, list(c("malaise", "diarrhoea"),
                           c("respiratory", "cns")),
                      list(c("colistin_use", "oxytetracycline_use"),
                           c("apramycin_use", "cefadroxil_use")))

The following function develops the list of clinical signs and / or antimicrobials so that they can easily be read in the data frame:

develop <- function(df, var = NULL) {
  names_df <- names(df)
  var_ref <- c("cs", "am")
  if (is.null(var)) var <- intersect(names_df, var_ref)
  stopifnot(var %in% names_df)
  stopifnot(var %in% var_ref)
  dplyr::mutate_at(df, var, ~ sapply(., paste, collapse = ", "))
}

The second one removes the columns that contain no variability:

simplify <- function(df) {
  df[, sapply(df, function(x) length(unique(x))) > 1]
}

Let’s try them:

develop(a)
## # A tibble: 4 x 9
##   cs                 am                                  gap     x     n     or        p `ci.2.5%` `ci.97.5%`
##   <chr>              <chr>                             <dbl> <dbl> <dbl>  <dbl>    <dbl>     <dbl>      <dbl>
## 1 malaise, diarrhoea colistin_use, oxytetracycline_use     1     1   928 -0.854 0.          -0.889     -0.819
## 2 respiratory, cns   colistin_use, oxytetracycline_use     1     1   196  0.215 1.10e- 6     0.128      0.302
## 3 malaise, diarrhoea apramycin_use, cefadroxil_use         1     1  1030 -0.771 0.          -0.804     -0.737
## 4 respiratory, cns   apramycin_use, cefadroxil_use         1     1   229  0.318 5.28e-24     0.256      0.380

and

simplify(a)
## # A tibble: 4 x 7
##   cs        am            n     or        p `ci.2.5%` `ci.97.5%`
##   <list>    <list>    <dbl>  <dbl>    <dbl>     <dbl>      <dbl>
## 1 <chr [2]> <chr [2]>   928 -0.854 0.          -0.889     -0.819
## 2 <chr [2]> <chr [2]>   196  0.215 1.10e- 6     0.128      0.302
## 3 <chr [2]> <chr [2]>  1030 -0.771 0.          -0.804     -0.737
## 4 <chr [2]> <chr [2]>   229  0.318 5.28e-24     0.256      0.380

and

simplify(develop(a))
## # A tibble: 4 x 7
##   cs                 am                                    n     or        p `ci.2.5%` `ci.97.5%`
##   <chr>              <chr>                             <dbl>  <dbl>    <dbl>     <dbl>      <dbl>
## 1 malaise, diarrhoea colistin_use, oxytetracycline_use   928 -0.854 0.          -0.889     -0.819
## 2 respiratory, cns   colistin_use, oxytetracycline_use   196  0.215 1.10e- 6     0.128      0.302
## 3 malaise, diarrhoea apramycin_use, cefadroxil_use      1030 -0.771 0.          -0.804     -0.737
## 4 respiratory, cns   apramycin_use, cefadroxil_use       229  0.318 5.28e-24     0.256      0.380

Generating tables

Let’s download the data on the antimicrobials classes:

ab_classes <- readr::read_csv("https://raw.githubusercontent.com/viparc/clires_data/master/data/antimicrobial_classes.csv", col_types = "cc")
ab_classes$antimicrobial <- paste0(ab_classes$antimicrobial, "_use")

It looks like this:

ab_classes
## # A tibble: 44 x 2
##    antimicrobial    class         
##    <chr>            <chr>         
##  1 amoxicillin_use  penecillins   
##  2 ampicillin_use   penecillins   
##  3 cefadroxil_use   cephalosporins
##  4 cefotaxime_use   cephalosporins
##  5 cephalexin_use   cephalosporins
##  6 ceftiofur_use    cephalosporins
##  7 colistin_use     polypeptides  
##  8 enramycin_use    polypeptides  
##  9 enrofloxacin_use quinolones    
## 10 flumequine_use   quinolones    
## # … with 34 more rows

Takes 15’:

tmp <- run_plan(df = viparc,
                cs = as.list(setdiff(grep("_use$", names(viparc), value = TRUE, invert = TRUE),
                                     c("farm", "flock", "week", "nb_chicken", "nb_chicken_sold",
                                       "chicken_disease_death", "chicken_sudden_death"))),
                am = split(ab_classes$antimicrobial, ab_classes$class),
                gap = 0:2,
                x = 0:2) %>% 
  dplyr::mutate(cs = unlist(cs),
                am = names(am)) %>% 
  dplyr::rename(clinical_signs = cs,
                class = am)