If you want to run this code locally, look at the last two sections

Introduction

Here we aim at testing whether the prophylactic use of antibiotics (exposure) has an effect on the occurence of clinical signs (outcome). The main challenge in this analysis is the preparation of the data in such a way that AMU exposure can be considered prophylactic AMU exposure. The overall strategy for data preparation is described in Figure 1.

Figure 1: Preparation of data before analysis. The horizontal arrow represents the time line of a flock, divided into weeks that are represented by rectangles, starting on week 1 (on the left). For any given week w (represented here by the hashed rectangle), we compute (i) an exposure variable based on the use or not of antibiotics on week w (step 1, in green) and (ii) an outcome variable based on the occurence or not of clinical signs over an observation period of x weeks after week w (step 2, in red). Statistical analyses will consist in investigating whether AMU on week w (exposure) affects the occurence of clinical signs over the observation period (outcome). In order to make sure that AMU exposure on week w does correspond to AMU that we can consider as prophylactic, we need to filter out from the analysis all the weeks w preceded by (i) the presence of clinical signs over the y weeks before week w (including week w) and (ii) AMU over the z weeks before week w (naturally excluding week w since this information is used to compute the exposure variable). This step 0 is shown in blue on the figure. Finally, the analysis may include potential confounding factors (shown in orange) such as the age of the chicken (i.e. week w) as well as AMU during the first a weeks of the flock’s life (in grey) and during the x weeks of the observation period.

Note that the duration y over which the filtering based on clinical signs is done and the duration z over which the filtering based on AMU is done do not have to be identical, although it probably makes sense if they are. On Figure 1, CS corresponds to a given set of clinical signs and AMU corresponds to a given set of antimicrobials. Note that the sets of clinical signs used for the filtering process and the computation of the outcome variable do not have to be identical, although, again, it probably makes sense if they are. Same remark for the sets of antimicrobials used for the filtering process, and the computations of the exposure variable and the confounding variables over the first weeks of the flock and the observation period. In the code below, we create the following 4 functions allowing us to prepare the data according to the strategy outlined in Figure 1:

  • make_exposure(df, antimicrobials) computes a boolean exposure variable using the set of antimicrobials antimicrobials of the data frame df.

  • make_outcome(df, outcome, loc) computes a boolean outcome variable for the loc weeks that follow week w, using the set outcome of clinical signs of the data frame df. Note that this function will also be used to compute the AMU covariable during the observation period, in which case outcome will then contain the set of antimicrobials to consider.

  • prophylactic(df, clinical_signs, antimicrobials, lcs, lam) computes a boolean variable expressing whether any given week of the data df can be included in the analysis or not. This is based on the filtering process explained in figure 1 and considering the clinical_signs set of clinical signs over the lcs weeks before week w (including week w) and the antimicrobials set of antimicrobials over lam weeks before week w (excluding week w).

  • first_weeks(df, antimicrobials, wk): computes the boolean AMU covariable during the first wk weeks of the flock’ life, using the antimicrobials set of antimicrobials.

In these 4 functions, df is a data frame of the structure of viparc data, as described here. These 4 functions are designed so that they assemble perfectly well into pipelines. Once the data are prepared, the analysis can be performed by different statistical tests / models:

  • the simplest one is certainely a Fisher exact test. However, it appears that both prophylactic AMU and the occurence of clinical signs strongly depends on the age of the flock (as well as, to a lesser extend, on the AMU during the first weeks of the flock and the observation period). For that reason, we considered more sophisticatted analyses allowing to correct for confounding effects. These are

  • logistic regression. Here, we compared correction by partial likelihood ratio tests and stratification. It produces similar results. Because the effect of age on the occurence of clinical signs may be non-linear we also considered

  • general additive models in order to model the age - outcome relationship in an as flexible way as possible. After the preliminary analyses shown below, it is this option that we selected for full analysis.

We then developped a number of pipelines combining both data preparation and data analysis. pipeline1() is based on the Fisher exact test, pipeline3() is based on the GLM and pipeline5() is based on the GAM. pipeline2(), pipeline4() and pipeline6() are the same as pipeline1(), pipeline3() and pipeline5() respectively, with simplified interfaces. At the end, we created 3 functions that allows to explore the exposure - outcome relationships using various choices for the definitions of the duration of the first week, the filtering periods and the observation period, as well as the definition of the sets of antimicrobials and clinical signs to consider:

  • run_plan(pipeline, df, ...) runs the pipeline pipeline on the data frame df using the sets of parameters as defined by ....

  • develop() and simplify() allow to manipulate the output of run_plan().

Packages

Installing the required packages:

installed_packages <- row.names(installed.packages())
required <- c("dplyr", "magrittr", "mgcv", "purrr", "readr")
to_install <- setdiff(required, installed_packages)
if (length(to_install)) install.packages(to_install)
if (! require("mgcv.helper")) {
  if (! require("remotes")) install.packages("remotes")
  remotes::install_github("samclifford/mgcv.helper")
}

Loading magrittr for interactive use:

library(magrittr)

Loading the data

viparc <- readr::read_csv("https://raw.githubusercontent.com/viparc/prophylactic/master/data/viparc_qualitative.csv",
                          col_types = paste(c("ciiiddd", rep("l", 51)), collapse = "")) %>% 
  dplyr::select(-nb_chicken, -chicken_disease_death, -chicken_sudden_death, -nb_chicken_sold)

We have 3 types of variables in this data frame:

  • week ID with the variables farm, flock and week;
  • presence of clinical signs with the variables respiratory, diarrhoea, cns, malaise, leg_lesions and sudden_death;
  • AMU (presence / absence too) with the 45 variables amoxicillin_use to unknown_use.

For the purpose of the examples used below, we will used the following sets of antimicrobials and clinical signs:

am_set <- c("colistin_use", "oxytetracycline_use")
cs_set <- c("malaise", "diarrhoea")

Filtering out weeks

The week we are interested in are the weeks for which

  • there is no clinical signs during the weeks \(w, \dots, w - x\);
  • there is no AMU during the weeks \(w - 1, \dots, w - y\).

Here, we build the function prophylactic() that helps performing this filtering. This function makes use of the function lagging() defined below and that lags the variables var of the data frame df. Note that this later must have at least 3 columns named respectively farm, flock and week.

lagging <- function(df, var, lag) {
  require(magrittr)                
  stopifnot(c("farm", "flock", "week", var) %in% names(df))
  df %>%
    dplyr::select(!! c("farm", "flock", "week", var)) %>% 
    dplyr::mutate(week = week + lag) %>% 
    dplyr::rename_at(var, paste0, lag)
}

Let’s try it:

lapply(0:2, lagging, df = viparc, var = am_set) %>% 
  purrr::reduce(dplyr::full_join, c("farm", "flock", "week")) %>% 
  dplyr::left_join(viparc, ., c("farm", "flock", "week")) %>% 
  dplyr::select(farm, flock, week, dplyr::matches("colistin|oxytetracycline"))
## # A tibble: 5,566 x 11
##    farm  flock  week colistin_use oxytetracycline… colistin_use0 oxytetracycline… colistin_use1 oxytetracycline… colistin_use2
##    <chr> <int> <int> <lgl>        <lgl>            <lgl>         <lgl>            <lgl>         <lgl>            <lgl>        
##  1 75-0…     1     1 TRUE         TRUE             TRUE          TRUE             NA            NA               NA           
##  2 75-0…     1     2 FALSE        FALSE            FALSE         FALSE            TRUE          TRUE             NA           
##  3 75-0…     1     3 FALSE        FALSE            FALSE         FALSE            FALSE         FALSE            TRUE         
##  4 75-0…     1     4 FALSE        FALSE            FALSE         FALSE            FALSE         FALSE            FALSE        
##  5 75-0…     1     5 FALSE        FALSE            FALSE         FALSE            FALSE         FALSE            FALSE        
##  6 75-0…     1     6 FALSE        FALSE            FALSE         FALSE            FALSE         FALSE            FALSE        
##  7 75-0…     1     7 FALSE        FALSE            FALSE         FALSE            FALSE         FALSE            FALSE        
##  8 75-0…     1     8 FALSE        FALSE            FALSE         FALSE            FALSE         FALSE            FALSE        
##  9 75-0…     1     9 FALSE        TRUE             FALSE         TRUE             FALSE         FALSE            FALSE        
## 10 75-0…     1    10 FALSE        FALSE            FALSE         FALSE            FALSE         TRUE             FALSE        
## # … with 5,556 more rows, and 1 more variable: oxytetracycline_use2 <lgl>

It works as expected (note that we could have used the dplyr::lag() function too). Now we can define the prophylactic() function that uses the above-defined lagging() function:

prophylactic <- function(df, clinical_signs, antimicrobials, lcs, lam) {
  require(magrittr)
  week_id <- c("farm", "flock", "week")
  purrr::map2(list(c(0, seq_len(lcs)), seq_len(lam)), # doing the lagging on clinical signs and AMU
              list(clinical_signs, antimicrobials),
              ~ purrr::map(.x, lagging, df = df, var = .y)) %>%
    unlist(FALSE) %>% 
    purrr::reduce(dplyr::left_join, week_id) %>% # putting together into a single data frames
# note: we don't need to join with the original data because the original data
# is already here thanks to the lag of 0 in the above lagging() call to the clinical signs.    
    dplyr::select(-week_id) %>% 
    dplyr::mutate_all(dplyr::coalesce, FALSE) %>% # we want to get rid off FALSEs and also NAs
    rowSums() %>%
    not()
}

Let’s try it:

viparc %>% 
  dplyr::mutate(prphlctc = prophylactic(., cs_set, am_set, 2, 2)) %>% 
  dplyr::select(farm, flock, week, am_set, cs_set, prphlctc) %>%
  head(20)
## # A tibble: 20 x 8
##    farm   flock  week colistin_use oxytetracycline_use malaise diarrhoea prphlctc
##    <chr>  <int> <int> <lgl>        <lgl>               <lgl>   <lgl>     <lgl>   
##  1 75-001     1     1 TRUE         TRUE                FALSE   FALSE     TRUE    
##  2 75-001     1     2 FALSE        FALSE               FALSE   FALSE     FALSE   
##  3 75-001     1     3 FALSE        FALSE               FALSE   FALSE     FALSE   
##  4 75-001     1     4 FALSE        FALSE               FALSE   FALSE     TRUE    
##  5 75-001     1     5 FALSE        FALSE               FALSE   FALSE     TRUE    
##  6 75-001     1     6 FALSE        FALSE               FALSE   FALSE     TRUE    
##  7 75-001     1     7 FALSE        FALSE               FALSE   FALSE     TRUE    
##  8 75-001     1     8 FALSE        FALSE               FALSE   FALSE     TRUE    
##  9 75-001     1     9 FALSE        TRUE                FALSE   FALSE     TRUE    
## 10 75-001     1    10 FALSE        FALSE               FALSE   FALSE     FALSE   
## 11 75-001     1    11 TRUE         FALSE               FALSE   FALSE     FALSE   
## 12 75-001     1    12 FALSE        FALSE               FALSE   FALSE     FALSE   
## 13 75-001     1    13 FALSE        FALSE               FALSE   FALSE     FALSE   
## 14 75-001     1    14 FALSE        FALSE               FALSE   FALSE     TRUE    
## 15 75-001     1    15 FALSE        FALSE               FALSE   FALSE     TRUE    
## 16 75-001     1    16 TRUE         FALSE               FALSE   FALSE     TRUE    
## 17 75-001     1    17 FALSE        FALSE               FALSE   FALSE     FALSE   
## 18 75-001     1    18 FALSE        FALSE               FALSE   FALSE     FALSE   
## 19 75-001     1    19 FALSE        FALSE               FALSE   FALSE     TRUE    
## 20 75-001     1    20 FALSE        FALSE               FALSE   FALSE     TRUE

The filtering can be performed in one step:

viparc %>% 
  dplyr::filter(prophylactic(., cs_set, am_set, 2, 2))
## # A tibble: 2,942 x 54
##    farm  flock  week respiratory diarrhoea cns   malaise leg_lesions sudden_death amoxicillin_use ampicillin_use apramycin_use
##    <chr> <int> <int> <lgl>       <lgl>     <lgl> <lgl>   <lgl>       <lgl>        <lgl>           <lgl>          <lgl>        
##  1 75-0…     1     1 FALSE       FALSE     FALSE FALSE   FALSE       TRUE         FALSE           FALSE          FALSE        
##  2 75-0…     1     4 FALSE       FALSE     FALSE FALSE   FALSE       FALSE        FALSE           FALSE          FALSE        
##  3 75-0…     1     5 FALSE       FALSE     FALSE FALSE   FALSE       FALSE        FALSE           FALSE          FALSE        
##  4 75-0…     1     6 FALSE       FALSE     FALSE FALSE   FALSE       TRUE         FALSE           FALSE          FALSE        
##  5 75-0…     1     7 FALSE       FALSE     FALSE FALSE   FALSE       FALSE        FALSE           FALSE          FALSE        
##  6 75-0…     1     8 FALSE       FALSE     FALSE FALSE   FALSE       TRUE         FALSE           FALSE          FALSE        
##  7 75-0…     1     9 FALSE       FALSE     FALSE FALSE   FALSE       FALSE        FALSE           FALSE          FALSE        
##  8 75-0…     1    14 FALSE       FALSE     FALSE FALSE   FALSE       FALSE        FALSE           FALSE          FALSE        
##  9 75-0…     1    15 FALSE       FALSE     FALSE FALSE   FALSE       FALSE        FALSE           FALSE          FALSE        
## 10 75-0…     1    16 FALSE       FALSE     FALSE FALSE   FALSE       FALSE        FALSE           FALSE          FALSE        
## # … with 2,932 more rows, and 42 more variables: cefadroxil_use <lgl>, cefotaxime_use <lgl>, ceftiofur_use <lgl>, cephalexin_use <lgl>,
## #   chloramphenicol_use <lgl>, ciprofloxacin_use <lgl>, colistin_use <lgl>, doxycycline_use <lgl>, enramycin_use <lgl>,
## #   enrofloxacin_use <lgl>, erythromycin_use <lgl>, florfenicol_use <lgl>, flumequine_use <lgl>, gentamicin_use <lgl>,
## #   josamycin_use <lgl>, kitasamycin_use <lgl>, lincomycin_use <lgl>, marbofloxacin_use <lgl>, methenamine_use <lgl>,
## #   neomycin_use <lgl>, norfloxacin_use <lgl>, oxytetracycline_use <lgl>, spectinomycin_use <lgl>, spiramycin_use <lgl>,
## #   streptomycin_use <lgl>, sulfachloropyridazine_use <lgl>, sulfadiazine_use <lgl>, sulfadimethoxine_use <lgl>,
## #   sulfadimidine_use <lgl>, sulfaguanidin_use <lgl>, sulfamethazine_use <lgl>, sulfamethoxazole_use <lgl>,
## #   sulfamethoxypyridazine_use <lgl>, sulphamethoxazole_use <lgl>, sulphathiazole_use <lgl>, tetracycline_use <lgl>,
## #   thiamphenicol_use <lgl>, tiamulin_use <lgl>, tilmicosin_use <lgl>, trimethoprim_use <lgl>, tylosin_use <lgl>, unknown_use <lgl>

Seems to work as expected.

Computing clinical signs outcomes

The make_outcome() function is another use of the lagging() function:

make_outcome <- function(df, outcome, loc) {
  require(magrittr)
  week_id <- c("farm", "flock", "week")
  lapply(-seq_len(loc), lagging, df = df, var = outcome) %>% # this is a list of loc slots
    purrr::reduce(dplyr::left_join, week_id) %>% # puts together into a single data frame
    dplyr::left_join(df, ., week_id) %>% # merges with original data frame
    dplyr::select(dplyr::matches("-\\d+$")) %>% 
    rowSums() %>% 
    as.logical()
}

where outcome is a vector of clinical signs names used to compute the outcome, loc is the duration, in weeks, of the observation period. Let’s try it:

viparc[356:386, ] %>% 
  dplyr::mutate(disease = make_outcome(., cs_set, 2)) %>% 
  dplyr::select(farm, flock, week, cs_set, disease)
## # A tibble: 31 x 6
##    farm   flock  week malaise diarrhoea disease
##    <chr>  <int> <int> <lgl>   <lgl>     <lgl>  
##  1 75-004     7    13 FALSE   FALSE     FALSE  
##  2 75-004     7    14 FALSE   FALSE     FALSE  
##  3 75-004     7    15 FALSE   FALSE     FALSE  
##  4 75-004     7    16 FALSE   FALSE     NA     
##  5 75-004     7    17 FALSE   FALSE     NA     
##  6 75-005     1     1 TRUE    TRUE      TRUE   
##  7 75-005     1     2 TRUE    TRUE      TRUE   
##  8 75-005     1     3 TRUE    TRUE      FALSE  
##  9 75-005     1     4 FALSE   FALSE     FALSE  
## 10 75-005     1     5 FALSE   FALSE     FALSE  
## # … with 21 more rows

Seems to work as expected. Note the missing values at the end of the flocks.

Computing AMU exposure

The make_exposure() function is very simple:

make_exposure <- function(df, antimicrobials) {
  require(magrittr)
  df %>%
    dplyr::select(antimicrobials) %>%
    rowSums() %>%
    as.logical()
}

Where antimicrobials is a vector of antimicrobials names that should be considered to compute the exposure. Let’s try it:

viparc %>% 
  dplyr::mutate(amu = make_exposure(., am_set)) %>% 
  dplyr::select(farm, flock, week, am_set, amu)
## # A tibble: 5,566 x 6
##    farm   flock  week colistin_use oxytetracycline_use amu  
##    <chr>  <int> <int> <lgl>        <lgl>               <lgl>
##  1 75-001     1     1 TRUE         TRUE                TRUE 
##  2 75-001     1     2 FALSE        FALSE               FALSE
##  3 75-001     1     3 FALSE        FALSE               FALSE
##  4 75-001     1     4 FALSE        FALSE               FALSE
##  5 75-001     1     5 FALSE        FALSE               FALSE
##  6 75-001     1     6 FALSE        FALSE               FALSE
##  7 75-001     1     7 FALSE        FALSE               FALSE
##  8 75-001     1     8 FALSE        FALSE               FALSE
##  9 75-001     1     9 FALSE        TRUE                TRUE 
## 10 75-001     1    10 FALSE        FALSE               FALSE
## # … with 5,556 more rows

Seems to work as expected.

Optional: covariables

The final statistical analysis can be performed correcting for potential confounders. Potential confounders that we may want to consider are the AMU during the first few weeks of the cycle, AMU during the observation period, and the age of the flock. For the latter one, the variable is simply the week number. The two other variables are respectively computed by the first_weeks() and make_outcome() functions as detailed below.

AMU during the first week(s)

The following function computes a boolean variable that specifies whether any of the antimicrobials defined in the antimicrobials vector has been used during the first wk weeks of the flock’s life:

first_weeks <- function(df, antimicrobials, wk = 1) {
  df %>%
    dplyr::filter(week < wk + 1) %>%
    dplyr::group_by(farm, flock) %>% 
    dplyr::group_modify(~ tibble::tibble(out = any(unlist(.x[, antimicrobials])))) %>% 
    dplyr::ungroup() %>% 
    dplyr::left_join(df, ., c("farm", "flock")) %>% 
    dplyr::select(out) %>% 
    unlist()
}

Let’s try it:

viparc %>% 
  dplyr::mutate(amu_first_weeks = first_weeks(., am_set)) %>%
  dplyr::select(farm, flock, week, am_set, amu_first_weeks)
## # A tibble: 5,566 x 6
##    farm   flock  week colistin_use oxytetracycline_use amu_first_weeks
##    <chr>  <int> <int> <lgl>        <lgl>               <lgl>          
##  1 75-001     1     1 TRUE         TRUE                TRUE           
##  2 75-001     1     2 FALSE        FALSE               TRUE           
##  3 75-001     1     3 FALSE        FALSE               TRUE           
##  4 75-001     1     4 FALSE        FALSE               TRUE           
##  5 75-001     1     5 FALSE        FALSE               TRUE           
##  6 75-001     1     6 FALSE        FALSE               TRUE           
##  7 75-001     1     7 FALSE        FALSE               TRUE           
##  8 75-001     1     8 FALSE        FALSE               TRUE           
##  9 75-001     1     9 FALSE        TRUE                TRUE           
## 10 75-001     1    10 FALSE        FALSE               TRUE           
## # … with 5,556 more rows

And with the first 2 weeks:

viparc %>% 
  dplyr::mutate(amu_first_weeks = first_weeks(., am_set, 2)) %>% 
  dplyr::select(farm, flock, week, am_set, amu_first_weeks)
## # A tibble: 5,566 x 6
##    farm   flock  week colistin_use oxytetracycline_use amu_first_weeks
##    <chr>  <int> <int> <lgl>        <lgl>               <lgl>          
##  1 75-001     1     1 TRUE         TRUE                TRUE           
##  2 75-001     1     2 FALSE        FALSE               TRUE           
##  3 75-001     1     3 FALSE        FALSE               TRUE           
##  4 75-001     1     4 FALSE        FALSE               TRUE           
##  5 75-001     1     5 FALSE        FALSE               TRUE           
##  6 75-001     1     6 FALSE        FALSE               TRUE           
##  7 75-001     1     7 FALSE        FALSE               TRUE           
##  8 75-001     1     8 FALSE        FALSE               TRUE           
##  9 75-001     1     9 FALSE        TRUE                TRUE           
## 10 75-001     1    10 FALSE        FALSE               TRUE           
## # … with 5,556 more rows

Looking at another cycle from another farm:

viparc %>% 
  dplyr::mutate(amu_first_weeks = first_weeks(., am_set)) %>% 
  dplyr::select(farm, flock, week, am_set, amu_first_weeks) %>% 
  dplyr::filter(farm == "75-013")
## # A tibble: 54 x 6
##    farm   flock  week colistin_use oxytetracycline_use amu_first_weeks
##    <chr>  <int> <int> <lgl>        <lgl>               <lgl>          
##  1 75-013     1     1 TRUE         FALSE               TRUE           
##  2 75-013     1     2 TRUE         FALSE               TRUE           
##  3 75-013     1     3 FALSE        FALSE               TRUE           
##  4 75-013     1     4 FALSE        FALSE               TRUE           
##  5 75-013     1     5 FALSE        FALSE               TRUE           
##  6 75-013     1     6 FALSE        FALSE               TRUE           
##  7 75-013     1     7 FALSE        FALSE               TRUE           
##  8 75-013     1     8 FALSE        FALSE               TRUE           
##  9 75-013     1     9 FALSE        FALSE               TRUE           
## 10 75-013     1    10 TRUE         FALSE               TRUE           
## # … with 44 more rows

Seems to work as expected.

AMU over the watching period

If one wants to compute a covariable accounting for AMU during the watching period, this can simply be done by reusing the make_outcome() function defined above:

viparc %>%
  dplyr::mutate(amu_observation = make_outcome(., am_set, 2)) %>% 
  dplyr::select(farm, flock, week, am_set, amu_observation) %>%
  head(20)
## # A tibble: 20 x 6
##    farm   flock  week colistin_use oxytetracycline_use amu_observation
##    <chr>  <int> <int> <lgl>        <lgl>               <lgl>          
##  1 75-001     1     1 TRUE         TRUE                FALSE          
##  2 75-001     1     2 FALSE        FALSE               FALSE          
##  3 75-001     1     3 FALSE        FALSE               FALSE          
##  4 75-001     1     4 FALSE        FALSE               FALSE          
##  5 75-001     1     5 FALSE        FALSE               FALSE          
##  6 75-001     1     6 FALSE        FALSE               FALSE          
##  7 75-001     1     7 FALSE        FALSE               TRUE           
##  8 75-001     1     8 FALSE        FALSE               TRUE           
##  9 75-001     1     9 FALSE        TRUE                TRUE           
## 10 75-001     1    10 FALSE        FALSE               TRUE           
## 11 75-001     1    11 TRUE         FALSE               FALSE          
## 12 75-001     1    12 FALSE        FALSE               FALSE          
## 13 75-001     1    13 FALSE        FALSE               FALSE          
## 14 75-001     1    14 FALSE        FALSE               TRUE           
## 15 75-001     1    15 FALSE        FALSE               TRUE           
## 16 75-001     1    16 TRUE         FALSE               FALSE          
## 17 75-001     1    17 FALSE        FALSE               FALSE          
## 18 75-001     1    18 FALSE        FALSE               FALSE          
## 19 75-001     1    19 FALSE        FALSE               FALSE          
## 20 75-001     1    20 FALSE        FALSE               FALSE

Examples of uses

Here we show how to assemble the functions prophylactic(), make_exposure(), make_outcome() and first_weeks() into a pipeline to test a hypothesis.

WARNING 1: in these pipelines, the filtering of the weeks should always be the last step of the data preparation, just before the statistical analysis.

WARNING 2: na.exclude() calls should always be done after the selection of variables.

For the sake of the examples, we will consider the following data:

(data1 <- viparc %>% 
  dplyr::mutate(amu = make_exposure(., am_set),
                disease = make_outcome(., cs_set, 2)) %>% 
  dplyr::filter(prophylactic(., cs_set, am_set, 2, 2)) %>%
  dplyr::select(amu, disease) %>% 
  na.exclude())
## # A tibble: 2,412 x 2
##    amu   disease
##    <lgl> <lgl>  
##  1 TRUE  FALSE  
##  2 FALSE FALSE  
##  3 FALSE FALSE  
##  4 FALSE FALSE  
##  5 FALSE FALSE  
##  6 FALSE FALSE  
##  7 TRUE  FALSE  
##  8 FALSE FALSE  
##  9 FALSE FALSE  
## 10 TRUE  FALSE  
## # … with 2,402 more rows

Fisher exact test

Simple use of a Fisher exact test, without any co-variable:

data1 %$%
  table(amu, disease) %T>% print() %>% 
  fisher.test()
##        disease
## amu     FALSE TRUE
##   FALSE  1933  276
##   TRUE    149   54
## 
##  Fisher's Exact Test for Count Data
## 
## data:  .
## p-value = 3.037e-07
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.777194 3.582548
## sample estimates:
## odds ratio 
##   2.536786

Logistic regression

A logistic regression can equivalently be used:

data1 %>% 
  glm(disease ~ amu, binomial, .) %>% 
  summary()
## 
## Call:
## glm(formula = disease ~ amu, family = binomial, data = .)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7865  -0.5167  -0.5167  -0.5167   2.0396  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.94643    0.06435 -30.249  < 2e-16 ***
## amuTRUE      0.93147    0.17138   5.435 5.47e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1925.5  on 2411  degrees of freedom
## Residual deviance: 1899.3  on 2410  degrees of freedom
## AIC: 1903.3
## 
## Number of Fisher Scoring iterations: 4

where the estimate of the coefficient (amuTRUE) is the log odds ratio.

Correcting for potential confounders

In this section we address the problem of confounders and explore on an example different options to control for them.

The presence of potential counfounders

Before we start this section, let’s define another utilitary function that adds a vector of suffixes suff to a character vector x where we want all possible combinations of x and suff:

add_suffix <- function(x, suff) {
  require(magrittr)
  expand.grid(x, suff) %>%
    purrr::reduce(purrr::map2, paste0) %>%
    unlist()
}

Let’s try it:

add_suffix(letters[1:5], 1:3)
##  [1] "a1" "b1" "c1" "d1" "e1" "a2" "b2" "c2" "d2" "e2" "a3" "b3" "c3" "d3" "e3"

Let’s now consider the following example:

(data2 <- viparc %>% 
  dplyr::mutate(amu = make_exposure(., am_set),
                disease = make_outcome(., cs_set, 2),
                amu_first_weeks = first_weeks(., am_set),
                amu_observation = make_outcome(., am_set, 2)) %>% 
  dplyr::filter(prophylactic(., cs_set, am_set, 2, 2)) %>% 
  dplyr::select(amu, disease, week, amu_first_weeks, amu_observation))
## # A tibble: 2,942 x 5
##    amu   disease  week amu_first_weeks amu_observation
##    <lgl> <lgl>   <int> <lgl>           <lgl>          
##  1 TRUE  FALSE       1 TRUE            FALSE          
##  2 FALSE FALSE       4 TRUE            FALSE          
##  3 FALSE FALSE       5 TRUE            FALSE          
##  4 FALSE FALSE       6 TRUE            FALSE          
##  5 FALSE FALSE       7 TRUE            TRUE           
##  6 FALSE FALSE       8 TRUE            TRUE           
##  7 TRUE  FALSE       9 TRUE            TRUE           
##  8 FALSE FALSE      14 TRUE            TRUE           
##  9 FALSE FALSE      15 TRUE            TRUE           
## 10 TRUE  FALSE      16 TRUE            FALSE          
## # … with 2,932 more rows

As we can see below, both exposure (amu) and outcome (disease) depend on week, amu_first_weeks and amu_observation:

purrr::map(add_suffix(c("amu", "disease"), paste(" ~", c("week", "amu_first_weeks", "amu_observation"))),
           ~ .x %>% 
             formula() %>% 
             glm(binomial, na.exclude(data2)) %>% 
             summary() %>% 
             coef() %>% 
             `[`(2, 4) %>% 
             data.frame(model = .x, p = ., stringsAsFactors = FALSE)) %>% 
  dplyr::bind_rows()
##                       model            p
## 1                amu ~ week 3.103919e-57
## 2            disease ~ week 6.290267e-42
## 3     amu ~ amu_first_weeks 6.144753e-12
## 4 disease ~ amu_first_weeks 3.284952e-05
## 5     amu ~ amu_observation 9.720589e-32
## 6 disease ~ amu_observation 3.672945e-15

These three variables week, amu_first_weeks and amu_observation are thus potential confounders that need to be corrected for.

Correcting for potential confounders

If one wants to consider (and correct for) covariables – such as age, AMU on the first week or AMU during the observation period –, the pipeline would look as below:

data2 %>% 
  na.exclude() %T>%
  {dplyr::group_by(., amu, disease, amu_first_weeks, amu_observation) %>%
     dplyr::tally() %>%
     dplyr::ungroup() %>%
     print()} %>%  
  glm(disease ~ amu_first_weeks + amu_observation + week + amu, binomial, .) %T>%
  {summary(.) %>% print()} %>% 
  anova(test = "LRT")
## # A tibble: 16 x 5
##    amu   disease amu_first_weeks amu_observation     n
##    <lgl> <lgl>   <lgl>           <lgl>           <int>
##  1 FALSE FALSE   FALSE           FALSE             630
##  2 FALSE FALSE   FALSE           TRUE               23
##  3 FALSE FALSE   TRUE            FALSE            1219
##  4 FALSE FALSE   TRUE            TRUE               61
##  5 FALSE TRUE    FALSE           FALSE             117
##  6 FALSE TRUE    FALSE           TRUE               20
##  7 FALSE TRUE    TRUE            FALSE             110
##  8 FALSE TRUE    TRUE            TRUE               29
##  9 TRUE  FALSE   FALSE           FALSE               3
## 10 TRUE  FALSE   FALSE           TRUE                9
## 11 TRUE  FALSE   TRUE            FALSE              96
## 12 TRUE  FALSE   TRUE            TRUE               41
## 13 TRUE  TRUE    FALSE           FALSE               6
## 14 TRUE  TRUE    FALSE           TRUE                1
## 15 TRUE  TRUE    TRUE            FALSE              31
## 16 TRUE  TRUE    TRUE            TRUE               16
## 
## Call:
## glm(formula = disease ~ amu_first_weeks + amu_observation + week + 
##     amu, family = binomial, data = .)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3183  -0.5354  -0.3893  -0.2808   2.8530  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.19480    0.15014  -1.297 0.194486    
## amu_first_weeksTRUE -0.33233    0.13632  -2.438 0.014775 *  
## amu_observationTRUE  0.68800    0.18153   3.790 0.000151 ***
## week                -0.16788    0.01515 -11.078  < 2e-16 ***
## amuTRUE             -0.28454    0.21318  -1.335 0.181959    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1925.5  on 2411  degrees of freedom
## Residual deviance: 1693.1  on 2407  degrees of freedom
## AIC: 1703.1
## 
## Number of Fisher Scoring iterations: 5
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: disease
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                             2411     1925.5              
## amu_first_weeks  1   16.900      2410     1908.6 3.939e-05 ***
## amu_observation  1   58.215      2409     1850.3 2.350e-14 ***
## week             1  155.396      2408     1695.0 < 2.2e-16 ***
## amu              1    1.812      2407     1693.1    0.1783    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

NOTE: If one wanted to correct for potential farm and / or flock effect, he’d have to consider mixed effect models (not done here). (Not even sure we need to do this here.)

Note that accounting for potential confounders makes the effect of the AMU exposure not significant anymore. Furthermore the estimated odds ratio is now below 1. Let’s try to drop the week variable:

data2 %>% 
  dplyr::select(-week) %>% 
  na.exclude() %>%  
  glm(disease ~ amu_first_weeks + amu_observation + amu, binomial, .) %T>% {summary(.) %>% print()} %>% 
  anova(test = "LRT")
## 
## Call:
## glm(formula = disease ~ amu_first_weeks + amu_observation + amu, 
##     family = binomial, data = .)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2952  -0.5907  -0.4334  -0.4334   2.1963  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -1.65752    0.09511 -17.427  < 2e-16 ***
## amu_first_weeksTRUE -0.66054    0.12668  -5.214 1.85e-07 ***
## amu_observationTRUE  1.14139    0.17587   6.490 8.59e-11 ***
## amuTRUE              0.78888    0.19154   4.119 3.81e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1925.5  on 2411  degrees of freedom
## Residual deviance: 1834.7  on 2408  degrees of freedom
## AIC: 1842.7
## 
## Number of Fisher Scoring iterations: 4
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: disease
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                             2411     1925.5              
## amu_first_weeks  1   16.900      2410     1908.6 3.939e-05 ***
## amu_observation  1   58.215      2409     1850.3 2.350e-14 ***
## amu              1   15.675      2408     1834.7 7.521e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

which suggests that the week variable (i.e. the age of the flock) is really the variable that causes the spurious relationship between prophylactic use (amu) and occurence of infection (disease). Let’s try again with week but this time also including interaction with amu:

data2 %>% 
  na.exclude() %>%  
  glm(disease ~ amu_first_weeks + amu_observation + week * amu, binomial, .) %>% 
  anova(test = "LRT")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: disease
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                             2411     1925.5              
## amu_first_weeks  1   16.900      2410     1908.6 3.939e-05 ***
## amu_observation  1   58.215      2409     1850.3 2.350e-14 ***
## week             1  155.396      2408     1695.0 < 2.2e-16 ***
## amu              1    1.812      2407     1693.1    0.1783    
## week:amu         1    1.182      2406     1692.0    0.2769    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It doesn’t really help. Let’s try modeling the age effect in a more flexible way than a linear relationship, using GAM:

summary(mgcv::gam(disease ~ amu_first_weeks + amu_observation + s(week) + amu,
                  binomial, na.exclude(data2)))
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## disease ~ amu_first_weeks + amu_observation + s(week) + amu
## 
## Parametric coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -1.95978    0.11121 -17.622  < 2e-16 ***
## amu_first_weeksTRUE -0.41279    0.13906  -2.968 0.002994 ** 
## amu_observationTRUE  0.68377    0.18023   3.794 0.000148 ***
## amuTRUE              0.07168    0.23712   0.302 0.762441    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df Chi.sq p-value    
## s(week) 4.012  4.929    134  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.107   Deviance explained = 12.9%
## UBRE = -0.29821  Scale est. = 1         n = 2412

Not better. Let’s try to stratify. For example, under the age of 6 weeks there seems to have not much effect of week, amu_first_weeks and amu_observation. Look at the significativity of the confounders for the 1-5-week age strata:

data2 %>%
  dplyr::filter(week < 6) %>% 
  na.exclude() %>%  
  glm(disease ~ amu_first_weeks + amu_observation + week + amu, binomial, .) %>% 
  anova(test = "LRT")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: disease
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev Pr(>Chi)  
## NULL                              471     594.68           
## amu_first_weeks  1   5.2854       470     589.40   0.0215 *
## amu_observation  1   2.3992       469     587.00   0.1214  
## week             1   0.1597       468     586.84   0.6894  
## amu              1   1.7539       467     585.08   0.1854  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and for the 1-6-week age strata:

data2 %>% 
  dplyr::filter(week < 7) %>% 
  na.exclude() %>%  
  glm(disease ~ amu_first_weeks + amu_observation + week + amu, binomial, .) %>% 
  anova(test = "LRT")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: disease
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev Pr(>Chi)   
## NULL                              576     713.04            
## amu_first_weeks  1   8.5731       575     704.47 0.003412 **
## amu_observation  1   2.7580       574     701.71 0.096770 . 
## week             1   0.2318       573     701.48 0.630219   
## amu              1   1.5159       572     699.97 0.218239   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s thus consider the 6-weeks threshold and investigate the relationship between disease and amu:

data2 %>% 
  dplyr::filter(week < 6) %>% 
  dplyr::select(amu, disease) %>% 
  na.exclude() %>%  
  glm(disease ~ amu, binomial, .) %>% 
  anova(test = "LRT")
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: disease
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL                   471     594.68         
## amu   1 0.075703       470     594.61   0.7832

No significant relationship between prophylaxy and disease occurence in the 1-to-6-week age band.

Pipelines

In this section we show how to build pipelines that combine data preparation and data analysis.

Using the Fisher exact test

Let’s redefine a Fisher exact test function that returns the results of the test together with the input contingency table:

fisher_test <- function(x, ...) {
  list(x, fisher.test(x, ...))
}

Let’s define a first pipeline using this new function:

pipeline1 <- function(df, clinical_signs_filtering = NULL, antimicrobials_filtering = NULL, lcs,
                      lam, antimicrobials_input = NULL, clinical_signs_output = NULL, watching, ...) {
  require(magrittr)
  var_names <- setdiff(names(df), c("farm", "flock", "week"))
  cs_names <- grep("_use$", var_names, value = TRUE, invert = TRUE)
  am_names <- grep("_use$", var_names, value = TRUE)
  if (is.null(clinical_signs_filtering)) clinical_signs_filtering <- cs_names
  if (is.null(clinical_signs_output)) clinical_signs_output <- cs_names
  if (is.null(antimicrobials_filtering)) antimicrobials_filtering <- am_names
  if (is.null(antimicrobials_input)) antimicrobials_input <- am_names
  df %>% 
    dplyr::mutate(amu = make_exposure(., antimicrobials_input), # creates amu variable
                  disease = make_outcome(., clinical_signs_output, watching)) %>% # creates disease variable
    dplyr::filter(prophylactic(., clinical_signs_filtering, antimicrobials_filtering, lcs, lam)) %>% 
    dplyr::select(amu, disease) %>% 
    na.exclude() %$%
    table(amu, disease) %>% 
    fisher_test(., ...)
}

Let’s try it:

pipeline1(viparc,
# 1. FILTERING -----------------------------------------------------------------
          clinical_signs_filtering = cs_set,
          antimicrobials_filtering = am_set,
          lcs = 2, # the duration over which we perform the filtering on clinical signs
          lam = 2, # the duration over which we perform the filtering on antimicrobials use
# 2. INPUT ---------------------------------------------------------------------
          antimicrobials_input = am_set,
# 3. OUTPUT --------------------------------------------------------------------
          clinical_signs_output = cs_set,
          watching = 2) # the duration of the observation period
## [[1]]
##        disease
## amu     FALSE TRUE
##   FALSE  1933  276
##   TRUE    149   54
## 
## [[2]]
## 
##  Fisher's Exact Test for Count Data
## 
## data:  x
## p-value = 3.037e-07
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.777194 3.582548
## sample estimates:
## odds ratio 
##   2.536786

A simplified version of pipeline1() could be:

pipeline2 <- function(df, clinical_signs = NULL, antimicrobials = NULL, filtering, watching, ...)
  pipeline1(df, clinical_signs, antimicrobials, filtering, filtering, antimicrobials, clinical_signs, watching, ...)

And a call of this function would look like:

a <- pipeline2(viparc,
# 1. DEFINING THE CLINICAL SIGNS WE CONSIDER -----------------------------------
          clinical_signs = cs_set,
# 2. DEFINING THE ANTIMICROBIALS USAGES WE CONSIDER ----------------------------
          antimicrobials = am_set,
# 3. THE DURATION OF THE PERIOD BEFORE THE FOCAL WEEK OVER WHICH WE DO THE FILTERING
          filtering = 2,
# 4. THE DURATION OF THE PERIOD AFTER THE FOCAL WEEK OVER WHICH WE OBSERVE -----
          watching = 2)

Let’s explore the structure of the produced object:

str(a)
## List of 2
##  $ : 'table' int [1:2, 1:2] 1933 149 276 54
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ amu    : chr [1:2] "FALSE" "TRUE"
##   .. ..$ disease: chr [1:2] "FALSE" "TRUE"
##  $ :List of 7
##   ..$ p.value    : num 3.04e-07
##   ..$ conf.int   : num [1:2] 1.78 3.58
##   .. ..- attr(*, "conf.level")= num 0.95
##   ..$ estimate   : Named num 2.54
##   .. ..- attr(*, "names")= chr "odds ratio"
##   ..$ null.value : Named num 1
##   .. ..- attr(*, "names")= chr "odds ratio"
##   ..$ alternative: chr "two.sided"
##   ..$ method     : chr "Fisher's Exact Test for Count Data"
##   ..$ data.name  : chr "x"
##   ..- attr(*, "class")= chr "htest"

Thanks to pipeline2(), it’s very easy to explore different options for the analysis. For example, let’s compare this

pipeline2(viparc,
# 1. DEFINING THE CLINICAL SIGNS WE CONSIDER -----------------------------------
          clinical_signs = cs_set,
# 2. DEFINING THE ANTIMICROBIALS USAGES WE CONSIDER ----------------------------
          antimicrobials = am_set,
# 3. THE DURATION OF THE PERIOD BEFORE THE FOCAL WEEK OVER WHICH WE DO THE FILTERING
          filtering = 2,
# 4. THE DURATION OF THE PERIOD AFTER THE FOCAL WEEK OVER WHICH WE OBSERVE -----
          watching = 2)
## [[1]]
##        disease
## amu     FALSE TRUE
##   FALSE  1933  276
##   TRUE    149   54
## 
## [[2]]
## 
##  Fisher's Exact Test for Count Data
## 
## data:  x
## p-value = 3.037e-07
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.777194 3.582548
## sample estimates:
## odds ratio 
##   2.536786

with this:

pipeline2(viparc,
# 1. DEFINING THE CLINICAL SIGNS WE CONSIDER -----------------------------------
          clinical_signs = c(cs_set, "respiratory"),
# 2. DEFINING THE ANTIMICROBIALS USAGES WE CONSIDER ----------------------------
          antimicrobials = am_set,
# 3. THE DURATION OF THE PERIOD BEFORE THE FOCAL WEEK OVER WHICH WE DO THE FILTERING
          filtering = 2,
# 4. THE DURATION OF THE PERIOD AFTER THE FOCAL WEEK OVER WHICH WE OBSERVE -----
          watching = 2)
## [[1]]
##        disease
## amu     FALSE TRUE
##   FALSE  1830  296
##   TRUE    143   54
## 
## [[2]]
## 
##  Fisher's Exact Test for Count Data
## 
## data:  x
## p-value = 3.191e-06
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.634042 3.296929
## sample estimates:
## odds ratio 
##   2.333546

If you want all the clinical signs and all the antimicrobials:

pipeline2(viparc, filtering = 2, watching = 2)
## [[1]]
##        disease
## amu     FALSE TRUE
##   FALSE  1342  244
##   TRUE    142   62
## 
## [[2]]
## 
##  Fisher's Exact Test for Count Data
## 
## data:  x
## p-value = 5.391e-07
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.698328 3.365424
## sample estimates:
## odds ratio 
##   2.399929

Logistic regression with covariables

We can use similar rationale in order to define pipeline functions that perform logistic regressions with covariables:

pipeline3 <- function(df, clinical_signs_filtering = NULL, antimicrobials_filtering = NULL, lcs,
                      lam, antimicrobials_input = NULL, clinical_signs_output = NULL, watching,
                      antimicrobials_week1 = NULL, wk, antimicrobials_watching = NULL) {
  require(magrittr)
  var_names <- setdiff(names(df), c("farm", "flock", "week"))
  cs_names <- grep("_use$", var_names, value = TRUE, invert = TRUE)
  am_names <- grep("_use$", var_names, value = TRUE)
  if (is.null(clinical_signs_filtering)) clinical_signs_filtering <- cs_names
  if (is.null(clinical_signs_output)) clinical_signs_output <- cs_names
  if (is.null(antimicrobials_filtering)) antimicrobials_filtering <- am_names
  if (is.null(antimicrobials_input)) antimicrobials_input <- am_names
  if (is.null(antimicrobials_week1)) antimicrobials_week1 <- am_names
  if (is.null(antimicrobials_watching)) antimicrobials_watching <- am_names
  df %>% 
    dplyr::mutate(amu = make_exposure(., antimicrobials_input),
                  disease = make_outcome(., clinical_signs_output, watching),
                  amu_first_weeks = first_weeks(., antimicrobials_week1, wk),
                  amu_observation = make_outcome(., antimicrobials_watching, watching)) %>% 
    dplyr::filter(prophylactic(., clinical_signs_filtering, antimicrobials_filtering, lcs, lam)) %>% 
    dplyr::select(farm, flock, week, amu_first_weeks, amu_observation, amu, disease) %>% 
    na.exclude() %>% 
    glm(disease ~ amu_first_weeks + amu_observation + week + amu, binomial, .)
}

A wrapper with reduced flexibility in the interface:

pipeline4 <- function(df, clinical_signs = NULL, antimicrobials = NULL, filtering, watching, first_wks)
  pipeline3(df, clinical_signs, antimicrobials, filtering, filtering, antimicrobials, clinical_signs, watching, antimicrobials, first_wks, antimicrobials)

Let’s try it:

pipeline4(viparc, filtering = 2, watching = 2, first_wks = 1) %T>% 
  {summary(.) %>% print()} %>% 
  anova(test = "LRT")
## 
## Call:
## glm(formula = disease ~ amu_first_weeks + amu_observation + week + 
##     amu, family = binomial, data = .)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5733  -0.5509  -0.4180  -0.3150   2.6803  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.24475    0.17737  -1.380   0.1676    
## amu_first_weeksTRUE -0.24716    0.15040  -1.643   0.1003    
## amu_observationTRUE  1.28602    0.16194   7.941    2e-15 ***
## week                -0.14630    0.01541  -9.492   <2e-16 ***
## amuTRUE             -0.41588    0.21103  -1.971   0.0488 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1637.5  on 1789  degrees of freedom
## Residual deviance: 1396.3  on 1785  degrees of freedom
## AIC: 1406.3
## 
## Number of Fisher Scoring iterations: 5
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: disease
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                             1789     1637.5              
## amu_first_weeks  1    8.602      1788     1628.8  0.003357 ** 
## amu_observation  1  129.122      1787     1499.7 < 2.2e-16 ***
## week             1   99.418      1786     1400.3 < 2.2e-16 ***
## amu              1    3.993      1785     1396.3  0.045695 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using GAM with covariables

Same thing as above but using a GAM instead of a GLM.:

pipeline5 <- function(df, clinical_signs_filtering = NULL, antimicrobials_filtering = NULL, lcs,
                      lam, antimicrobials_input = NULL, clinical_signs_output = NULL, watching,
                      antimicrobials_week1 = NULL, wk, antimicrobials_watching = NULL) {
  require(magrittr)
  var_names <- setdiff(names(df), c("farm", "flock", "week"))
  cs_names <- grep("_use$", var_names, value = TRUE, invert = TRUE)
  am_names <- grep("_use$", var_names, value = TRUE)
  if (is.null(clinical_signs_filtering)) clinical_signs_filtering <- cs_names
  if (is.null(clinical_signs_output)) clinical_signs_output <- cs_names
  if (is.null(antimicrobials_filtering)) antimicrobials_filtering <- am_names
  if (is.null(antimicrobials_input)) antimicrobials_input <- am_names
  if (is.null(antimicrobials_week1)) antimicrobials_week1 <- am_names
  if (is.null(antimicrobials_watching)) antimicrobials_watching <- am_names
  df %>% 
    dplyr::mutate(amu = make_exposure(., antimicrobials_input),
                  disease = make_outcome(., clinical_signs_output, watching),
                  amu_first_weeks = first_weeks(., antimicrobials_week1, wk),
                  amu_observation = make_outcome(., antimicrobials_watching, watching)) %>% 
    dplyr::filter(prophylactic(., clinical_signs_filtering, antimicrobials_filtering, lcs, lam)) %>% 
    dplyr::select(farm, flock, week, amu_first_weeks, amu_observation, amu, disease) %>% 
    na.exclude() %>% 
    mgcv::gam(disease ~ amu_first_weeks + amu_observation + s(week) + amu, binomial, ., method = "REML")
}

And the wrapper with the simple interface:

pipeline6 <- function(df, clinical_signs = NULL, antimicrobials = NULL, filtering, watching, first_wks)
  pipeline5(df, clinical_signs, antimicrobials, filtering, filtering, antimicrobials, clinical_signs, watching, antimicrobials, first_wks, antimicrobials)

Let’s try it:

pipeline6(viparc, filtering = 2, watching = 2, first_wks = 1) %>% 
  summary()
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## disease ~ amu_first_weeks + amu_observation + s(week) + amu
## 
## Parametric coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -1.86077    0.13006 -14.307  < 2e-16 ***
## amu_first_weeksTRUE -0.35715    0.15206  -2.349   0.0188 *  
## amu_observationTRUE  1.24355    0.16230   7.662 1.83e-14 ***
## amuTRUE              0.01794    0.22197   0.081   0.9356    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df Chi.sq p-value    
## s(week) 4.097  5.021  111.1  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.147   Deviance explained = 16.5%
## -REML = 695.09  Scale est. = 1         n = 1790

The effects of durations and sets

With GLM

Let’s define an S3 generic function that extracts all the information that we are interested in from the output of an object of class glm or mgcv:

get_outputs <- function (x, ...) {
   UseMethod("get_outputs", x)
 }

Let’s now define a glm method for that generic function:

get_outputs.glm <- function(x) {
  data <- x$model
  m <- with(data, table(amu, disease))
  conf_int <- confint(x)
  list(n    = nrow(data),
       ct   = addmargins(m),
       prob = round(sweep(m, 1, rowSums(m), "/"), 2),
       or   = exp(tail(coef(x), 1)),
       p    = tail(anova(x, test = "LRT")$`Pr(>Chi)`, 1),
       ci   = conf_int[nrow(conf_int), ])
}

Let’s try it:

get_outputs(pipeline4(viparc, filtering = 2, watching = 2, first_wks = 1))
## $n
## [1] 1790
## 
## $ct
##        disease
## amu     FALSE TRUE  Sum
##   FALSE  1342  244 1586
##   TRUE    142   62  204
##   Sum    1484  306 1790
## 
## $prob
##        disease
## amu     FALSE TRUE
##   FALSE  0.85 0.15
##   TRUE   0.70 0.30
## 
## $or
## amuTRUE 
## 0.65976 
## 
## $p
## [1] 0.04569518
## 
## $ci
##        2.5 %       97.5 % 
## -0.835762763 -0.007863199

Let’s now define a batch of durations values:

(batch <- expand.grid(filtering = 1:2, watching = 1:2, first_wks = 1:2))
##   filtering watching first_wks
## 1         1        1         1
## 2         2        1         1
## 3         1        2         1
## 4         2        2         1
## 5         1        1         2
## 6         2        1         2
## 7         1        2         2
## 8         2        2         2

And let’s run pipeline4() on this batch:

batch %>% 
  purrr::pmap(pipeline4, df = viparc) %>% 
  lapply(get_outputs) %>% 
  head(2)
## [[1]]
## [[1]]$n
## [1] 2657
## 
## [[1]]$ct
##        disease
## amu     FALSE TRUE  Sum
##   FALSE  2126  231 2357
##   TRUE    242   58  300
##   Sum    2368  289 2657
## 
## [[1]]$prob
##        disease
## amu     FALSE TRUE
##   FALSE  0.90 0.10
##   TRUE   0.81 0.19
## 
## [[1]]$or
##   amuTRUE 
## 0.8025573 
## 
## [[1]]$p
## [1] 0.2482423
## 
## [[1]]$ci
##      2.5 %     97.5 % 
## -0.6039546  0.1509620 
## 
## 
## [[2]]
## [[2]]$n
## [1] 2023
## 
## [[2]]$ct
##        disease
## amu     FALSE TRUE  Sum
##   FALSE  1671  146 1817
##   TRUE    165   41  206
##   Sum    1836  187 2023
## 
## [[2]]$prob
##        disease
## amu     FALSE TRUE
##   FALSE  0.92 0.08
##   TRUE   0.80 0.20
## 
## [[2]]$or
##   amuTRUE 
## 0.7962257 
## 
## [[2]]$p
## [1] 0.3487159
## 
## [[2]]$ci
##      2.5 %     97.5 % 
## -0.7178204  0.2446093

With GAM

Let’s now define a gam method for the get_outputs() generic:

get_outputs.gam <- function(x, level = .95) {
  data <- x$model
  m <- with(data, table(amu, disease))
  summ <- summary(x)
  conf_int <- mgcv.helper::confint.gam(x, level = level)
  list(n    = nrow(data),
       ct   = addmargins(m),
       prob = round(sweep(m, 1, rowSums(m), "/"), 2),
       or   = unname(tail(summ$p.coef, 1)),
       p    = unname(tail(summ$p.pv, 1)),
       ci   = unlist(conf_int[nrow(conf_int), 5:6]))
}

Let’s try it:

get_outputs(pipeline6(viparc, filtering = 2, watching = 2, first_wks = 1))
## $n
## [1] 1790
## 
## $ct
##        disease
## amu     FALSE TRUE  Sum
##   FALSE  1342  244 1586
##   TRUE    142   62  204
##   Sum    1484  306 1790
## 
## $prob
##        disease
## amu     FALSE TRUE
##   FALSE  0.85 0.15
##   TRUE   0.70 0.30
## 
## $or
## [1] 0.0179444
## 
## $p
## [1] 0.935568
## 
## $ci
##       2.5%      97.5% 
## -0.4174045  0.4532934

Let’s now run it on a batch of durations values, and put the output in shape:

batch %>% 
  tibble::as_tibble() %>%
  assign("plan", ., 1) %>% # saving intermediate value
  purrr::pmap(pipeline6, df = viparc) %>% 
  lapply(get_outputs) %>% 
  purrr::transpose() %>%
  tibble::as_tibble() %>% 
  dplyr::mutate_at(c("n", "or", "p"), unlist) %>% 
  dplyr::bind_cols(plan, .) # intermediate value re-used here
## # A tibble: 8 x 9
##   filtering watching first_wks     n ct      prob         or     p ci       
##       <int>    <int>     <int> <int> <list>  <list>    <dbl> <dbl> <list>   
## 1         1        1         1  2657 <table> <table>  0.0357 0.855 <dbl [2]>
## 2         2        1         1  2023 <table> <table>  0.236  0.350 <dbl [2]>
## 3         1        2         1  2398 <table> <table> -0.205  0.232 <dbl [2]>
## 4         2        2         1  1790 <table> <table>  0.0179 0.936 <dbl [2]>
## 5         1        1         2  2657 <table> <table>  0.0692 0.723 <dbl [2]>
## 6         2        1         2  2023 <table> <table>  0.313  0.211 <dbl [2]>
## 7         1        2         2  2398 <table> <table> -0.171  0.318 <dbl [2]>
## 8         2        2         2  1790 <table> <table>  0.0741 0.737 <dbl [2]>

run_plan() and Co

Let’s make a function out of this above code:

run_plan <- function(f, df, clinical_signs = NULL, antimicrobials = NULL, filtering = 1, watching = 1, first_wks = 1, nb_cores = 1) {
  require(magrittr)

  var_names <- setdiff(names(df), c("farm", "flock", "week"))  
  all_antimicrobials <- list(grep("_use$", var_names, value = TRUE))
  all_clinical_signs <- list(grep("_use$", var_names, value = TRUE, invert = TRUE))

  if (is.null(antimicrobials)) antimicrobials <- all_antimicrobials
  else if (! is.list(antimicrobials)) antimicrobials <- list(antimicrobials)
  if (is.null(clinical_signs)) clinical_signs <- all_clinical_signs
  else if (! is.list(clinical_signs)) clinical_signs <- list(clinical_signs)
  
  is_na <- function(x) {
    if (length(x) > 1) return(FALSE)
    is.na(x)
  }
  sel <- which(sapply(antimicrobials, is_na))
  if (length(sel)) antimicrobials[sel] <- all_antimicrobials
  sel <- which(sapply(clinical_signs, is_na))
  if (length(sel)) clinical_signs[sel] <- all_clinical_signs
  rm(is_na, sel, all_antimicrobials, all_clinical_signs, var_names)

  run <- function(x) {
    if (nb_cores < 2) return(purrr::pmap(x, f, df = df))
    c(FUN = f, x, MoreArgs = list(list(df = df)), SIMPLIFY = FALSE, mc.cores = nb_cores) %>% 
        do.call(parallel::mcmapply, .)
  }
  
  environment() %>% 
    as.list() %>% 
    `[<-`(c("f", "df", "run", "nb_cores"), NULL) %>% 
    expand.grid() %>% 
    assign("plan", ., 1) %>% # saving intermediate value
    run() %>% 
    lapply(get_outputs) %>%
    purrr::transpose() %>%
    tibble::as_tibble() %>%
    dplyr::mutate_at(c("n", "or", "p"), unlist) %>%
    dplyr::bind_cols(tibble::as_tibble(plan), .) %>% # reusing the intermediate value
    dplyr::mutate(lower = unlist(lapply(ci, head, 1)),
                  upper = unlist(lapply(ci, tail, 1))) %>% 
    dplyr::select(-ci)
}

Let’s also write two functions that will help the exploration of the outputs of the run_plan() function. The first one develops the list of clinical signs and / or antimicrobials so that they can easily be read in the data frame:

develop <- function(df, var = NULL) {
  names_df <- names(df)
  var_ref <- c("clinical_signs", "antimicrobials")
  if (is.null(var)) var <- intersect(names_df, var_ref)
  stopifnot(var %in% names_df)
  stopifnot(var %in% var_ref)
  dplyr::mutate_at(df, var, ~ sapply(., paste, collapse = ", "))
}

The second one removes the columns that contain no variability:

simplify <- function(df) {
  df[, sapply(df, function(x) length(unique(x))) > 1]
}

Let’s now try these three functions. With default values:

run_plan(pipeline6, viparc)
## # A tibble: 1 x 12
##   clinical_signs antimicrobials filtering watching first_wks     n ct      prob        or     p  lower upper
##   <list>         <list>             <dbl>    <dbl>     <dbl> <int> <list>  <list>   <dbl> <dbl>  <dbl> <dbl>
## 1 <chr [6]>      <chr [45]>             1        1         1  2657 <table> <table> 0.0357 0.855 -0.348 0.420

By default, the set of all clinical signs and the set of all antimicrobials are considered, and the durations of the begining of the flock, the filtering period, and the observation period are fixed to 1 week. This can be changed by assigning values to the arguments of the run_plan() function. For example, considering 2 different durations of the filtering period:

(tmp <- run_plan(pipeline6, viparc, filtering = 1:2))
## # A tibble: 2 x 12
##   clinical_signs antimicrobials filtering watching first_wks     n ct      prob        or     p  lower upper
##   <list>         <list>             <int>    <dbl>     <dbl> <int> <list>  <list>   <dbl> <dbl>  <dbl> <dbl>
## 1 <chr [6]>      <chr [45]>             1        1         1  2657 <table> <table> 0.0357 0.855 -0.348 0.420
## 2 <chr [6]>      <chr [45]>             2        1         1  2023 <table> <table> 0.236  0.350 -0.259 0.731

Note that you can access the contingency tables this way:

tmp$ct
## [[1]]
##        disease
## amu     FALSE TRUE  Sum
##   FALSE  2126  231 2357
##   TRUE    242   58  300
##   Sum    2368  289 2657
## 
## [[2]]
##        disease
## amu     FALSE TRUE  Sum
##   FALSE  1671  146 1817
##   TRUE    165   41  206
##   Sum    1836  187 2023

Access to the probabilities would be done the same way:

tmp$prob
## [[1]]
##        disease
## amu     FALSE TRUE
##   FALSE  0.90 0.10
##   TRUE   0.81 0.19
## 
## [[2]]
##        disease
## amu     FALSE TRUE
##   FALSE  0.92 0.08
##   TRUE   0.80 0.20

and same thing for the list of clinical signs and antimicrobials. If you want to see only the columns for which values change:

simplify(tmp)
## # A tibble: 2 x 8
##   filtering     n ct      prob        or     p  lower upper
##       <int> <int> <list>  <list>   <dbl> <dbl>  <dbl> <dbl>
## 1         1  2657 <table> <table> 0.0357 0.855 -0.348 0.420
## 2         2  2023 <table> <table> 0.236  0.350 -0.259 0.731

Let’s see a more complex example:

(tmp <- run_plan(pipeline6, viparc, filtering = 1:2, watching = 1:2,
                 antimicrobials = list("colistin_use", c("colistin_use", "cefotaxime_use")),
                 clinical_signs = "respiratory", first_wks = 1))
## # A tibble: 8 x 12
##   clinical_signs antimicrobials filtering watching first_wks     n ct      prob         or     p  lower upper
##   <list>         <list>             <int>    <int>     <dbl> <int> <list>  <list>    <dbl> <dbl>  <dbl> <dbl>
## 1 <chr [1]>      <chr [1]>              1        1         1  4370 <table> <table> -0.251  0.537 -1.05  0.545
## 2 <chr [1]>      <chr [2]>              1        1         1  4369 <table> <table> -0.253  0.534 -1.05  0.543
## 3 <chr [1]>      <chr [1]>              2        1         1  3909 <table> <table> -0.280  0.566 -1.24  0.677
## 4 <chr [1]>      <chr [2]>              2        1         1  3907 <table> <table> -0.280  0.566 -1.24  0.677
## 5 <chr [1]>      <chr [1]>              1        2         1  4065 <table> <table> -0.127  0.664 -0.700 0.446
## 6 <chr [1]>      <chr [2]>              1        2         1  4064 <table> <table> -0.130  0.657 -0.703 0.443
## 7 <chr [1]>      <chr [1]>              2        2         1  3609 <table> <table> -0.0904 0.789 -0.752 0.572
## 8 <chr [1]>      <chr [2]>              2        2         1  3607 <table> <table> -0.0917 0.786 -0.754 0.570

This output can be visualized with the simplify() function:

simplify(tmp)
## # A tibble: 8 x 10
##   antimicrobials filtering watching     n ct      prob         or     p  lower upper
##   <list>             <int>    <int> <int> <list>  <list>    <dbl> <dbl>  <dbl> <dbl>
## 1 <chr [1]>              1        1  4370 <table> <table> -0.251  0.537 -1.05  0.545
## 2 <chr [2]>              1        1  4369 <table> <table> -0.253  0.534 -1.05  0.543
## 3 <chr [1]>              2        1  3909 <table> <table> -0.280  0.566 -1.24  0.677
## 4 <chr [2]>              2        1  3907 <table> <table> -0.280  0.566 -1.24  0.677
## 5 <chr [1]>              1        2  4065 <table> <table> -0.127  0.664 -0.700 0.446
## 6 <chr [2]>              1        2  4064 <table> <table> -0.130  0.657 -0.703 0.443
## 7 <chr [1]>              2        2  3609 <table> <table> -0.0904 0.789 -0.752 0.572
## 8 <chr [2]>              2        2  3607 <table> <table> -0.0917 0.786 -0.754 0.570

the develop() function:

develop(tmp)
## # A tibble: 8 x 12
##   clinical_signs antimicrobials               filtering watching first_wks     n ct      prob         or     p  lower upper
##   <chr>          <chr>                            <int>    <int>     <dbl> <int> <list>  <list>    <dbl> <dbl>  <dbl> <dbl>
## 1 respiratory    colistin_use                         1        1         1  4370 <table> <table> -0.251  0.537 -1.05  0.545
## 2 respiratory    colistin_use, cefotaxime_use         1        1         1  4369 <table> <table> -0.253  0.534 -1.05  0.543
## 3 respiratory    colistin_use                         2        1         1  3909 <table> <table> -0.280  0.566 -1.24  0.677
## 4 respiratory    colistin_use, cefotaxime_use         2        1         1  3907 <table> <table> -0.280  0.566 -1.24  0.677
## 5 respiratory    colistin_use                         1        2         1  4065 <table> <table> -0.127  0.664 -0.700 0.446
## 6 respiratory    colistin_use, cefotaxime_use         1        2         1  4064 <table> <table> -0.130  0.657 -0.703 0.443
## 7 respiratory    colistin_use                         2        2         1  3609 <table> <table> -0.0904 0.789 -0.752 0.572
## 8 respiratory    colistin_use, cefotaxime_use         2        2         1  3607 <table> <table> -0.0917 0.786 -0.754 0.570

or the two in combination:

simplify(develop(tmp))
## # A tibble: 8 x 10
##   antimicrobials               filtering watching     n ct      prob         or     p  lower upper
##   <chr>                            <int>    <int> <int> <list>  <list>    <dbl> <dbl>  <dbl> <dbl>
## 1 colistin_use                         1        1  4370 <table> <table> -0.251  0.537 -1.05  0.545
## 2 colistin_use, cefotaxime_use         1        1  4369 <table> <table> -0.253  0.534 -1.05  0.543
## 3 colistin_use                         2        1  3909 <table> <table> -0.280  0.566 -1.24  0.677
## 4 colistin_use, cefotaxime_use         2        1  3907 <table> <table> -0.280  0.566 -1.24  0.677
## 5 colistin_use                         1        2  4065 <table> <table> -0.127  0.664 -0.700 0.446
## 6 colistin_use, cefotaxime_use         1        2  4064 <table> <table> -0.130  0.657 -0.703 0.443
## 7 colistin_use                         2        2  3609 <table> <table> -0.0904 0.789 -0.752 0.572
## 8 colistin_use, cefotaxime_use         2        2  3607 <table> <table> -0.0917 0.786 -0.754 0.570

which is equivalent to:

develop(simplify(tmp))
## # A tibble: 8 x 10
##   antimicrobials               filtering watching     n ct      prob         or     p  lower upper
##   <chr>                            <int>    <int> <int> <list>  <list>    <dbl> <dbl>  <dbl> <dbl>
## 1 colistin_use                         1        1  4370 <table> <table> -0.251  0.537 -1.05  0.545
## 2 colistin_use, cefotaxime_use         1        1  4369 <table> <table> -0.253  0.534 -1.05  0.543
## 3 colistin_use                         2        1  3909 <table> <table> -0.280  0.566 -1.24  0.677
## 4 colistin_use, cefotaxime_use         2        1  3907 <table> <table> -0.280  0.566 -1.24  0.677
## 5 colistin_use                         1        2  4065 <table> <table> -0.127  0.664 -0.700 0.446
## 6 colistin_use, cefotaxime_use         1        2  4064 <table> <table> -0.130  0.657 -0.703 0.443
## 7 colistin_use                         2        2  3609 <table> <table> -0.0904 0.789 -0.752 0.572
## 8 colistin_use, cefotaxime_use         2        2  3607 <table> <table> -0.0917 0.786 -0.754 0.570

Generating tables

Let’s download the data on the antimicrobials classes:

ab_classes <- readr::read_csv("https://raw.githubusercontent.com/viparc/clires_data/master/data/antimicrobial_classes.csv", col_types = "cc")
ab_classes$antimicrobial <- paste0(ab_classes$antimicrobial, "_use")

It looks like this:

ab_classes
## # A tibble: 44 x 2
##    antimicrobial    class         
##    <chr>            <chr>         
##  1 amoxicillin_use  penecillins   
##  2 ampicillin_use   penecillins   
##  3 cefadroxil_use   cephalosporins
##  4 cefotaxime_use   cephalosporins
##  5 cephalexin_use   cephalosporins
##  6 ceftiofur_use    cephalosporins
##  7 colistin_use     polypeptides  
##  8 enramycin_use    polypeptides  
##  9 enrofloxacin_use quinolones    
## 10 flumequine_use   quinolones    
## # … with 34 more rows

We can use the run_plan() function to compute odds ratios across antimicrobials classes, clinical signs and values 1 to 3 for the durations in weeks of the initial, filtering and watching periods (it takes about 15’ to run on 3 cores):

all_cs <- function(x) {
  sel <- sapply(x, length)
  x[which(sel > 1)] <- "all"
  x
}

replace_empty <- function(x) {
  x[which(x == "")] <- "all"
  x
}

tmp <- run_plan(pipeline6, viparc, filtering = 1:3, watching = 1:3,
                antimicrobials = c(split(ab_classes$antimicrobial, ab_classes$class), NA),
                clinical_signs = c(as.list(setdiff(grep("_use$", names(viparc), value = TRUE, invert = TRUE), c("farm", "flock", "week"))), NA),
                first_wks = 1:3, nb_cores = 3) %>% 
  dplyr::mutate(clinical_signs = unlist(all_cs(clinical_signs)),
                antimicrobials = replace_empty(names(antimicrobials))) %>% 
  dplyr::rename(class = antimicrobials)
tmp <- names(tmp$antimicrobials)
cs <- tmp$clinical_signs
sel <- sapply(tmp$clinical_signs, length)
cs[which(sel > 1)] <- "all"

We can the split this output by the durations of the initial, filtering and watching periods, and spread the data to show antimicrobials classes in rows and clinical signs in columns:

tmp %>% 
  dplyr::mutate(or    = exp(or),
                lower = exp(lower),
                upper = exp(upper)) %>% 
  dplyr::group_by(filtering, watching, first_wks) %>% 
  dplyr::group_split() %>% 
  lapply(. %>%
           dplyr::select(clinical_signs, class, or) %>% 
           tidyr::pivot_wider(names_from = clinical_signs, values_from = or))

Session info

The analyses were performed on the system and versions as defined below:

sessionInfo(required)
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
## 
## attached base packages:
## character(0)
## 
## other attached packages:
## [1] dplyr_0.8.3  magrittr_1.5 mgcv_1.8-31  purrr_0.3.3  readr_1.3.1 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3             knitr_1.26             MASS_7.3-51.4          hms_0.5.2              splines_3.6.2         
##  [6] grDevices_3.6.2        tidyselect_0.2.99.9000 lattice_0.20-38        R6_2.4.1               rlang_0.4.2           
## [11] fansi_0.4.0            stringr_1.4.0          mgcv.helper_0.1.8      tools_3.6.2            utils_3.6.2           
## [16] grid_3.6.2             nlme_3.1-142           xfun_0.11              utf8_1.1.4             cli_2.0.0             
## [21] htmltools_0.4.0        stats_3.6.2            datasets_3.6.2         yaml_2.2.0             assertthat_0.2.1      
## [26] digest_0.6.23          tibble_2.1.3           base_3.6.2             crayon_1.3.4           Matrix_1.2-18         
## [31] vctrs_0.2.0.9007       graphics_3.6.2         curl_4.3               glue_1.3.1             evaluate_0.14         
## [36] rmarkdown_2.0          stringi_1.4.3          compiler_3.6.2         pillar_1.4.2           methods_3.6.2         
## [41] pkgconfig_2.0.3

If you want to run the code locally, you can copy and paste code chunks to your commands line. Make sure though that your system and packages versions are the same or more recent than the ones listed above.

Running the script locally

You can download the R markdown file used to generate this page as so:

download.file("https://raw.githubusercontent.com/viparc/prophylactic/master/prophylactic.Rmd", "prophylactic.Rmd")

You can transforme this prophylactic.Rmd R markdown document in an R script named prophylactic.R as so:

if (!require("kintr")) install.packages("knitr")
knitr::purl("prophylactic.Rmd")
script <- readLines("prophylactic.R")
writeLines(script[-c(seq_len(which(grepl("starts here", script))[1]),
                     which(grepl("session_info", script))[1]:length(script))],
           "prophylactic.R")

To execute the code:

source("prophylactic.R")