---
title: "Model repeated cross-sectional data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Model repeated cross-sectional data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, output=FALSE, message=FALSE, warning=FALSE}
library(serosv)
library(dplyr)
library(magrittr)
library(ggplot2)
```

## Age-time varying model

To monitor changes in a population's seroprevalence over time, modelers often conduct multiple cross-sectional surveys at different time points, each using a new representative sample. The resulting data are known as **repeated cross-sectional data**.

**Proposed approach**

To model repeated cross-sectional serological data, `serosv` offers `age_time_model()` function which implements the following workflow:

1.  Fit age-specific seroprevalence for each survey period
2.  Monotonize age-specific or birth-cohort-specific prevalence over time
3.  Fit monotonized age-specific seroprevalence for each survey period

**Fitting data**

The function expects input data with the following columns:

-   Either `age` and `status` (linelisting) or `age` and `pos` + `tot` (aggregated)

-   A column for date of survey (specified via `time_col` argument)

-   A column for id of each survey period (specified via `grouping_col` argument)

```{r}
# Prepare data 
tb_nl <- tb_nl_1966_1973 %>% 
  mutate(
    survey_year = age + birthyr,
    survey_time = as.Date(paste0(survey_year, "-01-01"))
  ) %>% select(-birthyr) %>% 
  filter(survey_year > 1966) %>% 
  group_by(age, survey_year, survey_time) %>% 
  summarize(pos = sum(pos), tot = sum(tot), .groups = "drop")

head(tb_nl)
```

The monotonization method can be specified via the `monotonize_method` argument, `serosv` currently supports 2 options:

-   Pooled adjacent violators average (`monotonize_method = "pava"`)

-   Shape constrained additive model (`monotonize_method = "scam"`)

The users can also configure to monotonize either:

-   Age-specific seroprevalence over time (`age_correct = FALSE`)

-   Or birth cohort specific seroprevalence over time (`age_correct=TRUE`)

```{r}
out_pava <- tb_nl %>% 
  age_time_model(
    time_col = "survey_time", 
    grouping_col = "survey_year",
    age_correct = F,
    monotonize_method = "pava"
  ) %>% 
  suppressWarnings()

out_scam <- tb_nl %>% 
  age_time_model(
    time_col = "survey_time", 
    grouping_col = "survey_year",
    age_correct = T,
    monotonize_method = "scam"
  ) %>% 
  suppressWarnings()
```

The output is a `data.frame` with dimension [number of survey, 9], where each row corresponds to a single survey period. The columns are:

-   column for id of survey period

-   `df` - input data.frame corresponding to that survey period

-   `info` - model for the seroprevalence

-   `monotonized_info` - model for the monotonized seroprevalence

-   `monotonized_ci_mod` - model for the monotonized confidence interval

-   `sp` - predicted seroprevalence of the given input data

-   `foi` - estimated force of infection from `sp`

-   `monotonized_sp` - predicted monotonized seroprevalence of the given input data

-   `monotonized_foi` - estimated force of infection from `monotonized_sp`

```{r}
out_pava
out_scam
```

For visualization, the plot function for `age_time_model` offers the following configurations

-   `facet` whether to visualize result for each survey period separately (`facet=TRUE`) or on a single plot (`facet=FALSE`)

-   `modtype` choose which model to visualize, the model fitted with input data (`modtype="non-monotonized"`) or monotonized data (`modtype="monotonized"`)

***Example:*** output for model with PAVA for monotonization

```{r warning=FALSE, message=FALSE}
plot(out_pava, facet = TRUE, modtype = "non-monotonized") + ylim(c(0, 0.07))
plot(out_pava, facet = TRUE, modtype = "monotonized") + ylim(c(0, 0.07))

plot(out_pava, facet = FALSE, modtype = "monotonized") + ylim(c(0, 0.07))
```

***Example:*** output for model with SCAM for monotonization

```{r warning=FALSE, message=FALSE}
plot(out_scam, facet = TRUE, modtype = "non-monotonized") + ylim(c(0, 0.07))
plot(out_scam, facet = TRUE, modtype = "monotonized") + ylim(c(0, 0.07))
```
