123 changes: 123 additions & 0 deletions .github/copilot-instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -266,6 +266,21 @@ Do not use generic acknowledgements without locators
or plaintext author-title references
when a BibTeX citation is available.

## Observational vs Causal Estimands

Always distinguish observational estimands
from causal estimands in notation and prose.

- Use observational notation
(for example, standardized risks based on `\E{Y \mid A=a, Z=z}`)
when discussing model-based associations.
- Use potential-outcome notation
(for example, `\Pr(Y^a = 1)`)
only when making a causal claim.
- If observational and causal estimands are equated,
explicitly state identification assumptions
(consistency, exchangeability, and positivity).
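
As an illustration of the last rule, a worked identification argument might look like the sketch below. It assumes a discrete covariate $Z$ and conditional exchangeability given $Z$; the notation is generic and is not taken from any particular chapter.

$$
\begin{aligned}
\Pr(Y^a = 1)
&= \sum_z \Pr(Y^a = 1 \mid Z = z) \Pr(Z = z)
&& \text{(law of total probability)} \\
&= \sum_z \Pr(Y^a = 1 \mid A = a, Z = z) \Pr(Z = z)
&& \text{(exchangeability; positivity)} \\
&= \sum_z \Pr(Y = 1 \mid A = a, Z = z) \Pr(Z = z)
&& \text{(consistency)}
\end{aligned}
$$

The first line is a causal estimand and the last line is an observational (standardized) estimand; the two can be equated only because each assumption is named at the step where it is used.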

## Variable Definitions in Exercises

When introducing model variables in exercises,
Expand Down Expand Up @@ -383,6 +398,113 @@ When introducing or editing formal statistical definitions in `.qmd` files:
ensure those terms also have formal `#def-` div definitions
in the relevant scope before relying on them

## Slidebreaks Before Theorem-Type Divs

Always add `{{< slidebreak >}}` on its own line, followed by a blank line,
immediately before every theorem-type div opener.
This starts each theorem-type div on a new slide, which keeps slide-format output readable.

Theorem-type div types (per [Quarto cross-reference docs](https://quarto.org/docs/authoring/cross-references.html#theorems-and-proofs)):
`#thm-`, `#lem-`, `#cor-`, `#prp-`, `#cnj-`, `#def-`, `#exm-`, `#exr-`, `#rem-`

### Slidebreaks in including vs. included files

When a subfile's **first** content is a theorem-type div,
place the `{{< slidebreak >}}` in the **including** (parent) file,
immediately before the `{{< include >}}` shortcode —
**not** at the start of the included subfile itself.

**Correct** — slidebreak in the including file:
```qmd
<!-- in the parent file -->
{{< slidebreak >}}

{{< include _subfiles/chapter/_exm-my-example.qmd >}}
```

```qmd
<!-- in _exm-my-example.qmd — no leading slidebreak -->
:::{#exm-my-example}

#### My example title

Content...

:::
```

**Incorrect** — slidebreak inside the included file:
```qmd
<!-- in the parent file — no slidebreak -->
{{< include _subfiles/chapter/_exm-my-example.qmd >}}
```

```qmd
<!-- in _exm-my-example.qmd — do NOT put slidebreak here -->
{{< slidebreak >}}

:::{#exm-my-example}

#### My example title

Content...

:::
```

**Correct** example (non-leading slidebreak, same file):
```qmd
{{< slidebreak >}}

:::{#def-collapsibility}

#### Collapsibility

A measure is *collapsible* if ...

:::
```

**Incorrect** (missing slidebreak):
```qmd
:::{#def-collapsibility}

#### Collapsibility

A measure is *collapsible* if ...

:::
```

## Example Formatting

All worked examples in `.qmd` files must be wrapped in a Quarto `#exm-` div.
Never leave a named example as a plain markdown section.

**Correct:**
```qmd
:::{#exm-wcgs-marginal-rd}

##### Example: Marginal risk difference

Content of the example...

:::
```

**Incorrect:**
```qmd
##### Example: Marginal risk difference

Content of the example...
```

- Use an id beginning `#exm-` (for example, `#exm-wcgs-marginal-rd`)
- Put the example title in a heading inside the div,
at the heading level matching the surrounding section depth
- All content for the example (setup, computation, interpretation)
should live inside the div

## Div Titles vs. Markdown Headings

**CRITICAL**: Div titles (headings inside divs like `:::{#def-...}`, `:::{#thm-...}`, `:::{#exm-...}`, etc.) are NOT the same as regular markdown headings.
Expand Down Expand Up @@ -822,6 +944,7 @@ Content here.

More content.
```
When a subfile begins with a theorem-type div (`#thm-`, `#lem-`, `#cor-`, `#prp-`, `#cnj-`, `#def-`, `#exm-`, `#exr-`, `#rem-`), place the preceding `{{< slidebreak >}}` in the **parent** file (before the `{{< include >}}`), not inside the subfile. The subfile itself should not start with `{{< slidebreak >}}`.

## Computer Algebra Systems (CAS)

2 changes: 2 additions & 0 deletions CLAUDE.md
Expand Up @@ -41,6 +41,8 @@ Before committing any `.qmd`, `.R`, or config file change:

### Quarto
- Use `{{< slidebreak >}}` instead of `---` for slide breaks
- Add `{{< slidebreak >}}` immediately before every theorem-type div (`#thm-`, `#lem-`, `#cor-`, `#prp-`, `#cnj-`, `#def-`, `#exm-`, `#exr-`, `#rem-`)
- When a subfile begins with a theorem-type div, put the preceding `{{< slidebreak >}}` in the **parent** file (before the `{{< include >}}`), not inside the subfile
- Default to `#| code-fold: true` for figure/table chunks
- Use div format (`:::{#fig-...}`) for figures and tables, not chunk-option `fig-cap`/`tbl-cap`
- Do not indent `:::` fenced div markers inside lists
68 changes: 68 additions & 0 deletions _subfiles/logistic-regression/_exm-collapsibility.qmd
@@ -0,0 +1,68 @@
:::{#exm-collapsibility}

#### Collapsibility: numerical illustration

Consider a hypothetical scenario with two strata
($Z = 0$: low-risk, $Z = 1$: high-risk).
We make two simplifying assumptions:

1. **No confounding** ($A \perp Z$):
exposure is independent of the covariate,
so $\p(Z = z \mid A = a) = \p(Z = z)$ for all $a$ and $z$.
2. **Equal stratum sizes** ($\p(Z = 0) = \p(Z = 1) = \tfrac{1}{2}$):
both strata have the same marginal probability.

Together, these imply
$\p(Z = z \mid A = a) = \tfrac{1}{2}$ for all $a$ and $z$,
so the simple (equal-weight) average of the stratum-specific risks
equals both the observed marginal risk $\pi(a)$ and the causal marginal risk $\pi_a$:

```{r}
#| label: collapsibility-example
#| code-fold: true

strata <- tibble::tibble(
  stratum = c("Z = 0 (low risk)", "Z = 1 (high risk)"),
  pi_0 = c(0.05, 0.30),
  pi_1 = c(0.10, 0.50)
) |>
  dplyr::mutate(
    RD = pi_1 - pi_0,
    RR = pi_1 / pi_0,
    OR = (pi_1 / (1 - pi_1)) / (pi_0 / (1 - pi_0))
  )

pi1_marg <- mean(strata$pi_1)
pi0_marg <- mean(strata$pi_0)

tibble::tibble(
  Measure = c("Risk difference", "Risk ratio", "Odds ratio"),
  Marginal = c(
    pi1_marg - pi0_marg,
    pi1_marg / pi0_marg,
    (pi1_marg / (1 - pi1_marg)) / (pi0_marg / (1 - pi0_marg))
  ),
  `Avg. conditional` = c(
    mean(strata$RD), mean(strata$RR), mean(strata$OR)
  ),
  `Marginal = avg. conditional?` = c("Yes", "No", "No")
) |>
  knitr::kable(digits = 3)
```

Even with no confounding, the marginal RR and marginal OR both differ from
the average of their conditional counterparts,
while the marginal RD equals the average conditional RD exactly.
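
For reference, the arithmetic behind that comparison can be worked by hand from the stratum-specific risks defined in the chunk above. Values are rounded to three decimals; in each row the marginal measure comes first and the simple average of the two conditional measures second.

$$
\begin{aligned}
\text{Marginal risks:} \quad & \tfrac{1}{2}(0.10 + 0.50) = 0.300, \quad \tfrac{1}{2}(0.05 + 0.30) = 0.175 \\
\text{RD:} \quad & 0.300 - 0.175 = 0.125 \quad \text{vs.} \quad \tfrac{1}{2}(0.05 + 0.20) = 0.125 \\
\text{RR:} \quad & 0.300 / 0.175 \approx 1.714 \quad \text{vs.} \quad \tfrac{1}{2}(2.000 + 1.667) \approx 1.833 \\
\text{OR:} \quad & \frac{0.300 / 0.700}{0.175 / 0.825} \approx 2.020 \quad \text{vs.} \quad \tfrac{1}{2}(2.111 + 2.333) \approx 2.222
\end{aligned}
$$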

Non-collapsibility is distinct from effect-measure modification.
Here the stratum-specific effects also vary across strata
(risk ratios $2.00$ vs. $1.67$, odds ratios $2.11$ vs. $2.33$),
but that variation is not what drives the discrepancy:
even if the conditional odds ratio were held *constant* across strata,
the marginal odds ratio would generally still differ from it
(and lie closer to the null),
whereas a constant conditional risk difference
always reproduces the marginal risk difference.
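
The claim about constant conditional effects can be checked numerically. The sketch below is illustrative only: it reuses the baseline risks from this example, but the constant conditional odds ratio of 2 and the constant conditional risk difference of 0.10 are hypothetical values chosen for the demonstration, and the helper names `odds()` and `inv_odds()` are not defined elsewhere in these notes.

```r
# Baseline (unexposed) risks in the two equally sized strata,
# matching the values used in the illustration above.
pi_0 <- c(0.05, 0.30)

# Small helpers (hypothetical names) to move between risks and odds.
odds <- function(p) p / (1 - p)
inv_odds <- function(o) o / (1 + o)

# Case 1: impose a constant conditional odds ratio of 2 in both strata.
conditional_or <- 2
pi_1_const_or <- inv_odds(conditional_or * odds(pi_0))

# Marginal risks are equal-weight averages because Pr(Z = z | A = a) = 1/2.
marginal_or <- odds(mean(pi_1_const_or)) / odds(mean(pi_0))

# Case 2: impose a constant conditional risk difference of 0.10 instead.
conditional_rd <- 0.10
pi_1_const_rd <- pi_0 + conditional_rd
marginal_rd <- mean(pi_1_const_rd) - mean(pi_0)

round(c(conditional_or = conditional_or, marginal_or = marginal_or), 3)
round(c(conditional_rd = conditional_rd, marginal_rd = marginal_rd), 3)
```

Under these assumed values, the marginal odds ratio works out to about 1.819, below the common conditional value of 2, while the marginal risk difference equals 0.10 exactly.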

:::

8 changes: 4 additions & 4 deletions _subfiles/logistic-regression/_sec-OR-alternatives.qmd
@@ -1,16 +1,16 @@

### Objections to odds ratios
## Objections to odds ratios

{{< include _subfiles/logistic-regression/_sec_OR_objections.qmd >}}

### Deriving risk ratios and risk differences from logistic regression models
## Deriving risk ratios and risk differences from logistic regression models

{{< include _subfiles/logistic-regression/_sec-logistic-RR-RD.qmd >}}

### Other link functions for Bernoulli outcomes
## Other link functions for Bernoulli outcomes

{{< include _subfiles/logistic-regression/_sec-non-logistic-bernoulli-models.qmd >}}

### Quasibinomial
## Quasibinomial

See [Hua Zhou](https://hua-zhou.github.io/)'s [lecture notes](https://ucla-biostat-200c-2020spring.github.io/slides/04-binomial/binomial.html#:~:text=0.05%20%27.%27%200.1%20%27%20%27%201-,Quasi%2Dbinomial,-Another%20way%20to)
51 changes: 51 additions & 0 deletions _subfiles/logistic-regression/_sec-bootstrap-boot-package.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
The [`boot`](https://cran.r-project.org/package=boot) package provides a more streamlined interface for bootstrap inference.
Here's how to compute the same confidence interval using `boot::boot()`:

```{r}
#| label: boot-package-example
#| code-fold: show
#| eval: false

library(boot)

statistic_fn <- function(data, indices) {
  boot_data <- data[indices, ]

  boot_fit <- glm(
    chd69_binary ~ dibpat + age,
    data = boot_data,
    family = binomial(link = "logit")
  )

  compute_marginal_rd(
    model = boot_fit,
    data = boot_data,
    exposure_var = "dibpat",
    exposed_level = "Type A",
    unexposed_level = "Type B"
  )
}

set.seed(20260512)
boot_results <- boot(
  data = wcgs_clean,
  statistic = statistic_fn,
  # Keep the rendered example fast; increase to 2000+ for final analyses.
  R = 300
)

boot_results

boot.ci(boot_results, type = c("perc", "bca"))
```

The [`boot`](https://cran.r-project.org/package=boot) package provides several types of confidence intervals,
including the percentile method (`perc`)
and the bias-corrected and accelerated (BCa) method (`bca`),
which can provide better coverage in some situations.
The BCa method is more demanding than the percentile method:
its bias-correction and acceleration estimates are unstable
at the illustrative `R = 300` used here
(and can warn or fail outright),
so use a substantially larger `R` (at least 2000)
before relying on `bca` intervals.
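
For readers who want to try both interval types without the WCGS objects, here is a minimal self-contained sketch. Everything in it is an assumption made for illustration: the simulated data frame `sim_data`, the variable names `a`, `z`, and `y`, and the statistic function `rd_statistic()` stand in for `wcgs_clean` and `compute_marginal_rd()`, which are defined elsewhere. It uses `R = 2000`, in line with the recommendation above.

```r
library(boot)

# Hypothetical simulated data (not the WCGS data): binary exposure `a`,
# continuous covariate `z`, binary outcome `y`.
set.seed(2)
n <- 200
sim_data <- data.frame(a = rbinom(n, 1, 0.5), z = rnorm(n))
sim_data$y <- rbinom(n, 1, plogis(-1 + 0.7 * sim_data$a + 0.5 * sim_data$z))

# boot() calls the statistic with the data and a vector of resampled row indices.
rd_statistic <- function(data, indices) {
  d <- data[indices, ]
  fit <- glm(y ~ a + z, data = d, family = binomial(link = "logit"))
  # Standardize: predict each person's risk with a = 1 and with a = 0.
  risk_1 <- predict(fit, newdata = transform(d, a = 1), type = "response")
  risk_0 <- predict(fit, newdata = transform(d, a = 0), type = "response")
  mean(risk_1) - mean(risk_0)
}

boot_sketch <- boot(data = sim_data, statistic = rd_statistic, R = 2000)

# Request percentile and BCa intervals together for comparison.
ci_sketch <- boot.ci(boot_sketch, type = c("perc", "bca"))
ci_sketch
```

The printed object lists both intervals side by side, so any gap between the percentile and BCa limits is easy to see.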
42 changes: 42 additions & 0 deletions _subfiles/logistic-regression/_sec-bootstrap-inference.qmd
@@ -0,0 +1,42 @@
Adapted from
[@vittinghoff2e, Section 3.6, p. 62],
[@hastie2009esl2e, Section 7.11, p. 249],
and
[@james2021islr2e, Chapter 5, Section 5.2, p. 209].

The bootstrap is a resampling method
that allows us to estimate the sampling distribution
of a statistic without making strong parametric assumptions.
For an introduction to the bootstrap,
see [Bootstrap Confidence Intervals](basic-statistical-methods.qmd#sec-bootstrap-ci).

#### Bootstrap algorithm

To construct a bootstrap confidence interval
for a marginal risk difference:

1. For $b = 1, \ldots, B$ (e.g., $B = 1000$):
    a. Draw a bootstrap sample of size $n$ with replacement from the original data
    b. Fit the logistic regression model to the bootstrap sample
    c. Compute the marginal risk difference from the fitted model
2. The bootstrap distribution of the $B$ risk difference estimates
   approximates the sampling distribution
3. Construct a confidence interval using the percentile method
   (e.g., the 2.5th and 97.5th percentiles for a 95% CI)

The bootstrap standard error is the standard deviation
of the bootstrap distribution.
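
The algorithm above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the WCGS analysis: the data frame `sim_data`, the helper `marginal_rd_sketch()`, the model formula, and the small value of `B` are all assumptions made to keep the example self-contained and fast.

```r
# Hypothetical simulated data: binary exposure `a`, covariate `z`, outcome `y`.
set.seed(1)
n <- 500
sim_data <- data.frame(a = rbinom(n, 1, 0.5), z = rnorm(n))
sim_data$y <- rbinom(n, 1, plogis(-1 + 0.7 * sim_data$a + 0.5 * sim_data$z))

# Steps 1b and 1c: fit the logistic model, then standardize by predicting
# every observation's risk with a = 1 and with a = 0 and averaging.
marginal_rd_sketch <- function(data) {
  fit <- glm(y ~ a + z, data = data, family = binomial(link = "logit"))
  risk_1 <- predict(fit, newdata = transform(data, a = 1), type = "response")
  risk_0 <- predict(fit, newdata = transform(data, a = 0), type = "response")
  mean(risk_1) - mean(risk_0)
}

# Step 1: repeat B times, drawing n rows with replacement each time (step 1a).
B <- 200  # kept small for speed; use a larger B in practice
boot_rd <- replicate(B, {
  idx <- sample(nrow(sim_data), replace = TRUE)
  marginal_rd_sketch(sim_data[idx, ])
})

# Steps 2 and 3: summarize the bootstrap distribution.
point_estimate <- marginal_rd_sketch(sim_data)
boot_se <- sd(boot_rd)                                  # bootstrap standard error
boot_ci <- quantile(boot_rd, probs = c(0.025, 0.975))   # 95% percentile interval

point_estimate
boot_se
boot_ci
```

Refitting the model inside each replicate matters: resampling only the predictions from a single fit would ignore the uncertainty in the estimated coefficients.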

{{< slidebreak >}}

:::{#exm-wcgs-marginal-rd}

#### Example: CHD risk and behavioral pattern

{{< include _subfiles/logistic-regression/_sec-wcgs-bootstrap-example.qmd >}}

:::

#### Alternative: Using the boot package

{{< include _subfiles/logistic-regression/_sec-bootstrap-boot-package.qmd >}}