diff --git a/CLAUDE.md b/CLAUDE.md index b099e21cfc..91e9eb13ae 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -51,6 +51,10 @@ Before committing any `.qmd`, `.R`, or config file change: - Key macros: `\E{Y|X=x}`, `\ba`/`\ea`, `\tp{v}`, `\b`, `\g`, `\a`, `\devn(...)`, `\erf{...}` - Include every intermediate step in derivations — do not skip steps - Color coding: `\red{...}` for focal/extra terms, `\blue{...}` for shared terms +- **Matrix dimensions**: always verify that every matrix expression is dimensionally consistent; each operand's dimensions must be compatible with the operation (e.g., `AB` requires `A` to have as many columns as `B` has rows) +- **Annotate matrix dimensions with underbraces** in display math: use `\underbrace{M}_{m \times n}` for each matrix or vector +- **Zero matrices**: never write a bare `\mathbf{0}` in a matrix equation; subscript its dimensions, as in `\mathbf{0}_{m \times n}` +- **Jacobian**: `\deriv{\vb} \vx` where both are p-vectors produces a p × p Jacobian matrix (not a vector) - Ratios vs. factors: - Use the generic `\ratio`/`\ratiof` macro when a ratio's inputs are the **quantities themselves** (the odds, hazards, rates, etc.) — e.g. `\ratio(\odds_1, \odds_2)`, **not** `\ror(\odds_1, \odds_2)` — because the type of ratio is clear from the inputs. - Use the type-subscripted ratio macros (`\ror` for odds ratios, `\hazratio`/`\hr` for hazard ratios, `\rateratio`, `\riskratio`, `\prevratio`, `\cuhazratio`, …) only when the inputs are **covariate patterns** (e.g. `\ror(\vx,\vxs)`, `\hr(t\mid\vx:\vxs)`), where the subscript is needed to say which kind of ratio it is.
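The dimension rule above can be checked mechanically. The sketch below is a hypothetical Python helper, not part of the project: `product_shape` propagates `(rows, cols)` pairs through a chain of matrix products and raises `ValueError` on an inner-dimension mismatch, which is one way to double-check underbrace annotations such as the chain $(p \times q)(q \times m)(m \times \ell)$ in the vector-derivative of $AB\vecf{v}$. The sizes are arbitrary distinct integers chosen only for illustration.

```python
# Hypothetical helper, not part of the project: propagate (rows, cols) shapes
# through a chain of matrix products and fail on any inner-dimension mismatch.
def product_shape(*shapes):
    """Return the (rows, cols) shape of a product of matrices, left to right."""
    rows, cols = shapes[0]
    for next_rows, next_cols in shapes[1:]:
        if cols != next_rows:
            raise ValueError(f"cannot multiply (. x {cols}) by ({next_rows} x .)")
        cols = next_cols
    return rows, cols


# Distinct illustrative sizes, so a swapped dimension is unlikely to slip by.
p, q, m, ell, r, n = 2, 3, 4, 5, 6, 7

# (d v / d beta) t(B) t(A) is (p x q)(q x m)(m x ell), so it is p x ell.
print(product_shape((p, q), (q, m), (m, ell)))  # (2, 5)

# t(A) t(B) from the trace rule is (m x r)(r x n), the m x n shape of X.
print(product_shape((m, r), (r, n)))  # (4, 7)

# Writing A t(B) by mistake, i.e. (r x m)(r x n), is rejected.
try:
    product_shape((r, m), (r, n))
except ValueError as err:
    print("dimension error:", err)
```

Only shapes are tracked, so the helper catches nonconforming products but not a product that conforms yet uses the wrong factor order.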
diff --git a/_subfiles/math-prereqs/_sec_vector_calc.qmd b/_subfiles/math-prereqs/_sec_vector_calc.qmd index 919c26fafa..8197f3d8ca 100644 --- a/_subfiles/math-prereqs/_sec_vector_calc.qmd +++ b/_subfiles/math-prereqs/_sec_vector_calc.qmd @@ -61,17 +61,64 @@ $$\deriv{ \vb} f(\vb) = \pt{\deriv{ \vb\'} f(\vb)}$$ --- +:::{#def-constant-wrt-vector} +#### Constant with respect to a vector + +$\vx$ is constant with respect to $\vb$ if + +$$ +\underbrace{\deriv{\vb} \tp{\vx}}_{p \times p} += \underbrace{\mathbf{0}}_{p \times p} +$$ + +::: + +:::{#exm-constant-wrt-vector} +#### A constant vector + +Let $\vb = \tp{(\beta_1, \beta_2)}$ and $\vx = \tp{(3, 5)}$, +so $x_1 = 3$ and $x_2 = 5$ do not depend on $\vb$. +Expanding $\deriv{\vb} \tp{\vx}$ into its matrix of scalar partial derivatives +(@def-vector-derivative, applied to each component of the row $\tp{\vx}$) +and evaluating each entry: + +$$ +\underbrace{\deriv{\vb} \tp{\vx}}_{2 \times 2} += \deriv{\vb} \sbmat{x_1 & x_2} += \sbmat{ +\deriv{\beta_1} x_1 & \deriv{\beta_1} x_2 \\ +\deriv{\beta_2} x_1 & \deriv{\beta_2} x_2 +} += \sbmat{ +\deriv{\beta_1} 3 & \deriv{\beta_1} 5 \\ +\deriv{\beta_2} 3 & \deriv{\beta_2} 5 +} += \sbmat{ +0 & 0 \\ +0 & 0 +} += \underbrace{\mathbf{0}}_{2 \times 2} +$$ + +Every entry is the derivative of a constant, so $\deriv{\vb} \tp{\vx} = \mathbf{0}_{2 \times 2}$ +and $\vx$ is constant with respect to $\vb$ (@def-constant-wrt-vector). + +::: + +{{< slidebreak >}} + :::{#thm-deriv-lincom} #### Derivative of a dot product +If $\vx$ is constant with respect to $\vb$, then: + $$ -\deriv{\vb} \vx \cdot \vb = \deriv{\vb} \vb \cdot \vx = \vx +\underbrace{\deriv{\vb} (\vx \cdot \vb)}_{p \times 1} = +\underbrace{\deriv{\vb} (\vb \cdot \vx)}_{p \times 1} = +\underbrace{\vx}_{p \times 1} $$ -:::: notes -This looks a lot like non-vector calculus, except that you have to transpose the coefficient.
-:::: ::: --- @@ -80,7 +127,7 @@ This looks a lot like non-vector calculus, except that you have to transpose the $$ \ba -\deriv{ \beta} (x\'\beta) +\deriv{\vb} (\vx \cdot \vb) &= \begin{bmatrix} \deriv{\beta_1}(x_1\beta_1+x_2\beta_2 +...+x_p \beta_p ) \\ @@ -101,6 +148,323 @@ $$ ::: +{{< slidebreak >}} + +:::{#exm-deriv-lincom} +#### Derivative of a dot product + +Let $\vx = \tp{(3, 5)}$ (constant with respect to $\vb$; +see @exm-constant-wrt-vector) and $\vb = \tp{(\beta_1, \beta_2)}$. +Then $\vx \cdot \vb = 3\beta_1 + 5\beta_2$, and by @thm-deriv-lincom: + +$$ +\underbrace{\deriv{\vb}(\vx \cdot \vb)}_{2 \times 1} += \underbrace{\vx}_{2 \times 1} += \begin{pmatrix} 3 \\ +5 \end{pmatrix} +$$ + +Verifying entry-wise: + +$$ +\deriv{\vb}(3\beta_1 + 5\beta_2) += \begin{pmatrix} \deriv{\beta_1}(3\beta_1 + 5\beta_2) \\ +\deriv{\beta_2}(3\beta_1 + 5\beta_2) \end{pmatrix} += \begin{pmatrix} 3 \\ +5 \end{pmatrix} +$$ + +Both methods agree. ✓ + +::: + +{{< slidebreak >}} + + +:::{#thm-deriv-dot-product} + +#### Product rule for dot products + +If $\mathbf{a} = \mathbf{a}(\vx)$ and $\mathbf{b} = \mathbf{b}(\vx)$ +are differentiable $p \times 1$ vector functions of the $p \times 1$ vector $\vx$, then: + +$$ +\ba +\deriv{\ubf{\vx}{p \times 1}} \dpf{\ubf{a}{p \times 1}}{\ubf{b}{p \times 1}} +&= +\paren{ + \deriv{\ubf{\vx}{p \times 1}} \ubf{\tp{a}}{1 \times p} + } +\ubf{b}{p \times 1} ++ +\paren{ + \deriv{\ubf{\vx}{p \times 1}} \ubf{\tp{b}}{1 \times p} +} +\ubf{a}{p \times 1} +\ea +$$ + +::: + +::: proof + +Entry-wise, for $i = 1, \ldots, p$: + +$$ +\ba +\left[\deriv{\vx} (\mathbf{a} \cdot \mathbf{b})\right]_i +&= \deriv{x_i} \sum_{k=1}^p a_k b_k \\ +&= \sum_{k=1}^p \paren{b_k \deriv{x_i} a_k + a_k \deriv{x_i} b_k} \\ +&= \left[\paren{\deriv{\vx} \tp{\mathbf{a}}}\mathbf{b}\right]_i + + \left[\paren{\deriv{\vx} \tp{\mathbf{b}}}\mathbf{a}\right]_i +\ea +$$ + +::: + +:::{#exm-deriv-dot-product} +#### Example of the dot-product rule + +Let $\vx = \tp{(x_1, x_2)}$, +$\mathbf{a}(\vx) = \tp{(x_1, x_1 x_2)}$, +and $\mathbf{b}(\vx) = \tp{(x_2, x_1)}$. +Then: + +$$ +\mathbf{a} \cdot \mathbf{b} += x_1 \cdot x_2 + x_1 x_2 \cdot x_1 += x_1 x_2 + x_1^2 x_2 +$$ + +By direct calculation: + +$$ +\underbrace{\deriv{\vx}(\mathbf{a} \cdot \mathbf{b})}_{2 \times 1} += \deriv{\vx}(x_1 x_2 + x_1^2 x_2) += \begin{pmatrix} \deriv{x_1}(x_1 x_2 + x_1^2 x_2) \\ \deriv{x_2}(x_1 x_2 + x_1^2 x_2) \end{pmatrix} += \begin{pmatrix} x_2 + 2x_1 x_2 \\ x_1 + x_1^2 \end{pmatrix} +$$ + +By the product rule (@thm-deriv-dot-product), +using +$\underbrace{\deriv{\vx}\tp{\mathbf{a}}}_{2 \times 2} += \begin{pmatrix}1 & x_2 \\ 0 & x_1\end{pmatrix}$ +and +$\underbrace{\deriv{\vx}\tp{\mathbf{b}}}_{2 \times 2} += \begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}$: + +$$ +\ba +\underbrace{\deriv{\vx}(\mathbf{a} \cdot \mathbf{b})}_{2 \times 1} +&= +\underbrace{\begin{pmatrix}1 & x_2 \\ 0 & x_1\end{pmatrix}}_{2 \times 2} +\underbrace{\begin{pmatrix}x_2 \\ x_1\end{pmatrix}}_{2 \times 1} ++ +\underbrace{\begin{pmatrix}0 & 1 \\ 1 & 0\end{pmatrix}}_{2 \times 2} +\underbrace{\begin{pmatrix}x_1 \\ x_1 x_2\end{pmatrix}}_{2 \times 1} +\\ +&= +\begin{pmatrix}x_2 + x_1 x_2 \\ x_1^2\end{pmatrix} ++ +\begin{pmatrix}x_1 x_2 \\ x_1\end{pmatrix} +\\ +&= +\begin{pmatrix}x_2 + 2x_1 x_2 \\ x_1^2 + x_1\end{pmatrix} +\ea +$$ + +Both methods agree.
✓ + +::: + +{{< slidebreak >}} + +:::{#thm-deriv-linear-map} + +#### Derivative of a linear map + +If $A$ is an $m \times p$ matrix +that is constant with respect to the $p \times 1$ vector $\vb$, +then: + +$$ +\underbrace{\deriv{\vb} (A\vb)}_{p \times m} = +\underbrace{\tp{A}}_{p \times m} +$$ + +::: + +::: proof + +For entry $(i,j)$, where row $i$ indexes the denominator $\vb$ +(see @def-vector-derivative) +and column $j$ indexes the numerator $A\vb$, +using $\deriv{\beta_i} \beta_k = 1$ if $k = i$ and $0$ otherwise: + +$$ +\ba +\left[\deriv{\vb} (A\vb)\right]_{ij} +&= \deriv{\beta_i} (A\vb)_j \\ +&= \deriv{\beta_i} \sum_{k=1}^{p} a_{jk} \beta_k \\ +&= \sum_{k=1}^{p} a_{jk} \deriv{\beta_i} \beta_k \\ +&= a_{ji} \\ +&= \left[\tp{A}\right]_{ij} +\ea +$$ + +::: + +:::{#exm-deriv-linear-map} +#### Derivative of a linear map + +Let $A = \begin{pmatrix} 2 & 3 \end{pmatrix}$ ($1 \times 2$) and $\vb = \tp{(\beta_1, \beta_2)}$. +Then $A\vb = 2\beta_1 + 3\beta_2$, and by @thm-deriv-linear-map: + +$$ +\underbrace{\deriv{\vb}(A\vb)}_{2 \times 1} += \underbrace{\tp{A}}_{2 \times 1} += \begin{pmatrix} 2 \\ +3 \end{pmatrix} +$$ + +::: + +{{< slidebreak >}} + +:::{#thm-deriv-matrix-vector} + +#### Vector-derivative of a matrix-vector product + +If $A$ is an $m \times q$ matrix that is constant with respect to $\vb$, +and $\vecf{v} = \vecf{v}(\vb)$ is a $q \times 1$ vector +that depends on the $p \times 1$ vector $\vb$, +then: + +$$ +\underbrace{\deriv{\vb} (A\vecf{v})}_{p \times m} = +\underbrace{\paren{\deriv{\vb} \vecf{v}}}_{p \times q} +\underbrace{\tp{A}}_{q \times m} +$$ + +:::: notes +This generalizes @thm-deriv-linear-map, +which is the special case $\vecf{v} = \vb$ +(so that $\deriv{\vb} \vb = \matr{I}$ +and $\deriv{\vb} (A\vb) = \tp{A}$).
+:::: + +::: + +::: proof + +For entry $(i,j)$, where row $i$ indexes the denominator $\vb$ +and column $j$ indexes the numerator $A\vecf{v}$: + +$$ +\ba +\left[\deriv{\vb} (A\vecf{v})\right]_{ij} +&= \deriv{\beta_i} (A\vecf{v})_j \\ +&= \deriv{\beta_i} \sum_{k=1}^{q} a_{jk} v_k \\ +&= \sum_{k=1}^{q} a_{jk} \deriv{\beta_i} v_k \\ +&= \sum_{k=1}^{q} \left[\deriv{\vb} \vecf{v}\right]_{ik} \left[\tp{A}\right]_{kj} \\ +&= \left[\paren{\deriv{\vb} \vecf{v}} \tp{A}\right]_{ij} +\ea +$$ + +::: + +:::{#exm-deriv-matrix-vector} +#### Vector-derivative of a matrix-vector product + +Let $\vb = \tp{(\beta_1, \beta_2)}$, +$A = \begin{pmatrix} 2 & 3 \end{pmatrix}$ ($1 \times 2$, constant), +and $\vecf{v}(\vb) = \tp{(\beta_1^2, \beta_2^2)}$. +Then $A\vecf{v} = 2\beta_1^2 + 3\beta_2^2$. +By @thm-deriv-matrix-vector: + +$$ +\ba +\underbrace{\deriv{\vb}(A\vecf{v})}_{2 \times 1} +&= \underbrace{\paren{\deriv{\vb}\vecf{v}}}_{2 \times 2} +\underbrace{\tp{A}}_{2 \times 1} \\ +&= \underbrace{\begin{pmatrix} 2\beta_1 & 0 \\ +0 & 2\beta_2 \end{pmatrix}}_{2 \times 2} +\underbrace{\begin{pmatrix} 2 \\ +3 \end{pmatrix}}_{2 \times 1} \\ +&= \underbrace{\begin{pmatrix} 4\beta_1 \\ +6\beta_2 \end{pmatrix}}_{2 \times 1} +\ea +$$ + +This matches differentiating $2\beta_1^2 + 3\beta_2^2$ entry-wise. + +::: + +{{< slidebreak >}} + +:::{#cor-deriv-lincom-tp} + +#### Derivative of a dot product, transpose-product form + +If $\vx$ is constant with respect to $\vb$, then: + +$$ +\underbrace{\deriv{\vb} (\underbrace{\tp{\vx}}_{1 \times p} \underbrace{\vb}_{p \times 1})}_{p \times 1} = +\underbrace{\vx}_{p \times 1} +$$ + +:::: notes +This looks a lot like non-vector calculus, except that you have to transpose the coefficient: +in scalar calculus $\deriv{x}(cx) = c$, but here the coefficient $\tp{\vx}$ (a row vector) +becomes $\vx$ (a column vector) in the result. +:::: + +::: + +::: proof + +**Using @thm-deriv-lincom:** + +Since $\tp{\vx}\vb = \vx \cdot \vb$ (@def-dot-product), +and $\vx$ is constant with respect to $\vb$: + +$$ +\deriv{\vb}(\tp{\vx}\vb) += \deriv{\vb}(\vx \cdot \vb) += \vx +$$ + +by @thm-deriv-lincom. + +::: + +::: proof + +**Using @thm-deriv-matrix-vector:** + +Since $\vx$ is constant with respect to $\vb$, +$A = \tp{\vx}$ is a constant $1 \times p$ matrix.
+Applying @thm-deriv-matrix-vector with $\vecf{v} = \vb$ +(so $\deriv{\vb}\vb = \matr{I}$): + +$$ +\ba +\deriv{\vb}(\tp{\vx}\vb) +&= \paren{\deriv{\vb}\vb} \tp{(\tp{\vx})} \\ +&= \matr{I} \vx \\ +&= \vx +\ea +$$ + +::: + +:::{#exm-deriv-lincom-tp} +#### Derivative of a transpose product + +Let $\vx = \tp{(3, 5)}$ and $\vb = \tp{(\beta_1, \beta_2)}$. +Then $\tp{\vx}\vb = 3\beta_1 + 5\beta_2$, and by @cor-deriv-lincom-tp: + +$$ +\underbrace{\deriv{\vb}\left(\underbrace{\tp{\vx}}_{1 \times 2}\underbrace{\vb}_{2 \times 1}\right)}_{2 \times 1} += \underbrace{\vx}_{2 \times 1} += \begin{pmatrix} 3 \\ +5 \end{pmatrix} +$$ + +::: --- :::{#thm-quadratic-form} diff --git a/_subfiles/math-prereqs/_thm-deriv-matrix-product-matrix.qmd b/_subfiles/math-prereqs/_thm-deriv-matrix-product-matrix.qmd new file mode 100644 index 0000000000..6cfa6ebd07 --- /dev/null +++ b/_subfiles/math-prereqs/_thm-deriv-matrix-product-matrix.qmd @@ -0,0 +1,86 @@ +:::{#def-matrix-derivative} + +#### Matrix derivative + +For a scalar-valued function $f(\matr{X})$ +of an $m \times n$ matrix $\matr{X}$, +the **matrix derivative** is the $m \times n$ matrix +whose $(i,j)$ entry is the partial derivative of $f$ +with respect to the $(i,j)$ entry of $\matr{X}$: + +$$ +\left[\deriv{\matr{X}} f\right]_{ij} = \deriv{X_{ij}} f +$$ + +::: + +:::{#exm-matrix-derivative} + +Let $\matr{X}$ be a $2 \times 2$ matrix and $f(\matr{X}) = \operatorname{tr}(\matr{X}) = X_{11} + X_{22}$.
+Then $\deriv{X_{ij}} f = 1$ if $i = j$ and $0$ otherwise, so: + +$$ +\deriv{\matr{X}} f = \matr{I}_2 +$$ + +::: + +{{< slidebreak >}} + +:::{#thm-deriv-matrix-product-matrix} + +#### Matrix-derivative of the trace of a matrix product + +If $A$ ($r \times m$) and $B$ ($n \times r$) +are constant with respect to the $m \times n$ matrix $\matr{X}$, +then: + +$$ +\underbrace{\deriv{\matr{X}} \operatorname{tr}(A \matr{X} B)}_{m \times n} = +\underbrace{\tp{A}}_{m \times r} +\underbrace{\tp{B}}_{r \times n} +$$ + +:::: notes +The trace makes $\operatorname{tr}(A \matr{X} B)$ a scalar, +so by @def-matrix-derivative its matrix derivative is an $m \times n$ matrix, the same shape as $\matr{X}$. +The derivative of the matrix product $A \matr{X} B$ itself +(without the trace) is a fourth-order tensor, +which is why this result is stated for the scalar $\operatorname{tr}(A \matr{X} B)$. +:::: + +::: + +::: proof + +Write $\operatorname{tr}(A \matr{X} B) = \sum_{a=1}^{r} \sum_{b=1}^{m} \sum_{c=1}^{n} A_{ab} X_{bc} B_{ca}$. +For entry $(i,j)$, using $\deriv{X_{ij}} X_{bc} = 1$ if $b = i$ and $c = j$, and $0$ otherwise: + +$$ +\ba +\left[\deriv{\matr{X}} \operatorname{tr}(A \matr{X} B)\right]_{ij} +&= \deriv{X_{ij}} \sum_{a=1}^{r} \sum_{b=1}^{m} \sum_{c=1}^{n} A_{ab} X_{bc} B_{ca} \\ +&= \sum_{a=1}^{r} \sum_{b=1}^{m} \sum_{c=1}^{n} A_{ab} \paren{\deriv{X_{ij}} X_{bc}} B_{ca} \\ +&= \sum_{a=1}^{r} A_{ai} B_{ja} \\ +&= \sum_{a=1}^{r} \left[\tp{A}\right]_{ia} \left[\tp{B}\right]_{aj} \\ +&= \left[\tp{A}\,\tp{B}\right]_{ij} +\ea +$$ + +::: + +:::{#exm-deriv-matrix-product-matrix} + +Let $A = \matr{I}_2$ ($2 \times 2$) and $B = \begin{pmatrix}2 & 0 \\
+Then $\operatorname{tr}(A \matr{X} B) = 2X_{11} + 3X_{22}$, and: + +$$ +\underbrace{\deriv{\matr{X}} \operatorname{tr}(A \matr{X} B)}_{2 \times 2} += \underbrace{\tp{A}}_{2 \times 2} \underbrace{\tp{B}}_{2 \times 2} += \matr{I}_2 \begin{pmatrix}2 & 0 \\ +0 & 3\end{pmatrix} += \begin{pmatrix}2 & 0 \\ +0 & 3\end{pmatrix} +$$ + +::: diff --git a/_subfiles/math-prereqs/_thm-deriv-matrix-product-vector.qmd b/_subfiles/math-prereqs/_thm-deriv-matrix-product-vector.qmd new file mode 100644 index 0000000000..913267a914 --- /dev/null +++ b/_subfiles/math-prereqs/_thm-deriv-matrix-product-vector.qmd @@ -0,0 +1,58 @@ +:::{#thm-deriv-matrix-product-vector} + +#### Vector-derivative of a product of matrices + +If $A$ ($\ell \times m$) and $B$ ($m \times q$) +are constant with respect to $\vb$, +and $\vecf{v} = \vecf{v}(\vb)$ is a $q \times 1$ vector +that depends on the $p \times 1$ vector $\vb$, +then: + +$$ +\underbrace{\deriv{\vb} (A B \vecf{v})}_{p \times \ell} = +\underbrace{\paren{\deriv{\vb} \vecf{v}}}_{p \times q} +\underbrace{\tp{B}}_{q \times m} +\underbrace{\tp{A}}_{m \times \ell} +$$ + +::: + +::: proof + +Apply @thm-deriv-matrix-vector +with the constant $\ell \times q$ matrix $AB$, +then use $\tp{(AB)} = \tp{B}\,\tp{A}$: + +$$ +\ba +\deriv{\vb} (A B \vecf{v}) +&= \paren{\deriv{\vb} \vecf{v}} \tp{(AB)} \\ +&= \paren{\deriv{\vb} \vecf{v}} \tp{B}\,\tp{A} +\ea +$$ + +::: + +:::{#exm-deriv-matrix-product-vector} + +Let $A = \begin{pmatrix}1 & 0\end{pmatrix}$ ($1 \times 2$), +$B = \begin{pmatrix}2 & 0 \\ +0 & 3\end{pmatrix}$ ($2 \times 2$), +and $\vecf{v}(\vb) = \vb$ where $\vb = \tp{(\beta_1, \beta_2)}$. +Then $AB\vecf{v} = 2\beta_1$, and: + +$$ +\underbrace{\deriv{\vb}(AB\vecf{v})}_{2 \times 1} += \underbrace{\paren{\deriv{\vb}\vb}}_{2 \times 2} +\underbrace{\tp{B}}_{2 \times 2} +\underbrace{\tp{A}}_{2 \times 1} += \matr{I}_2 +\begin{pmatrix}2 & 0 \\ +0 & 3\end{pmatrix} +\begin{pmatrix}1 \\ +0\end{pmatrix} += \begin{pmatrix}2 \\ +0\end{pmatrix} +$$ + +:::