-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathREADME.Rmd
More file actions
212 lines (161 loc) · 6.2 KB
/
README.Rmd
File metadata and controls
212 lines (161 loc) · 6.2 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
set.seed(420)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
dpi = 300,
fig.path = "man/figures/",
fig.width = 9,
fig.height = 6,
out.width = "70%"
)
```
# NLJ
The **nlj** package—named in honor of British statistician Norman Lloyd
Johnson—provides specialized tools for working with the unbounded Johnson SU
distribution. In addition to functions for density evaluation, distribution and
quantile computation, and random variate generation, the package offers
automatic normalization routines and a generalized transformation-based
regression model that leverages the inverse hyperbolic sine transformation.
---
## Installation
Install the development version of **nlj** from GitHub using **devtools**:
``` r
# install.packages("devtools")
devtools::install_github("shanedrabing/nlj")
```
---
## Overview of Functions and Examples
**nlj** provides a suite of tools for the Johnson SU distribution, automatic
data normalization, and advanced regression modeling. The following examples
demonstrate the package’s core functionality.
---
### Johnson-SU Distribution Functions
The Johnson-SU distribution functions—**djohnson**, **pjohnson**, **qjohnson**,
and **rjohnson**—provide density evaluation, cumulative distribution
computation, quantile inversion, and random sample generation. They support the
full Johnson-SU parameterization for flexible control of skewness and kurtosis.
#### Density
**djohnson** computes the Johnson-SU density for a specified set of input
values.
```{r djohnson}
x <- seq(-pi, pi, length.out = 300)
d <- nlj::djohnson(x, gamma = -1.7, delta = 2.1, xi = -1.45, lambda = 1.45)
plot(x, d, type = "l",
main = "Johnson-SU Density",
xlab = "X", ylab = "Density")
```
#### Cumulative Distribution
**pjohnson** returns the cumulative distribution function (CDF) values for
specified quantiles under the Johnson-SU model.
```{r pjohnson}
q <- seq(-pi, pi, length.out = 300)
p <- nlj::pjohnson(q, gamma = -1.7, delta = 2.1, xi = -1.45, lambda = 1.45)
plot(q, p, type = "l",
main = "Johnson-SU Cumulative Distribution",
xlab = "Quantile", ylab = "Probability")
```
#### Quantile Function
**qjohnson** obtains the quantile (inverse CDF) corresponding to specified
probability levels, facilitating probability-based analyses within the
Johnson-SU framework.
```{r qjohnson}
p <- seq(0, 1, length.out = 300)
q <- nlj::qjohnson(p, gamma = -1.7, delta = 2.1, xi = -1.45, lambda = 1.45)
plot(p, q, type = "l",
main = "Johnson-SU Quantile",
xlab = "Probability", ylab = "Quantile")
```
#### Random Deviate Generation
**rjohnson** generates random samples from the Johnson-SU distribution, useful
for simulation, bootstrapping, or Monte Carlo studies.
```{r rjohnson}
nlj::rjohnson(5, gamma = -1.7, delta = 2.1, xi = -1.45, lambda = 1.45)
```
---
### Automatic Normalization Functions
The package provides **znorm** and **zjohnson**, automated normalization
functions that compute and apply data transformations, track transformation
parameters, and support denormalization. These utilities streamline the process
of preparing data in standardized or distribution-adapted form.
#### Johnson-SU Normalization
**zjohnson** performs Johnson-SU normalization by estimating distribution
parameters that align the input data to a standard normal reference. It adapts
to skewness and heavy tails, producing a transformation that accommodates
non-Gaussian data characteristics.
```{r johnson-density}
# Example data: Wind speed
x <- sort(airquality$Wind)
d <- density(x)
# Apply Johnson-SU normalization
zj <- nlj::zjohnson(x)
# Extract the density estimates
xd <- d$x
yd <- d$y
yj <- zj$fd(xd)
# Plot the density comparison
plot(range(xd), range(c(yd, yj)), type = "n",
main = "Wind Speed Density Estimation",
xlab = "Wind Speed", ylab = "Density")
lines(xd, yd, col = "black") # Original density
lines(xd, yj, col = "red") # Johnson-SU normalized density
legend(min(xd), max(c(yd, yj)),
c("Kernel", "Johnson-SU"),
col = c("black", "red"),
lwd = 1, bg = "white")
```
---
### Generalized Asinh Transformation Model (GATM)
**lm.gat** implements the Generalized Asinh Transformation Model, fitting
regression models with a parameterized inverse hyperbolic sine transform on
predictors. This approach can enhance model fit in the presence of
nonlinearity, particularly when linear models are insufficient.
#### Automatic Relationship Optimization
The following example illustrates **lm.gat** in action. We predict miles per
gallon (**mpg**) from engine horsepower (**hp**) using both a standard linear
model and a GATM model. The comparison highlights how the GATM transformation
captures non-linear patterns that a simple linear fit does not.
```{r gatm}
# Example data: Horsepower and miles per gallon (MPG)
i <- order(mtcars$hp)
x <- mtcars$hp[i]
y <- mtcars$mpg[i]
# Fit simple regression
m <- lm(y ~ x)
# Fit non-linear model
gat <- nlj::lm.gat(y ~ x, iterations = 3)
# Plot
plot(x, y, pch = 16,
main = "Horsepower (HP) vs Miles Per Gallon (MPG)",
xlab = "HP", ylab = "MPG")
lines(x, m$fitted.values, col = "red")
lines(x, gat$z, col = "blue")
legend(min(x), max(y),
c("Simple", "GATM"),
col = c("red", "blue"),
lwd = 1, bg = "white")
```
#### Complex Nonlinear Interaction Optimization
The next example extends **lm.gat** to a model with multiple interaction terms,
demonstrating its ability to manage high-dimensional, non-linear relationships.
We illustrate this by modeling quarter-mile time (**qsec**) as a function of
**hp**, **mpg**, and **wt**, including all two-way and three-way interactions.
First, we fit a standard linear model with these terms and examine its summary.
Then, we apply **lm.gat** to the same formula, showcasing how the parameterized
asinh transformation can yield a superior fit by accommodating complex variable
interactions.
```{r interaction-simple}
# Fit simple regression
m <- lm(qsec ~ hp * mpg * wt, mtcars)
summary(m)
```
```{r interaction-complex}
# Fit complex regression
gat <- nlj::lm.gat(qsec ~ hp * mpg * wt, mtcars,
iterations = 6, penalty = 1e-9)
summary(gat$fit)
```