Example

Here we present an example of using the BiasedUrn package to perform permutation sampling of case-control data that adjusts for covariates. The set of case-control data used in this example can be downloaded here.

We begin by loading the BiasedUrn package, setting the number of subjects and the number of permutations, and then reading the data file containing the disease and covariate data. For this example, there are 600 subjects and we wish to perform 1000 permutations.

library("BiasedUrn")

n.subjects <- 600
n.permutations <- 1000

# the file 'test.dat' contains the case-control subject data, one row per subject:
datamat <- matrix(scan("test.dat"), nrow = n.subjects, byrow = TRUE)
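If the example data file is not at hand, the reading step can still be exercised on a synthetic file with the same layout (status in the first column, covariates after it). The sketch below is only an illustration: the file name `synthetic.dat`, the two covariates, and the coefficients are all assumptions, not part of the original data set.

```r
# Hypothetical stand-in for 'test.dat': column 1 is 0/1 status, columns 2-3 are covariates.
set.seed(1)
n.sim <- 600
z1 <- rnorm(n.sim)                           # continuous covariate (assumed)
z2 <- rbinom(n.sim, 1, 0.4)                  # binary covariate (assumed)
p.sim <- plogis(-0.5 + 0.8 * z1 + 0.5 * z2)  # arbitrary illustrative coefficients
d.sim <- rbinom(n.sim, 1, p.sim)             # simulated case-control status

# Write one whitespace-separated row per subject, without headers
sim.file <- file.path(tempdir(), "synthetic.dat")
write.table(cbind(d.sim, z1, z2), sim.file, row.names = FALSE, col.names = FALSE)

# Read it back the same way the real data file is read
sim.mat <- matrix(scan(sim.file, quiet = TRUE), nrow = n.sim, byrow = TRUE)
```

Because `scan` returns one long vector read row by row, `byrow = TRUE` is what restores the one-row-per-subject layout.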

For the subject data in this example, the first column is the case-control status while the remaining columns denote the covariate data. We assign these subarrays below.

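dis <- datamat[, 1]                   # case-control status (1 = case, 0 = control)
n.cases <- sum(dis)                   # number of case subjects
cov <- datamat[, -1, drop = FALSE]    # covariates; drop = FALSE keeps a matrix even with one covariate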

The first step in permutation sampling is to fit a logistic-regression model of disease status on the covariates; using the notation in Epstein et al. (2012), this is

eqn-1

model <- glm(dis ~ cov, family = binomial())
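In generic notation, the model fitted by the `glm` call above can be sketched as follows. The symbols are assumptions made here for illustration ($D_j$ for the disease status of subject $j$, $Z_j$ for that subject's covariate vector, $\beta_0$ and $\beta$ for the intercept and covariate coefficients) and may differ from those used by Epstein et al.

```latex
\operatorname{logit} P(D_j = 1 \mid Z_j) = \beta_0 + \beta^{\top} Z_j, \qquad j = 1, \dots, n
```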

The maximum-likelihood estimates of these parameters are then used to construct the estimated disease odds for the j-th subject, i.e.,

eqn-2

disease.odds <- exp(model$linear.predictors)
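As a quick sanity check on this step, exponentiating the linear predictor should agree with the odds computed from the fitted probabilities, p / (1 - p). The sketch below verifies this on simulated data; every name in it (`z.chk`, `d.chk`, `fit.chk`) and the coefficients are hypothetical and chosen only so the block runs on its own.

```r
# Simulated covariates and status, for illustration only
set.seed(2)
n.chk <- 200
z.chk <- cbind(rnorm(n.chk), rbinom(n.chk, 1, 0.5))
d.chk <- rbinom(n.chk, 1, plogis(0.3 * z.chk[, 1] - 0.4 * z.chk[, 2]))

fit.chk <- glm(d.chk ~ z.chk, family = binomial())

# Odds from the linear predictor, as in the example above
odds.lp <- exp(fit.chk$linear.predictors)

# Odds from the fitted probabilities
p.hat <- fitted(fit.chk)
odds.p <- p.hat / (1 - p.hat)

# The two routes should agree up to floating-point error
odds.agree <- isTRUE(all.equal(odds.lp, odds.p))
```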

Finally, we generate an n.subjects × n.permutations matrix of permuted datasets. Note that each column (rather than each row) of the resulting matrix corresponds to a permuted dataset, which can then be used to establish the significance of a case-control test.

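# one "ball" per subject, so each subject is drawn at most once per permutation
m1 <- rep(1, length(dis))
# draw n.cases subjects per permutation, weighted by each subject's disease odds
perm.hg <- rMFNCHypergeo(n.permutations, m1, n.cases, disease.odds)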
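One way the columns of such a matrix might be used is sketched below: compute a test statistic on each permuted status vector and compare it with the observed value. This is a minimal illustration, not the authors' procedure. The helper `perm.pvalue`, the toy genotype `geno.toy`, the mean-difference statistic, and the add-one correction are all assumptions, and the stand-in matrix `perm.toy` uses plain unweighted shuffles (so it is not covariate-adjusted) purely to keep the block runnable without BiasedUrn.

```r
# Empirical two-sided p-value from a subjects-by-permutations 0/1 matrix
# (one permuted dataset per column). Hypothetical helper.
perm.pvalue <- function(perm.mat, obs.status, stat.fun) {
  obs.stat <- stat.fun(obs.status)
  null.stats <- apply(perm.mat, 2, stat.fun)
  # add-one correction keeps the estimate strictly above zero
  (1 + sum(abs(null.stats) >= abs(obs.stat))) / (1 + ncol(perm.mat))
}

set.seed(3)
n.toy <- 100
n.perm.toy <- 200
status.toy <- rbinom(n.toy, 1, 0.5)   # toy case-control status
geno.toy <- rbinom(n.toy, 2, 0.3)     # hypothetical variant to test

# Stand-in for perm.hg: unweighted shuffles, NOT covariate-adjusted
perm.toy <- replicate(n.perm.toy, sample(status.toy))

# Every permuted dataset keeps the observed number of cases
cases.preserved <- all(colSums(perm.toy) == sum(status.toy))

# Illustrative statistic: difference in mean genotype between cases and controls
stat.toy <- function(status) {
  mean(geno.toy[status == 1]) - mean(geno.toy[status == 0])
}
p.toy <- perm.pvalue(perm.toy, status.toy, stat.toy)
```

With the real output, `perm.hg` would take the place of `perm.toy` and `dis` the place of `status.toy`; the same column-sum check, `all(colSums(perm.hg) == n.cases)`, is a cheap way to confirm the matrix is oriented as expected.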

Epstein software | Human Genetics | School of Medicine | Emory University