
Question regarding the role of xim in overdispersion theta estimation #75


@Truongphi20

Context

Hi,

I am using glmGamPoi to model single-cell RNA-seq count data with a negative binomial distribution. The workflow runs successfully, but I have a question about an internal implementation detail of the overdispersion estimation.

In the paper's supplementary material, the quadratic variance-to-mean relationship is defined as:

$$\sigma^2 = \mu + \theta \mu^2$$

However, while digging into the code that estimates $\theta$, I noticed a variable named xim. When I ran tests estimating $\theta$ without factoring in xim, the result did not differ noticeably from the estimate that includes it.
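For reference, here is a minimal base-R sketch of the variance-to-mean relationship I am comparing against. It is not glmGamPoi code and says nothing about xim. It assumes that $\theta$ corresponds to `1 / size` in the `mu`/`size` parameterization of `dnbinom`, and the values of `mu` and `theta` are arbitrary illustrations.

```r
# Illustrative values only (not taken from any real dataset).
mu    <- 5
theta <- 0.3

# Assumption: theta maps to 1 / size in base R's negative binomial functions.
y <- 0:2000                                  # support truncated where the tail mass is negligible
p <- dnbinom(y, mu = mu, size = 1 / theta)   # probability mass at each count

# Mean and variance computed directly from the probability mass function.
mean_y <- sum(y * p)
var_y  <- sum((y - mean_y)^2 * p)

# Compare the numerically computed variance with mu + theta * mu^2.
c(numeric = var_y, formula = mu + theta * mu^2)
```

The two values should agree up to floating-point error, which is the behaviour I expected the $\theta$ estimation to be built on.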

Questions

  1. What is the explicit mathematical or computational role of the xim variable during the estimation of $\theta$?
  2. Why does the estimation not strictly follow the standard variance-mean function above? (For example, is xim a stabilization parameter, a transformation step, or a way of handling a specific edge case such as zero inflation or low counts?)

Minimal Code Context

The command I used is:

fit <- glmGamPoi::glm_gp(
  data         = umi,
  design       = ~1,
  col_data     = data,
  offset       = log_umi,
  size_factors = FALSE
)

I would love to understand the underlying intuition behind this design choice. Thank you for developing such a fantastic and high-performance package!
