335 allow omitting of nan values in t tests #393
Conversation
modified: backend/tests/protzilla/data_integration/test_enrichment_analysis.py
|
Just for information: #391 completely changes the way the t-test is calculated. The stats.ttest_ind() function is no longer used and has been replaced with a manual vectorized implementation, so these two PRs are incompatible. |
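For readers unfamiliar with what a "manual vectorized implementation" might look like, here is a minimal sketch. It is not the code from #391: the function name vectorized_welch_t_test is hypothetical, and the choice of Welch's unequal-variance test is an assumption, since the comment does not say which variant is used. Each row is treated as one protein and each column as one sample.

```python
import numpy as np
from scipy import stats


def vectorized_welch_t_test(group1: np.ndarray, group2: np.ndarray):
    """Row-wise Welch t-test (hypothetical helper, not the code from #391).

    Each row is one protein, each column one sample of that group.
    """
    n1, n2 = group1.shape[1], group2.shape[1]
    mean1, mean2 = group1.mean(axis=1), group2.mean(axis=1)
    # Unbiased sample variances (ddof=1), scaled to the variance of the mean.
    se1 = group1.var(axis=1, ddof=1) / n1
    se2 = group2.var(axis=1, ddof=1) / n2

    t_stat = (mean1 - mean2) / np.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se1 + se2) ** 2 / (se1**2 / (n1 - 1) + se2**2 / (n2 - 1))
    # Two-sided p-value from the survival function of Student's t.
    p_values = 2 * stats.t.sf(np.abs(t_stat), dof)
    return t_stat, p_values


rng = np.random.default_rng(0)
a = rng.normal(10.0, 1.0, size=(5, 4))  # 5 proteins, 4 samples
b = rng.normal(11.0, 1.0, size=(5, 6))  # 5 proteins, 6 samples
t_stat, p_values = vectorized_welch_t_test(a, b)
```

Because plain mean and var are used, a NaN anywhere in a row makes that row's statistic NaN. That is why a vectorized pipeline needs its own NaN handling in place of the nan_policy argument of stats.ttest_ind().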
- Replace old scipy loop-based implementation with PR 391's vectorized version
- Add nan_policy parameter with conditional dropna/pivot logic:
* 'raise' → ValueError if any NaN found in intensity column
* 'omit' → dropna before pivot (NaN rows excluded from all stats + fold change)
* 'propagate' → keep NaN; propagate into statistics so proteins are excluded
- Fix fold change NaN regression: with 'omit', median is now computed from clean
data (post-dropna), so fold change is a valid number (not NaN as in old impl)
- Update tests accordingly: remove 'fold change is NaN under omit' assertions
…nto 335-allow-omitting-of-nan-values-in-t-tests Co-authored-by: Copilot <copilot@github.com>
|
Ok, branch 335-allow-omitting-of-nan-values-in-t-tests (allow omitting NaN values in t-tests) was originally based on dev. It must now be merged into 339-multithreading-for-t-test so that the nan_policy feature can be adapted to work with the vectorized pipeline. |
I'm so sorry that I did not see this problem coming 🤦♂️ |
Description
fixes #335
This pull request introduces configurable handling of NaN (missing) values in the differential expression t-test analysis, allowing users to specify how NaNs are treated during statistical testing.
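As a quick reference, the three policy names match the nan_policy argument of scipy.stats.ttest_ind. The snippet below uses made-up values and is only meant to show how each policy behaves on a single pair of groups.

```python
import numpy as np
from scipy import stats

group1 = np.array([1.0, 2.0, 3.0, np.nan])
group2 = np.array([4.0, 5.0, 6.0, 7.0])

# "propagate": the NaN flows into the result, so statistic and p-value are NaN.
propagated = stats.ttest_ind(group1, group2, nan_policy="propagate")

# "omit": NaN entries are ignored and the test runs on the remaining values.
omitted = stats.ttest_ind(group1, group2, nan_policy="omit")

# "raise": any NaN in the input raises a ValueError.
try:
    stats.ttest_ind(group1, group2, nan_policy="raise")
    raised = False
except ValueError:
    raised = True
```

Here "omit" gives the same result as running the test on group1[:3] and group2 directly.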
Changes
Before: NaN values are always omitted during t-tests.
Now: During t-tests, NaN values are handled in one of three ways:
"Propagate", "Omit", and "Raise" (see here for explanations).
- Updated the t_test function to accept a nan_policy parameter and pass it to scipy.stats.ttest_ind, enabling the user to choose between propagating, omitting, or raising errors on NaNs. [1] [2] [3]
User interface and API changes:
- DropdownField for nan_policy selection in the data analysis form, defaulting to "Raise". [1] [2]
Testing and documentation:
- test_differential_expression.py to cover all nan_policy behaviors (propagate, omit, raise), including edge cases where NaNs result in exclusion, valid results, or errors. Tests also clarify the distinction between p-value computation and fold change calculation when NaNs are present. [1] [2] [3] [4]
Testing
python -m pytest tests/protzilla/data_analysis/test_differential_expression.py -v
PR checklist
Development
Mergeability
black pnpm format and checked with pnpm lint
Code review