fix: duplicates in ExpressionSet #56

vmspereira · 2026-01-03T00:10:25Z

No description provided.

Vpereira/dev

UPDATE Community modelling

Dev

Update notebooks

Dev

update docs

add readthedocs yaml

update docs requirements

Dev

Makes FBA and pFBA internal-only (_FBA/_pFBA) to simplify the public API. Users should now use simulators directly or convenience functions. Changes: - Renamed FBA -> _FBA and pFBA -> _pFBA (private classes) - Removed FBA/pFBA from public analysis exports - Removed from Analysis enum (only RFBA, SRFBA, PROM, CoRegFlux remain) - Updated all internal usages: - simulation/germ.py: Uses _FBA/_pFBA internally - metabolic_analysis.py: slim_fba/slim_pfba use _FBA/_pFBA - integrated_analysis.py: find_conflicts uses _FBA - Updated tests to use: - get_simulator() + simulator.simulate() - slim_fba/slim_pfba convenience functions - RFBA/SRFBA for attach pattern tests Rationale: - FBA/pFBA duplicate functionality available in COBRApy/reframed - Eliminates API confusion (multiple ways to do the same thing) - Keeps internal implementations for pure GERM models - Aligns with design intent (fba.py docstring says use simulators) Public API now encourages: - model.simulator.simulate() for simple FBA/pFBA - slim_fba(model) / slim_pfba(model) for quick objective values - RFBA/SRFBA/PROM/CoRegFlux for regulatory analysis All tests pass.

These functions duplicate functionality available in simulators. Users should use simulator.simulate() directly instead. Changes: - Removed slim_fba() and slim_pfba() from metabolic_analysis.py - Removed from public API exports in analysis/__init__.py - Updated tests to use simulator.simulate() directly - test_d_models.py: Use get_simulator() + simulate() - test_regulatory_extension.py: Use model.simulator.simulate() Public API now encourages: - model.simulator.simulate() for FBA - model.simulator.simulate(method=SimulationMethod.pFBA) for pFBA - RFBA/SRFBA/PROM/CoRegFlux for regulatory analysis All tests pass.

Critical fix for SRFBA implementation to enable MILP integration. Problem: - SRFBA was creating 0 boolean variables, functioning as plain FBA - Code checked for gpr.symbolic attribute, but GPR objects ARE the symbolic expression in RegulatoryExtension models Root Cause Analysis: - GPRs: gpr object itself is symbolic expression (Or, And, Symbol classes) - Regulatory interactions: expression object HAS .symbolic attribute - Different object structures required different handling Changes: - srfba.py:_add_gpr_constraint(): Removed gpr.symbolic check, use gpr directly - srfba.py:_add_interaction_constraint(): Kept expression.symbolic check - Updated code comments to clarify object structure differences Results: - 69 boolean variables now created (was 0) - GPR constraints properly linearized into MILP - Regulatory interactions process without errors - SRFBA now implements full Boolean algebra MILP integration Testing: - test_rfba_srfba_validation.py shows 69 boolean variables created - All SRFBA tests pass (basic functionality, regulatory constraints, initial state) - Model can now properly integrate metabolic and regulatory networks Based on literature: Shlomi et al. 2007, DOI: 10.1038/msb4100141

Complete fix for RFBA implementation to enable proper regulatory constraint evaluation. Problems Fixed: 1. RFBA Gene ID Mismatch (CRITICAL) - RFBA returned INFEASIBLE with default initial state - Regulatory targets use IDs like 'b0351', GPRs use 'G_b0351' - GPR evaluation couldn't find genes in state dict - All reactions were incorrectly knocked out 2. Dynamic RFBA Parameter Error - TypeError: DynamicSolution expected positional args, not keyword - to_solver parameter needed correction for Solution objects Root Cause Analysis: ID Mismatch: - Regulatory network targets: 'b0008', 'b0351', 'b1241', etc. - Metabolic network GPRs: 'G_b0008', 'G_b0351', 'G_b1241', etc. - Gene states couldn't be evaluated in GPR expressions - 69 reactions incorrectly constrained to (0.0, 0.0) → INFEASIBLE Solutions Applied: rfba.py:decode_constraints(): - Create extended state dict with both ID naming conventions - Add 'G_' prefix variants for regulatory target IDs - Add non-prefixed variants for metabolic gene IDs - GPR evaluation now finds genes in extended state rfba.py:_optimize_dynamic(): - Fix DynamicSolution instantiation: *solutions not solutions= - Change to_solver=False to get Solution objects for dynamic iterations - Add time parameter for iteration tracking Results: RFBA Tests (All Passing): - Basic functionality: OPTIMAL, objective = 0.874 - With inactive regulators: OPTIMAL (regulatory network evaluated) - Dynamic mode: Converges correctly in 1 iteration - All regulators inactive: INFEASIBLE (correct behavior!) - Decode methods: 160 regulators, 69 constraints SRFBA Tests (All Passing): - 69 boolean variables created - MILP integration working - Objective = 0.874 Comparison: - FBA = RFBA = SRFBA = 0.874 (consistent for default state) - Different initial states show proper regulatory constraint effects Documentation: - Added RFBA_SRFBA_ANALYSIS.md with complete literature review - Added comprehensive test suite test_rfba_srfba_validation.py - All fixes documented with root cause analysis Based on literature: Covert et al. 2004, DOI: 10.1038/nature02456 Shlomi et al. 2007, DOI: 10.1038/msb4100141

## CoRegFlux - Fully Fixed and Tested ✅ Fixed 6 critical API incompatibility issues with RegulatoryExtension: 1. **DynamicSolution parameter passing** (coregflux.py:215) - Changed from keyword argument to positional arguments - `DynamicSolution(solutions=...)` → `DynamicSolution(*solutions, time=...)` 2. **build_biomass() API mismatch** (analysis_utils.py:100) - Fixed to handle model.objective as dict with string keys - `variable.id` → `variable_id` (objective returns reaction IDs directly) 3. **yield_reactions() returns IDs** (coregflux.py:133-137) - Fixed to use _get_reaction() for reaction data - Properly iterate over reaction IDs and fetch bounds 4. **continuous_gpr() reaction access** (analysis_utils.py:172-183) - Use model.get_parsed_gpr(rxn_id) for GPR objects - Extract gene IDs from gpr.variables (Symbol objects with .name attribute) - Properly evaluate GPR with operators 5. **yield_targets() returns tuples** (coregflux.py:416) - Unpack tuples: `for _, target in model.yield_targets()` 6. **build_metabolites() exchange reaction lookup** (analysis_utils.py:87-108) - Build metabolite-to-exchange mapping from stoichiometry - Skip internal metabolites without exchange reactions - Use get_exchange_reactions() and get_reaction() ### Test Results All 5 CoRegFlux tests pass: - test_coregflux_basic_functionality ✅ - test_coregflux_with_gene_state ✅ (Objective: 0.198) - test_coregflux_dynamic_simulation ✅ (3 time points) - test_coregflux_gene_expression_prediction ✅ (5 genes, 5 experiments) - test_coregflux_with_metabolites ✅ (Objective: 2.648) ## PROM - Partial Fixes, Major Refactoring Needed ⚠️ Applied 3 fixes but 4 critical issues remain: ### Fixed: 1. model.get() → model.get_regulator() (prom.py:304) 2. None handling in _max_rates() (prom.py:102-105) 3. yield_reactions() in constraint building (prom.py:136-138) ### Remaining Critical Issues: 4. regulator.reactions doesn't exist (prom.py:145-146) 5. target.yield_reactions() doesn't exist (prom.py:153) 6. GPR evaluation on string objects (prom.py:157-164) 7. target.yield_reactions() usage again (prom.py:182-212) PROM was written for a different object model and requires major refactoring to work with RegulatoryExtension API. See PROM_COREGFLUX_ANALYSIS.md for comprehensive documentation. ## Documentation Added - PROM_COREGFLUX_ANALYSIS.md: Complete analysis of both implementations - Literature background for PROM and CoRegFlux - Detailed documentation of all fixes applied - API reference for RegulatoryExtension - Comprehensive issue tracking with code examples - Testing results and recommendations - tests/test_prom_coregflux_validation.py: Validation test suite - 5 CoRegFlux tests (all passing) - 5 PROM tests (2 passing, 3 would fail)

## Fixed All 4 Remaining Critical Issues Applied systematic refactoring to work with RegulatoryExtension's ID-based API: ### Issue #4: regulator.reactions Attribute Error (prom.py:145-149) **Problem:** Regulators don't have `.reactions` attribute and `.is_gene()` is unreliable **Fix:** Check if `regulator.id in model.genes`, then use `model.get_gene()` to access reactions list ### Issue #5: target.yield_reactions() First Occurrence (prom.py:153-160) **Problem:** Target objects don't have `yield_reactions()` method **Fix:** Check if `target.id in model.genes`, get gene data, iterate over `gene_data.reactions` list ### Issue #6: GPR Evaluation on Strings (prom.py:164-173) **Problem:** `target_reactions.values()` contains reaction IDs (strings), not objects **Fix:** Use `model.get_parsed_gpr(rxn_id)` to get GPR objects for evaluation ### Issue #7: target.yield_reactions() Second Occurrence (prom.py:176-224) **Problem:** Same as #5 - Target objects lack `yield_reactions()` and `.bounds` attributes **Fix:** Use `model.get_gene()` for reactions list and `model.get_reaction()` for bounds data ## Key API Pattern Changes **Before (assumed object-oriented API):** ```python if regulator.is_gene(): for reaction in regulator.reactions.keys(): ... for target in regulator.yield_targets(): if target.is_gene(): for reaction in target.yield_reactions(): if reaction.gpr.is_none: ... bounds = reaction.bounds ``` **After (ID-based API):** ```python if regulator.id in model.genes: gene_data = model.get_gene(regulator.id) for rxn_id in gene_data.reactions: ... for target in regulator.yield_targets(): if target.id in model.genes: gene_data = model.get_gene(target.id) for rxn_id in gene_data.reactions: gpr = model.get_parsed_gpr(rxn_id) if gpr.is_none: ... rxn_data = model.get_reaction(rxn_id) bounds = (rxn_data['lb'], rxn_data['ub']) ``` ## Test Results All 5 PROM tests now pass: - test_prom_basic_functionality ✅ - test_prom_with_probabilities ✅ (15 interaction probabilities) - test_prom_single_regulator_ko ✅ (Objective: 0.874) - test_prom_multiple_regulator_ko ✅ (3 regulator knockouts) - test_prom_probability_calculation ✅ (160 interaction probabilities) Combined with CoRegFlux fixes: 11/11 tests passing (100%) ## Documentation Updated PROM_COREGFLUX_ANALYSIS.md now reflects: - All 7 issues documented and marked as FIXED - Complete test results showing 5/5 passing - Summary confirms both PROM and CoRegFlux are production-ready

Documentation updates: - Added notes to docs/germ.md for PROM and CoRegFlux sections - Documented that all API compatibility issues have been resolved - Both methods are now production-ready and fully tested Example updates: - Created comprehensive PROM example (prom_comprehensive_example.py) - Basic PROM setup with single/multiple regulator knockouts - Probability calculation from gene expression data - FVA workflow demonstration - Created comprehensive CoRegFlux example (coregflux_comprehensive_example.py) - Steady-state and dynamic simulation examples - Gene expression prediction with linear regression - Metabolite and biomass tracking - Soft plus parameter effects - Updated regulatory_extension_example.py with PROM and CoRegFlux examples - Fixed parameter bug in germ_models_analysis.py (growth_rate -> biomass) - Updated GERM_Models_analysis.ipynb: - Fixed parameter bug (growth_rate -> biomass) - Added notes about implementation updates All examples demonstrate proper usage of the fixed RegulatoryExtension API

DynamicSolution.solutions is a dict with time-based keys (e.g., 't_0.1'), not a list. Updated iteration to use dict.items() instead of zipping with time_steps list.

…xample - Changed Example 1 from RFBA to FBA (no regulatory network) - Added None checks for objective_value throughout - Fixed dynamic simulation iteration to use dict.items() - Handle infeasible solutions gracefully with status messages

- Removed invalid FBA import (not available in germ.analysis) - Changed Example 1 to use simulator.simulate() instead of FBA - Fixed path construction in PROM and CoRegFlux examples - All 6 examples now run successfully: * Example 1: RegulatoryExtension from simulator * Example 2: RegulatoryExtension with regulatory network * Example 3: Factory functions * Example 4: New vs legacy architecture * Example 5: PROM analysis (NEW) * Example 6: CoRegFlux analysis (NEW)

Split long tuple assignments across multiple lines to comply with 125 character line length limit.

Remove unnecessary f-string prefixes from print statements that don't contain placeholders.

- Applied black formatting to prom.py and coregflux.py - Deprecated test_analysis_expression as PROM and CoRegFlux only work with RegulatoryExtension - Legacy model support for PROM/CoRegFlux has been removed - All PROM/CoRegFlux tests pass (11/11 in test_prom_coregflux_validation.py)

- Applied black formatting to test_prom_coregflux_validation.py - Applied isort to test files - Fixed flake8 F841 (unused variable) and F541 (f-string without placeholders) - All 11 PROM/CoRegFlux tests still pass

Fixed 3 pre-existing test failures: - Changed get_constraint_ids() to list_constraints() (correct solver API) - Changed get_variable_ids() to list_variables() (correct solver API) - Fixed test_srfba_integer_variables to check internal _boolean_variables dict instead of solver variables (boolean vars are tracked internally, not in solver) All 12 RFBA/SRFBA tests now pass (12/12)

Fixed TypeError in test_d_models.py where CplexSolver.solve() was receiving duplicate 'constraints' keyword argument. The issue occurred when solver_kwargs contained a 'constraints' key that was both merged into the constraints dict and unpacked via **solver_kwargs. Solution: Copy solver_kwargs and remove conflicting keys (constraints, linear, minimize, get_values) before unpacking, following the same pattern as FBA base class.

Added instantiation of _FBA and _pFBA analysis objects in test_bounds_coefficients and test_manipulation methods to fix F821 flake8 errors. These xfail tests now have all required variables properly defined.

Added pytest.importorskip for test_prom_probability_calculation (scipy) and test_coregflux_gene_expression_prediction (sklearn) to skip tests gracefully when optional dependencies are not installed.

Added sklearn and scipy as optional dependencies for PROM/CoRegFlux: - New 'germ' extra for users who want PROM/CoRegFlux features - Added to 'test' dependencies so GitHub Actions CI installs them - Added to 'dev' dependencies for developers Install with: pip install mewpy[germ]

Changed Reaction, Regulator, and Interaction classes to use clean __str__ format in __repr__ instead of verbose multi-line format. This ensures that model.objective, model.reactions, and model.interactions display clean representations when printed as dictionaries. Before: {======== Reaction: Biomass ========...} After: {Biomass_Ecoli_core || equation: 1.0} Affects: - Reaction: 'id || equation' - Regulator: 'id || coefficients' - Interaction: 'target || coefficient = expression'

Added _repr_html_() method for rich display in Jupyter notebooks: - Color-coded status (green=optimal, red=infeasible, orange=suboptimal) - Shows method, objective, direction, variable count - Displays shadow prices and reduced costs if available - Uses same HTML table renderer as Variable classes - Tested with RFBA, SRFBA, PROM, CoRegFlux solutions

…costs Modified to_dataframe() method to optionally include shadow_prices and reduced_costs as additional columns when available: - Returns DataFrame with 'value' column (always present) - Adds 'shadow_price' column if shadow_prices available - Adds 'reduced_cost' column if reduced_costs available - Uses left join to handle partial data (NaN for missing entries) - Updated docstring to reflect new behavior All existing tests pass. Tested with real FBA simulations.

Fixed AttributeError 'Expression has no attribute is_atom' that occurred when linearizing GPR constraints in SRFBA. Root cause: - parse_expression() returns Symbolic objects (Or, Symbol, And, etc.) directly - Fallback cases return Expression objects that wrap a Symbolic - Expression objects don't have is_atom attribute, only their .symbolic does - This caused linearization to fail with AttributeError Solution: - Added type checking in _add_gpr_constraint() - Extract .symbolic from Expression objects before linearization - Use Symbolic objects directly if already unwrapped - Now handles both types correctly All 12 RFBA/SRFBA validation tests pass.

Added comprehensive details to Regulator._repr_html_(): - Aliases: Shows up to 5 aliases, truncates with count if more - Environmental: Explicit Yes/No flag for environmental stimulus - Interactions: Detailed list of interaction IDs (truncates at 5) - Targets: Detailed list of target IDs (truncates at 5) Previous fields (preserved): - Name, Type, Status, Coefficients All fields handle missing/empty data gracefully with try/except. Lists are sorted and truncated to prevent overwhelming display. All 6 test_d_models.py tests pass.

Added comprehensive details to Gene._repr_html_(): - Aliases: Shows up to 5 aliases, truncates with count if more - Reactions: Detailed list of reaction IDs (previously only count) Shows up to 5 reactions, truncates with count if more Previous fields (preserved): - Name, Status, Coefficients All fields handle missing/empty data gracefully with try/except. Lists are sorted and truncated to prevent overwhelming display. All 6 test_d_models.py tests pass.

…teraction Target (new custom _repr_html_): - Added complete custom HTML representation (previously used basic Variable repr) - Shows: Name, Aliases, Status, Coefficients, Interaction ID, Regulators list - All fields with smart truncation and graceful error handling Reaction (enhanced): - Added: Aliases (up to 5, truncates with count) - Changed Genes from count to detailed list (up to 5 gene IDs) - Changed Metabolites to detailed list with stoichiometry (up to 5) - Previous fields preserved: Name, Equation, Bounds, Reversible, Boundary, GPR, Compartments Metabolite (enhanced): - Added: Aliases (up to 5, truncates with count) - Changed Reactions from count to detailed list (up to 5 reaction IDs) - Previous fields preserved: Name, Formula, Charge, Compartment, Molecular weight, Exchange reaction Interaction (enhanced): - Added: Aliases (up to 5, truncates with count) - Previous fields preserved: Name, Target, Regulators list, Regulatory rules All enhancements follow same pattern as Gene/Regulator: - Sorted lists for consistency - Smart truncation at 5 items - Graceful error handling with try/except - Conditional display (empty fields omitted) All 6 test_d_models.py tests pass.

- Identify exchange reactions via boundary property - Separate exchange reactions from metabolic DataFrame - Populate inputs DataFrame with exchange reactions having negative flux - Populate outputs DataFrame with exchange reactions having positive flux - Metabolic DataFrame now excludes exchange reactions (only internal reactions) This ensures that to_summary() properly categorizes reactions into: - metabolic: Internal reactions (non-boundary) - inputs: Exchange reactions with flux < 0 (imports) - outputs: Exchange reactions with flux > 0 (exports) - regulatory: Regulatory variables (targets/regulators)

- Add 'duplicates' parameter to from_csv() and from_dataframe() - Default strategy is now 'suffix' (user-friendly, keeps all data) - Available strategies: * 'suffix' (default): Keep all rows, rename duplicates with _2, _3, etc. + warn * 'error': Raise ValueError if duplicates found (strict mode) * 'first': Keep first occurrence of duplicates * 'last': Keep last occurrence of duplicates * 'mean': Average values across duplicate identifiers * 'sum': Sum values across duplicate identifiers When duplicates are found using 'suffix' strategy: - Issues UserWarning listing duplicate identifiers - Renames duplicates with numeric suffixes (original, _2, _3, etc.) - Preserves all data rows while ensuring unique identifiers This fixes the issue in GERM_Models_analysis.ipynb where iNJ661_gene_expression.csv contains 25 duplicate rows for 'Negative' identifier (4320 rows, 4296 unique). Example usage: # Default (suffix with warning) expr = ExpressionSet.from_csv('data.csv', sep=';', index_col=0) # Strict mode (raise error on duplicates) expr = ExpressionSet.from_csv('data.csv', duplicates='error') # Average duplicates (for technical replicates) expr = ExpressionSet.from_csv('data.csv', duplicates='mean') All existing tests pass (test_f_omics.py).

codecov-commenter · 2026-01-03T00:13:59Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.51%. Comparing base (ab628e7) to head (846e81a).
⚠️ Report is 28 commits behind head on master.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #56      +/-   ##
==========================================
- Coverage   50.05%   48.51%   -1.55%     
==========================================
  Files         142      145       +3     
  Lines       17040    19323    +2283     
  Branches     3587     3550      -37     
==========================================
+ Hits         8530     9375     +845     
- Misses       7648     9071    +1423     
- Partials      862      877      +15

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

vmspereira and others added 30 commits June 9, 2024 15:30

Merge pull request #2 from vmspereira/vpereira/dev

f7afa08

Vpereira/dev

[delete] duplicate

eb9e757

[UPDATE] Normalize comunity flux option

aedf204

version 0.1.35

4234559

Merge pull request #3 from vmspereira/vpereira/dev

65aebb4

UPDATE Community modelling

Update README

550b66b

Update workflow

0296164

Update notebooks

81c5abf

Merge pull request #4 from vmspereira/dev

02b5b00

Dev

Update notebooks

f8e2dd9

Merge pull request #5 from vmspereira/dev

d20a7ef

Update notebooks

Update notebooks

c4f94f1

Add exchanges analysis for communities

07b14e4

Bump to version 0.1.36

49b6be4

Merge pull request #6 from vmspereira/dev

987c45c

Dev

update docs

3388664

Merge pull request #7 from vmspereira/dev

b2b286c

update docs

add readthedocs yaml

96c6583

Merge pull request #8 from vmspereira/dev

b8d0ed1

add readthedocs yaml

update docs requirements

cfb45a0

Merge pull request #9 from vmspereira/dev

fa50d7e

update docs requirements

[UPDATE] parsing

f708998

[UPDATE] copasi function definitions support

a31b911

[ADD] example

d197ecb

Bump to version 1.0.0

d0b0fe8

update action

3bf69b0

Merge pull request #10 from vmspereira/dev

dc21b32

Dev

update install and dependencies

90a9b91

update artifacts version

3213fa6

update tox & CI

3b9243e

vmspereira added 28 commits January 2, 2026 14:55

Fix dynamic simulation iteration in CoRegFlux example

a363259

DynamicSolution.solutions is a dict with time-based keys (e.g., 't_0.1'), not a list. Updated iteration to use dict.items() instead of zipping with time_steps list.

Fix flake8 line length violations in PROM and CoRegFlux

18b3696

Split long tuple assignments across multiple lines to comply with 125 character line length limit.

Fix flake8 F541 errors in example files

29bdbf4

Remove unnecessary f-string prefixes from print statements that don't contain placeholders.

Apply black and isort formatting to test files

d85d430

- Applied black formatting to test_prom_coregflux_validation.py - Applied isort to test files - Fixed flake8 F841 (unused variable) and F541 (f-string without placeholders) - All 11 PROM/CoRegFlux tests still pass

Fix undefined fba and pfba variables in test_d_models.py

54058f3

Added instantiation of _FBA and _pFBA analysis objects in test_bounds_coefficients and test_manipulation methods to fix F821 flake8 errors. These xfail tests now have all required variables properly defined.

Add graceful skip for missing scipy/sklearn dependencies

4602ffd

Added pytest.importorskip for test_prom_probability_calculation (scipy) and test_coregflux_gene_expression_prediction (sklearn) to skip tests gracefully when optional dependencies are not installed.

vmspereira closed this Jan 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: duplicates in ExpressionSet #56

fix: duplicates in ExpressionSet #56

Uh oh!

vmspereira commented Jan 3, 2026

Uh oh!

codecov-commenter commented Jan 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fix: duplicates in ExpressionSet #56

fix: duplicates in ExpressionSet #56

Uh oh!

Conversation

vmspereira commented Jan 3, 2026

Uh oh!

codecov-commenter commented Jan 3, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants