
Non-deterministic results from PreProcess.reduce(extend=True) due to internal MCMC sampling #136

@PalashMendhe

Description


Describe the bug
When PreProcess.reduce(extend=True) is run on the same model multiple times, it returns different sets of removed reactions across runs, even when NumPy's global random seed is fixed beforehand. This non-determinism prevents reproducible downstream analysis.

To Reproduce
Steps to reproduce the behavior:

  1. Change to the directory that contains ext_data/ (the script loads ext_data/e_coli_core.json by relative path).
  2. Create a test file.
  3. Paste in the following minimal script, which uses e_coli_core.json, and run it:
```python
import numpy as np
import cobra.io
from dingo.preprocess import PreProcess


def test_extend_nondeterminism():
    ecoli = cobra.io.load_json_model("ext_data/e_coli_core.json")

    # Run A: seed NumPy's global RNG, then reduce with extend=True
    np.random.seed(42)
    preprocessor_A = PreProcess(ecoli.copy(), tol=1e-6, verbose=False)
    removed_A, _ = preprocessor_A.reduce(extend=True)

    # Run B: identical seed, model, and parameters
    np.random.seed(42)
    preprocessor_B = PreProcess(ecoli.copy(), tol=1e-6, verbose=False)
    removed_B, _ = preprocessor_B.reduce(extend=True)

    # Compare the removed reactions as unordered sets
    set_A, set_B = frozenset(removed_A), frozenset(removed_B)
    if set_A != set_B:
        print(f"Reactions removed ONLY in Run A: {sorted(set_A - set_B)}")
        print(f"Reactions removed ONLY in Run B: {sorted(set_B - set_A)}")
    else:
        print("Both runs removed the same reactions.")


if __name__ == "__main__":
    test_extend_nondeterminism()
```

Expected behavior
Given the same metabolic model and the same algorithm parameters, reduce(extend=True) should always return the same set of removed reactions.
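Seeding NumPy's legacy global generator with np.random.seed only affects code that draws from that global state. A sampler that builds its own generator (for example np.random.default_rng() with no seed) or uses a separate native RNG is unaffected, which would be consistent with the np.random.seed(42) calls in the reproduction script not making the runs repeatable. The sketch below illustrates this general NumPy behaviour only; the function names are hypothetical, and it is an assumption that dingo's internal sampler behaves this way.

```python
import numpy as np


def sample_with_private_rng(n_samples):
    # Hypothetical stand-in for an internal sampler: it creates its own
    # Generator from fresh OS entropy, so np.random.seed has no effect on it.
    rng = np.random.default_rng()
    return rng.standard_normal(n_samples)


def sample_with_seeded_rng(n_samples, seed):
    # Same sampler, but the caller controls the seed explicitly.
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_samples)


np.random.seed(42)
a = sample_with_private_rng(5)
np.random.seed(42)
b = sample_with_private_rng(5)
print(np.array_equal(a, b))  # False (barring an astronomically unlikely collision)

c = sample_with_seeded_rng(5, seed=42)
d = sample_with_seeded_rng(5, seed=42)
print(np.array_equal(c, d))  # True
```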


Proposed fix
Add an optional steady_states parameter to reduce().
When steady_states is provided by the caller, the internal sampling step is skipped entirely and the provided matrix is used directly for correlation estimation. When steady_states=None (the default), the current internal sampling behaviour is preserved as a fallback for convenience, but a UserWarning is emitted so the user is informed that results will not be reproducible.
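A minimal sketch of the dispatch described above, written as a standalone function rather than a patch to dingo. The name estimate_correlations, the (n_reactions, n_samples) layout of steady_states, and the use of np.corrcoef as the correlation step are assumptions for illustration; the random fallback is only a stand-in for dingo's internal MCMC sampler.

```python
import warnings

import numpy as np


def estimate_correlations(n_reactions, steady_states=None, n_samples=1000):
    """Hypothetical helper mirroring the proposed reduce(steady_states=...) behaviour.

    steady_states is assumed to have shape (n_reactions, n_samples):
    one row per reaction, one column per sampled flux vector.
    """
    if steady_states is None:
        # Fallback path: warn that the result depends on unseeded sampling.
        warnings.warn(
            "steady_states was not provided; sampling internally. "
            "Results will not be reproducible across runs.",
            UserWarning,
            stacklevel=2,
        )
        # Stand-in for the internal MCMC sampler (NOT dingo's actual sampler).
        steady_states = np.random.default_rng().standard_normal((n_reactions, n_samples))

    steady_states = np.asarray(steady_states, dtype=float)
    if steady_states.ndim != 2 or steady_states.shape[0] != n_reactions:
        raise ValueError("steady_states must have shape (n_reactions, n_samples)")

    # Rows are variables (reactions), columns are observations (samples).
    return np.corrcoef(steady_states)


# Caller-controlled sampling: draw once with a fixed seed, then reuse.
samples = np.random.default_rng(42).standard_normal((4, 500))
corr_a = estimate_correlations(4, steady_states=samples)
corr_b = estimate_correlations(4, steady_states=samples)
print(np.array_equal(corr_a, corr_b))  # True
```

Because the caller draws the samples once, the correlation step is a pure function of its inputs; omitting steady_states still works but emits the UserWarning.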

Desktop

  • OS: WSL / Ubuntu
  • Browser: Chrome
