minkiPy

minkiPy is a Python package for differential analysis of gene spatial organisation in spatial transcriptomics data, using Minkowski functionals and tensors.

This repository accompanies the paper "Differential Analysis of Gene Spatial Organisation with Minkowski Functionals and Tensors" and includes:

the minkiPy package,
a command-line interface,
an exploratory notebook to get started quickly on your own data,
full workflow notebooks used for end-to-end analyses.

Input format

minkiPy expects a pandas.DataFrame with transcript-level coordinates and these columns:

gene
global_x
global_y

import pandas as pd

transcripts_df = pd.DataFrame({
    "gene": [...],
    "global_x": [...],
    "global_y": [...],
})

Notes:

gene is a string identifier.
global_x and global_y should share the same coordinate system (usually micrometres).
Converting platform-specific files to this format is done upstream.
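
As an illustration of that upstream step, here is a minimal sketch that renames platform-specific columns to the three expected names and applies a few basic sanity checks. The source column names (gene_symbol, x, y), the toy values, and the helper to_minkipy_frame are hypothetical and not part of minkiPy; the checks are ones a careful user might add, not requirements stated by the package.

```python
import pandas as pd

# Hypothetical platform export: column names and values are placeholders.
raw = pd.DataFrame({
    "gene_symbol": ["ACTB", "ACTB", "MYOD1"],
    "x": [10.5, 11.0, 250.2],
    "y": [3.2, 4.8, 97.1],
})


def to_minkipy_frame(df, gene_col, x_col, y_col):
    """Return a new frame with the columns gene, global_x and global_y."""
    out = df[[gene_col, x_col, y_col]].rename(
        columns={gene_col: "gene", x_col: "global_x", y_col: "global_y"}
    )
    # Gene identifiers as strings, coordinates as numbers.
    out["gene"] = out["gene"].astype(str)
    out["global_x"] = pd.to_numeric(out["global_x"], errors="raise")
    out["global_y"] = pd.to_numeric(out["global_y"], errors="raise")
    # Extra sanity check added for this sketch: no missing coordinates.
    if out[["global_x", "global_y"]].isna().any().any():
        raise ValueError("coordinates contain missing values")
    return out.reset_index(drop=True)


transcripts_df = to_minkipy_frame(raw, "gene_symbol", "x", "y")
print(transcripts_df.head())
```

The resulting transcripts_df has the same layout as the frame shown above.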

Method summary

For each gene, minkiPy reconstructs a spatial density field and computes a profile across level sets.

Each profile contains:

W0 (area),
W1 (boundary length),
W2 (Euler-characteristic-related term),
beta (anisotropy index from a Minkowski tensor).

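Each gene's profile is an array of shape (4, LS), with one row per quantity above and one column per level set (LS is the number of level sets).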
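
To make the idea of a profile across level sets concrete, the sketch below thresholds a toy density field at several levels and measures, for each binary level set, the pixel area, the boundary length in pixel edges, and the Euler characteristic. This is a plain pixel-counting illustration written for this README, not minkiPy's estimator: normalisation, the density reconstruction, and the anisotropy index beta are all left out, and the function names are hypothetical.

```python
import numpy as np


def level_set_functionals(mask):
    """Area, boundary length and Euler characteristic of a binary pixel mask."""
    p = np.pad(np.asarray(mask, dtype=bool), 1)  # surround with background
    area = int(p.sum())
    # The two pixels adjacent to every horizontal and every vertical pixel edge.
    h_a, h_b = p[:-1, 1:-1], p[1:, 1:-1]
    v_a, v_b = p[1:-1, :-1], p[1:-1, 1:]
    # A pixel edge is on the boundary when exactly one neighbour is foreground.
    perimeter = int((h_a ^ h_b).sum() + (v_a ^ v_b).sum())
    # Euler characteristic of the union of closed pixels: V - E + F
    # (this treats diagonal neighbours as connected).
    edges = int((h_a | h_b).sum() + (v_a | v_b).sum())
    vertices = int((p[:-1, :-1] | p[:-1, 1:] | p[1:, :-1] | p[1:, 1:]).sum())
    euler = vertices - edges + area
    return area, perimeter, euler


def toy_profile(field, levels):
    """Stack the three quantities over all thresholds into a (3, LS) array."""
    rows = [level_set_functionals(field >= t) for t in levels]
    return np.array(rows, dtype=float).T


# Toy density: a single Gaussian bump on a 32 x 32 grid.
yy, xx = np.mgrid[0:32, 0:32]
field = np.exp(-((xx - 16) ** 2 + (yy - 16) ** 2) / (2 * 5.0 ** 2))
levels = np.linspace(0.1, 0.9, 5)
prof = toy_profile(field, levels)
print(prof.shape)
```

Higher thresholds give smaller level sets, so the first row (area) never increases from left to right.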

Optional Monte Carlo runs estimate the covariance of each profile. Distances between profiles can then be computed as covariance-aware Gaussian 2-Wasserstein distances, or as Euclidean distances for fast exploration.
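
For reference, the 2-Wasserstein distance between two Gaussians has a standard closed form: the squared distance is the squared Euclidean distance between the means plus a trace term that compares the covariances. The sketch below implements that textbook formula with NumPy, together with its simplification for diagonal covariances. The function names are hypothetical, and how minkiPy itself flattens profiles and evaluates this distance is not shown here.

```python
import numpy as np


def _psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(mat)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    mean_a, mean_b = np.asarray(mean_a, float), np.asarray(mean_b, float)
    cov_a, cov_b = np.asarray(cov_a, float), np.asarray(cov_b, float)
    root_b = _psd_sqrt(cov_b)
    cross = root_b @ cov_a @ root_b
    cross = (cross + cross.T) / 2.0  # remove round-off asymmetry
    cross_eig = np.clip(np.linalg.eigvalsh(cross), 0.0, None)
    squared = (
        np.sum((mean_a - mean_b) ** 2)
        + np.trace(cov_a)
        + np.trace(cov_b)
        - 2.0 * np.sum(np.sqrt(cross_eig))
    )
    return float(np.sqrt(max(squared, 0.0)))


def gaussian_w2_diag(mean_a, var_a, mean_b, var_b):
    """Same distance when both covariances are diagonal (variances only)."""
    mean_a, mean_b = np.asarray(mean_a, float), np.asarray(mean_b, float)
    var_a, var_b = np.asarray(var_a, float), np.asarray(var_b, float)
    squared = np.sum((mean_a - mean_b) ** 2) + np.sum(
        (np.sqrt(var_a) - np.sqrt(var_b)) ** 2
    )
    return float(np.sqrt(squared))


# With equal covariances the distance reduces to the Euclidean one.
d_equal = gaussian_w2([0.0, 0.0], np.eye(2), [3.0, 4.0], np.eye(2))
print(d_equal)
```

When the two covariances are equal the trace term vanishes, which is why the Euclidean distance is a natural fast fallback.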

These profiles are the starting point for downstream analysis: sample and gene comparisons, condition-level ranking of spatial reorganisation, and embedding/graph analyses.

System requirements

Software dependencies

minkiPy requires Python >=3.10. It runs on CPU and does not require a GPU or any other accelerator.

The recommended installation command, pip install minkipy-st, installs the required Python dependencies automatically. The complete dependency list is kept in pyproject.toml, and the optional reproducible notebook environment is described in minkiPy_env.yaml.

Because minkiPy can run computations in parallel with mpi4py, an MPI runtime such as Open MPI is required for MPI execution; see the installation section below for the short MPI check and platform-specific install commands.

Operating systems tested

The software has been tested on the following operating systems:

Ubuntu 22.04.5 LTS
macOS Ventura 13.7.8 and Tahoe 26
Windows 11 2025

Hardware requirements

No non-standard hardware is required. A normal CPU-only desktop or laptop computer is sufficient for installation and small tests. Runtime and memory use scale with the number of transcripts, number of genes, image resolution, and whether Monte Carlo covariance estimation is enabled. For the exploratory notebook demo, allow enough disk space for the downloaded archive and extracted data; the raw archive is approximately 10 GB before extraction.

Installation

Typical installation time on a normal desktop computer is 5-30 minutes. The pip installation itself is usually fast; most variability comes from installing or configuring MPI and Python environments.

mpi4py needs an MPI runtime (mpirun/mpiexec) installed on your machine.

Before choosing an option:

Option A (pip from PyPI) does not require cloning this repository.
Options B/C (YAML or local development) require a local clone first:

git clone https://github.com/BAUDOTlab/minkiPy.git
cd minkiPy

Option A (recommended): pip

Check MPI:

mpirun --version

If missing, install MPI first:

Ubuntu/Debian

sudo apt update
sudo apt install -y openmpi-bin libopenmpi-dev

macOS (Homebrew)
```
brew install open-mpi
```

Conda-only

conda install -c conda-forge openmpi mpi4py

Update pip tooling:

python -m pip install --upgrade pip setuptools wheel

Install:

pip install minkipy-st

Verify:

python -c "import minkiPy; print('minkiPy import OK')"
python -m minkiPy --help

Option B: Conda environment from YAML

Use this option from the repository root (after git clone and cd minkiPy).

Update Conda first:

conda update -n base -c defaults conda

Create the environment:

conda env create -f minkiPy_env.yaml

Activate it:

conda activate minkiPy

Install package from source (editable):

pip install -e .

(Optional) Add a Jupyter kernel:

python -m ipykernel install --user --name minkiPy --display-name "Python (minkiPy)"

Option C: Local development install

Use this option from the repository root (after git clone and cd minkiPy).

python -m pip install --upgrade pip setuptools wheel
pip install -e .

Troubleshooting

If installation fails:

Retry after updating pip tooling:

python -m pip install --upgrade pip setuptools wheel

For Conda setups, also update Conda:

conda update -n base -c defaults conda

Create a clean virtual environment and reinstall:

python -m venv .venv
source .venv/bin/activate   # Windows (PowerShell): .venv\Scripts\Activate.ps1
python -m pip install --upgrade pip setuptools wheel
pip install minkipy-st

If MPI errors persist, re-check mpirun --version and make sure the installed mpi4py is compatible with that MPI runtime.
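
A quick way to gather the relevant facts in one place is a small diagnostic script. The sketch below uses only the Python standard library; the helper name mpi_diagnostics is hypothetical. It reports whether an MPI launcher is on the PATH and whether mpi4py is installed in the current environment, without importing mpi4py or starting MPI.

```python
import importlib.util
import shutil


def mpi_diagnostics():
    """Collect basic facts about the MPI setup of the current environment."""
    # Full path of each launcher, or None when it is not on the PATH.
    launchers = {name: shutil.which(name) for name in ("mpirun", "mpiexec")}
    return {
        "launchers": launchers,
        "has_launcher": any(launchers.values()),
        # find_spec only locates the package; it does not import it.
        "mpi4py_installed": importlib.util.find_spec("mpi4py") is not None,
    }


report = mpi_diagnostics()
for key, value in report.items():
    print(f"{key}: {value}")
```

If a launcher is found but mpi4py is not, reinstalling inside the active environment is the first thing to try.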

Demo

The recommended demo is the exploratory notebook: minkiPy_exploratory_workflow.ipynb.

This notebook provides an end-to-end exploratory workflow that:

downloads the FSHD raw dataset from Zenodo,
extracts the data into examples/FSHD_dataset/raw_data/,
preprocesses the MERFISH transcript files and selected center-region masks,
computes Minkowski profiles with n_cov_samples=0 for a fast no-covariance run,
loads the merged HDF5 outputs with minkiPy.process_data,
computes Euclidean downstream distances, and
generates exploratory plots and summary CSV files.

Expected demo output

Expected intermediate and final outputs include:

downloaded data at examples/FSHD_dataset/raw_data.zip, followed by extracted files under examples/FSHD_dataset/raw_data/;
per-sample merged profile files named like examples/FSHD_dataset/minkiPy_results_FSHD_exploratory_analysis/minkiPy_merged_resolution_20.0_<sample>.h5;
exploratory figures named examples/FSHD_dataset/fig_exploratory_*.pdf;
exploratory tables named examples/FSHD_dataset/table_exploratory_*.csv.
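
After a run, the presence of these outputs can be checked with a few glob patterns. The sketch below is a hypothetical helper built only from the paths listed above; it assumes the demo was run with the default dataset directory and says nothing about the contents of the files.

```python
from pathlib import Path

# Glob patterns relative to the dataset directory, taken from the list above.
EXPECTED_PATTERNS = {
    "merged_profiles": (
        "minkiPy_results_FSHD_exploratory_analysis/"
        "minkiPy_merged_resolution_20.0_*.h5"
    ),
    "figures": "fig_exploratory_*.pdf",
    "tables": "table_exploratory_*.csv",
}


def find_demo_outputs(dataset_dir="examples/FSHD_dataset"):
    """Map each output category to the sorted list of matching files."""
    root = Path(dataset_dir)
    return {
        label: sorted(root.glob(pattern))
        for label, pattern in EXPECTED_PATTERNS.items()
    }


def missing_outputs(found):
    """Names of the categories for which no file was found."""
    return [label for label, paths in found.items() if not paths]
```

Calling missing_outputs(find_demo_outputs()) from the repository root returns an empty list once every category has at least one file.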

Expected demo runtime

On a normal desktop computer, the exploratory notebook is expected to take about 2 hours end-to-end when run in the background, including data download, extraction, Minkowski profile computation, and downstream exploratory analysis. The actual runtime depends on internet bandwidth, disk speed, CPU core count, and MPI configuration.

Quick start (Python)

import minkiPy

h5_path = minkiPy.compute_Minkowski_profiles(
    transcripts_df,
    name="sample_A",
    output_path="results",
    resolution=20.0,
    nbr=25,
    n_cov_samples=None,  # default MC realisations; set 0 for faster exploratory runs
    # mpi_procs:
    # None -> auto-detect
    # 1    -> single process
    # >1   -> spawn MPI processes
)

Typical output file:

results/minkiPy_merged_resolution_<resolution>_<name>.h5

Example downstream loading:

filepaths = [
    "results/minkiPy_merged_resolution_20.0_sample_A.h5",
    "results/minkiPy_merged_resolution_20.0_sample_B.h5",
]

ordered_conditions = ["sample_A", "sample_B"]

data = minkiPy.process_data(
    filepaths,
    ordered_conditions=ordered_conditions,
    verbose=True,
)
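
With many samples, the filepaths list can be assembled from the output directory instead of being typed by hand. The sketch below relies only on the file-name pattern shown above (minkiPy_merged_resolution_<resolution>_<name>.h5); the helper collect_merged_files is hypothetical. It returns the matching paths together with the sample names parsed from them, which in the example above coincide with ordered_conditions.

```python
import re
from pathlib import Path

# File-name pattern of merged outputs; the resolution is assumed to be numeric.
MERGED_NAME = re.compile(
    r"^minkiPy_merged_resolution_(?P<resolution>\d+(?:\.\d+)?)_(?P<name>.+)\.h5$"
)


def collect_merged_files(output_path, resolution=None):
    """Return (filepaths, names) for merged files, sorted by path."""
    filepaths, names = [], []
    for path in sorted(Path(output_path).glob("minkiPy_merged_resolution_*.h5")):
        match = MERGED_NAME.match(path.name)
        if match is None:
            continue  # not a merged profile file
        if resolution is not None and float(match["resolution"]) != float(resolution):
            continue  # keep a single resolution when one is requested
        filepaths.append(str(path))
        names.append(match["name"])
    return filepaths, names
```

For example, filepaths, names = collect_merged_files("results", resolution=20.0) gives lists that can be reviewed and then passed to process_data as in the call above.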

Downstream analysis (beyond `process_data`)

After process_data, typical downstream steps include:

condition-level averaging with add_averaged_condition_datasets,
sample or gene distances with compute_sample_distances and compute_gene_distances,
graph and embedding visualisations (plot_dataset_graphs_from_data, plot_gene_graphs_from_data, plot_pca_grid_by_condition),
differential ranking and trend plots (plot_top_changing_genes, plot_w2_abslog2fc_with_trend),
profile-level diagnostics (plot_minkowski_profile, plot_w2_diag_vs_euclid_distributions, plot_w2_diag_vs_full_plus_euclid_distributions).
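
The idea behind differential ranking can be illustrated without the minkiPy API: compare each gene's profile between two conditions and sort genes by the size of the change. The sketch below uses plain NumPy, toy (4, LS) arrays, and the Euclidean distance; the function name and the input layout (a dict from gene to profile array) are assumptions made for this example and do not describe the structure returned by process_data.

```python
import numpy as np


def rank_genes_by_profile_change(profiles_a, profiles_b):
    """Rank genes present in both conditions by Euclidean profile distance."""
    shared = sorted(set(profiles_a) & set(profiles_b))
    scores = {
        gene: float(
            np.linalg.norm(np.asarray(profiles_a[gene]) - np.asarray(profiles_b[gene]))
        )
        for gene in shared
    }
    # Largest change first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy profiles with 4 rows and LS = 5 level sets.
base = np.zeros((4, 5))
condition_a = {"GENE_A": base, "GENE_B": base}
condition_b = {"GENE_A": base, "GENE_B": base + 1.0}

ranking = rank_genes_by_profile_change(condition_a, condition_b)
for gene, distance in ranking:
    print(gene, round(distance, 3))
```

Here GENE_B ranks first because every entry of its profile changes between the two conditions, while GENE_A is unchanged.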

To get started quickly with your own data, begin with minkiPy_exploratory_workflow.ipynb.

Command-line usage

Run under MPI:

mpirun -n 8 python -m minkiPy \
  --input transcripts.csv \
  --name sample_A \
  --output-path results \
  --resolution 20 \
  --nbr 25

Custom column names:

mpirun -n 8 python -m minkiPy \
  --input transcripts.tsv \
  --sep '\t' \
  --gene-col gene_symbol \
  --x-col x \
  --y-col y \
  --name sample_A \
  --output-path results

Supported formats: .csv, .txt, .tsv, .parquet.
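
When runs are launched from a script, for example once per sample, the command lines above can be assembled as argument lists. The sketch below uses only the flags that appear in the two commands above; the helper build_minkipy_command and its default of 8 processes are hypothetical. It only builds and prints the command, so it can be inspected before anything is executed.

```python
import shlex


def build_minkipy_command(input_path, name, output_path, n_procs=8,
                          resolution=None, nbr=None, sep=None,
                          gene_col=None, x_col=None, y_col=None):
    """Assemble the mpirun command as a list of arguments."""
    cmd = [
        "mpirun", "-n", str(n_procs), "python", "-m", "minkiPy",
        "--input", str(input_path),
        "--name", name,
        "--output-path", str(output_path),
    ]
    optional = {
        "--resolution": resolution,
        "--nbr": nbr,
        "--sep": sep,  # pass "\\t" to mirror --sep '\t' typed in a shell
        "--gene-col": gene_col,
        "--x-col": x_col,
        "--y-col": y_col,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_minkipy_command(
    "transcripts.csv", "sample_A", "results", resolution=20, nbr=25
)
print(shlex.join(cmd))
```

The list can then be passed to subprocess.run(cmd, check=True) to launch the run.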

MPI usage patterns

1) Standard MPI launch

Launch your script with mpirun/mpiexec. compute_Minkowski_profiles(...) uses the active MPI communicator.

2) Auto-MPI from Python or notebook

h5_path = minkiPy.compute_Minkowski_profiles(
    transcripts_df,
    name="sample_A",
    output_path="results",
    resolution=20.0,
    nbr=25,
    mpi_procs=60,
    use_hwthreads=True,
)

Useful parameters:

mpi_procs (int | None, default None)
use_hwthreads (bool, default False)
oversubscribe (bool, default False)
extra_mpirun_args (list[str] | None)
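
Choosing a value for mpi_procs usually comes down to the number of CPUs available. The sketch below is one possible heuristic, not a minkiPy recommendation: it keeps one core free and never requests more processes than there are genes, on the assumption that work is distributed per gene. The helper name and both of these choices are hypothetical.

```python
import os


def suggest_mpi_procs(n_genes, reserve_cores=1, max_procs=None):
    """Suggest a process count from the CPU count and the number of genes."""
    # os.cpu_count() reports logical CPUs, which may exceed physical cores.
    available = os.cpu_count() or 1
    procs = max(1, available - reserve_cores)
    # Assumption: more processes than genes would leave some of them idle.
    procs = min(procs, max(1, n_genes))
    if max_procs is not None:
        procs = max(1, min(procs, max_procs))
    return procs


print(suggest_mpi_procs(n_genes=500))
```

The result can be passed as mpi_procs in the call above, or the parameter can be left at None to use the package's own auto-detection.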

Repository layout

minkiPy/
├── minkiPy/                              # Core package
│   ├── minkowski_core.py                 # Per-gene Minkowski profile computation
│   ├── mpi_driver.py                     # MPI distribution + auto-MPI wrapper
│   ├── cli.py                            # Command-line logic
│   ├── io.py                             # NPZ/HDF5 output writing and merge
│   └── downstream/                       # Post-processing, distances, visualisation
├── minkiPy_env.yaml                      # Conda environment definition
├── minkiPy_exploratory_workflow.ipynb    # Introductory exploratory workflow
├── minkiPy_FSHD_complete_workflow.ipynb  # Full FSHD workflow
├── minkiPy_CRC_complete_workflow.ipynb   # Full CRC workflow
└── examples/                             # Data staging for notebooks
