InstantRetouch

Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space

Jiarui Wu, Yujin Wang, Ruikang Li, Fan Zhang, Mingde Yao, Tianfan Xue
Shanghai AI Laboratory, CUHK MMLab, CPII under InnoHK

[Figure: InstantRetouch teaser]

InstantRetouch targets instruction-guided photo retouching with two goals: high instruction fidelity and strong content preservation.
Our framework distills a multi-step diffusion editor into a one-step bilateral-space model for efficient high-resolution retouching.

Updates

2026-03: Initial public release with training and inference pipeline.

Highlights

One-step retouching pipeline distilled from a multi-step diffusion teacher.
Bilateral-space full-resolution rendering for stronger structure and texture preservation.
Clean training path with 4 scripts: teacher, stage-1, stage-2, inference.
Public CLI with explicit paths and safety checks.

Framework

[Figure: InstantRetouch framework]

Training follows the paper's progressive design:

Train a multi-step diffusion teacher (tools/ft_ip2p.py).
Distill a one-step low-resolution diffusion branch (train_joint_distill_vsd.py, stage-1).
Add the bilateral branch and run joint distillation (train_joint_distill_vsd.py, stage-2).
Run validation / inference with distilled checkpoints (train_joint_distill_vsd.py --only_val).

Visual Comparison

[Figure: InstantRetouch visual comparison]

Installation

Environment

conda create -n instantretouch python=3.10 -y
conda activate instantretouch
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

Accelerate setup

accelerate config

Data Preparation

JSON format

Teacher training and distillation use the same JSON schema:

[
  {
    "input": "input/0001.jpg",
    "output": "target/0001.jpg",
    "request": "Increase exposure slightly and warm up white balance."
  }
]
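Before launching a long run, it can help to check a JSON file against the schema above. The following is a minimal sketch, not part of the repository: `validate_entries` and `REQUIRED_KEYS` are hypothetical names, and the only rule assumed is that each entry is an object with non-empty string values for `input`, `output`, and `request`.

```python
import json

# Keys taken from the schema shown above.
REQUIRED_KEYS = ("input", "output", "request")


def validate_entries(entries):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    if not isinstance(entries, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: expected an object")
            continue
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: '{key}' must be a non-empty string")
    return problems


# Second entry is deliberately broken: no "output" and an empty "request".
sample = json.loads("""
[
  {"input": "input/0001.jpg", "output": "target/0001.jpg",
   "request": "Increase exposure slightly and warm up white balance."},
  {"input": "input/0002.jpg", "request": ""}
]
""")
problems = validate_entries(sample)
for line in problems:
    print(line)
```

For a real file, load it with `json.load(open(path))` and pass the result to `validate_entries`.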

Folder convention

<DATASET_ROOT>/
├─ train/
├─ val/
├─ images_all/
├─ <TRAIN_JSON_FILE>.json
├─ <VAL_JSON_FILE>.json
└─ <INFER_JSON_FILE>.json

For train_joint_distill_vsd.py, image paths are resolved via:

--dataset_dir/<train_image_dir>/<input_or_output_filename>
--dataset_dir/<val_image_dir>/<input_or_output_filename>
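The resolution rule above can be sketched as a plain path join. This is an illustration under one assumption: the `input` / `output` value from the JSON is appended as-is after the image directory. The helper names `resolve_image_path` and `missing_files` are hypothetical and do not come from the training script.

```python
from pathlib import Path


def resolve_image_path(dataset_dir, image_dir, filename):
    """Join <dataset_dir>/<image_dir>/<filename>, mirroring the rule described above.

    Assumption: the JSON value is used unchanged as the last path component(s).
    """
    return Path(dataset_dir) / image_dir / filename


def missing_files(dataset_dir, image_dir, entries):
    """List every resolved input/output path that does not exist on disk."""
    missing = []
    for entry in entries:
        for key in ("input", "output"):
            path = resolve_image_path(dataset_dir, image_dir, entry[key])
            if not path.is_file():
                missing.append(path)
    return missing


# Example: where would the first training pair be looked up?
print(resolve_image_path("/data/retouch", "train", "input/0001.jpg").as_posix())
```

Running `missing_files` over the training and validation JSON before training is a cheap way to surface path mistakes early.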

Training and Inference

All scripts are in runs/ and each script is a single command.

1) Multi-step teacher training

bash runs/train_teacher_multistep.sh

Fill in the placeholders in the script:

<PATH_TO_IP2P_BASE_MODEL>
<PATH_TO_DATASET_ROOT>
<TRAIN_JSON_FILE>
<OUTPUT_DIR_TEACHER>

2) Stage-1: low-resolution one-step diffusion distillation

bash runs/train_stage1_lowres_diffusion.sh

This stage optimizes the one-step diffusion branch on its own, before the bilateral branch is added in stage-2.

3) Stage-2: bilateral joint distillation

bash runs/train_stage2_joint_bilateral.sh

This stage resumes from the stage-1 checkpoint and trains the bilateral branch with joint objectives.

4) Inference / validation

bash runs/inference.sh

This runs the validation/inference path with the --only_val --val_fullres configuration.
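The four steps above run strictly in order, each depending on the output of the previous one. A minimal driver sketch follows. `run_stages` is a hypothetical helper that is not shipped with the repository, and it assumes the placeholders in every script have already been filled so that each stage points at the previous stage's output.

```python
import subprocess

# Script paths as listed in this README, in execution order.
STAGE_SCRIPTS = [
    "runs/train_teacher_multistep.sh",
    "runs/train_stage1_lowres_diffusion.sh",
    "runs/train_stage2_joint_bilateral.sh",
    "runs/inference.sh",
]


def run_stages(scripts=STAGE_SCRIPTS, dry_run=True):
    """Run each stage script in order; stop at the first failing stage.

    With dry_run=True the commands are only printed, nothing is executed.
    """
    commands = [["bash", script] for script in scripts]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later stages never start from a failed checkpoint.
            subprocess.run(command, check=True)
    return commands


planned = run_stages(dry_run=True)
```

Call `run_stages(dry_run=False)` from the repository root to actually execute the stages.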

Public CLI Additions

`train_joint_distill_vsd.py`

| Argument | Why it is exposed | When to set |
| --- | --- | --- |
| `--scheduler_config_path` | Loads DDPM scheduler config for one-step denoising and latent decode behavior. | Always. Default points to `configs/ft_ip2p_scheduler.json`. |
| `--clip_model_name_or_path` | Backbone for `l_clip_txt` and `l_clip_cont` losses. | Set when enabling CLIP-based objectives. |
| `--attr_mapping_path` | Attribute-template mapping used by `l_clip_cont` contrastive loss. | Required when `--l_clip_cont > 0`. |
| `--iclip_model_path` | Local checkpoint path for InstructCLIP guidance. | Required when `--l_iclip > 0`. |
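The conditional requirements in the table above (a path becomes mandatory once its loss weight is positive) can be checked up front. The sketch below is only a mirror of that table, not the script's actual argument validation; `missing_required_paths` and the dictionary layout are hypothetical.

```python
# Loss weight -> path argument that becomes required once the weight is > 0.
# Both pairs come from the table above.
CONDITIONAL_RULES = {
    "l_clip_cont": "attr_mapping_path",
    "l_iclip": "iclip_model_path",
}


def missing_required_paths(loss_weights, paths):
    """Return one message per path that is required but not provided."""
    missing = []
    for loss_name, path_name in CONDITIONAL_RULES.items():
        weight = loss_weights.get(loss_name, 0.0)
        if weight > 0 and not paths.get(path_name):
            missing.append(f"--{path_name} is required when --{loss_name} > 0")
    return missing


# InstructCLIP guidance enabled, but no checkpoint path supplied.
print(missing_required_paths({"l_iclip": 0.5, "l_clip_cont": 0.0}, {}))
```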

`tools/ft_ip2p.py`

| Argument | Why it is exposed | When to set |
| --- | --- | --- |
| `--scheduler_config_path` | Uses an explicit scheduler JSON instead of hidden hardcoded scheduler settings. | Always recommended for reproducibility. |
| `--clip_model_name_or_path` | CLIP model for RGB-side attribute contrastive regularization. | Set when `--l_rgb > 0`. |
| `--train_dataset_dir` / `--json_dir` / `--image_dir` | Explicit dataset roots and splits for teacher training. | Always for public data loading. |

configs/attr_mapping_template.json is a lightweight placeholder mapping.
Replace it with your own mapping file if l_clip_cont is enabled.

Checkpoints

Teacher output (runs/train_teacher_multistep.sh): saved under <OUTPUT_DIR_TEACHER>.
Stage-1 distillation output: saved under <OUTPUT_DIR_STAGE1>/ckpts/.
Stage-2 distillation output: saved under <OUTPUT_DIR_STAGE2>/ckpts/.
Inference reads --resume_from_checkpoint and writes images to <OUTPUT_DIR_INFERENCE>/val_images/.

Recommended usage:

Point stage-1 --pipeline_path to the teacher pipeline.
Point stage-2 --resume_from_checkpoint to the stage-1 checkpoint.
Point inference --resume_from_checkpoint to the stage-2 checkpoint.
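The hand-off between stages described above can be written down as a small mapping. This is a sketch that uses only the flags named in this README; the real scripts take further arguments, and `chain_stage_inputs` is a hypothetical helper. The path strings are the README's placeholders and must be replaced with real locations.

```python
def chain_stage_inputs(teacher_pipeline, stage1_ckpt, stage2_ckpt):
    """Map each downstream stage to the flag that consumes the upstream output."""
    return {
        # Stage-1 distills from the trained teacher pipeline.
        "stage1": ["--pipeline_path", teacher_pipeline],
        # Stage-2 continues from the stage-1 checkpoint.
        "stage2": ["--resume_from_checkpoint", stage1_ckpt],
        # Inference loads the stage-2 checkpoint in validation-only mode.
        "inference": ["--resume_from_checkpoint", stage2_ckpt,
                      "--only_val", "--val_fullres"],
    }


plan = chain_stage_inputs(
    teacher_pipeline="<OUTPUT_DIR_TEACHER>",
    stage1_ckpt="<OUTPUT_DIR_STAGE1>/ckpts/",
    stage2_ckpt="<OUTPUT_DIR_STAGE2>/ckpts/",
)
for stage, flags in plan.items():
    print(stage, " ".join(flags))
```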

Troubleshooting

FileNotFoundError on JSON/image paths: check --dataset_dir, --train_json_dir, --val_json_dir, --train_image_dir, and --val_image_dir.
"--iclip_model_path is required" error: the path is needed only when --l_iclip > 0, so set it whenever that loss is enabled.
OOM during stage-2: reduce --batch_size and/or increase --gradient_accumulation_steps.
Empty CLIP-attribute supervision: ensure your mapping JSON matches your file naming convention.
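For the OOM item above, lowering --batch_size while raising --gradient_accumulation_steps keeps the effective batch size unchanged. The arithmetic is sketched below, assuming --batch_size is counted per process (this README does not state that) and using hypothetical helper names.

```python
def effective_batch_size(per_process_batch, grad_accum_steps, num_processes=1):
    """Samples contributing to one optimizer step."""
    return per_process_batch * grad_accum_steps * num_processes


def accumulation_for_target(target_effective, per_process_batch, num_processes=1):
    """Accumulation steps needed to reach a target effective batch size."""
    per_step = per_process_batch * num_processes
    if target_effective % per_step != 0:
        raise ValueError("target is not a multiple of per_process_batch * num_processes")
    return target_effective // per_step


# Illustrative numbers only: 4 processes, batch 8, accumulation 2.
before = effective_batch_size(8, 2, num_processes=4)
# Halving the batch to 4 requires doubling the accumulation steps.
steps = accumulation_for_target(before, 4, num_processes=4)
print(before, steps)
```

Gradient accumulation trades memory for wall-clock time: each optimizer step now needs more forward/backward passes.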

Repository Structure

.
├─ configs/
│  ├─ ft_ip2p_scheduler.json
│  └─ attr_mapping_template.json
├─ dataset/
│  └─ dataset_5kreq.py
├─ latex/
│  └─ paper_fig_png/
├─ models/
│  ├─ adapter_diffuser.py
│  ├─ iclip.py
│  └─ loss.py
├─ runs/
│  ├─ train_teacher_multistep.sh
│  ├─ train_stage1_lowres_diffusion.sh
│  ├─ train_stage2_joint_bilateral.sh
│  └─ inference.sh
├─ tools/
│  └─ ft_ip2p.py
├─ utils/
│  ├─ hist_loss.py
│  ├─ prompt_attrs.py
│  ├─ train_retrieval.py
│  └─ utils.py
├─ torch_layers.py
└─ train_joint_distill_vsd.py

Acknowledgements

This project builds on open-source diffusion and vision libraries, including Diffusers, Transformers, Accelerate, and PyTorch.

Citation

@inproceedings{wu2026instantretouch,
  title={InstantRetouch: Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space},
  author={Wu, Jiarui and Wang, Yujin and Li, Ruikang and Zhang, Fan and Yao, Mingde and Xue, Tianfan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

License

This project is released under Apache-2.0. See LICENSE.
