OpenImagingLab/InstantRetouch

InstantRetouch

Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space

Jiarui Wu, Yujin Wang, Ruikang Li, Fan Zhang, Mingde Yao, Tianfan Xue
Shanghai AI Laboratory, CUHK MMLab, CPII under InnoHK


[Figure: InstantRetouch teaser]

InstantRetouch targets instruction-guided photo retouching with two goals: high instruction fidelity and strong content preservation.
Our framework distills a multi-step diffusion editor into a one-step bilateral-space model for efficient high-resolution retouching.


Updates

  • 2026-03: Initial public release with training and inference pipeline.

Highlights

  • One-step retouching pipeline distilled from a multi-step diffusion teacher.
  • Bilateral-space full-resolution rendering for stronger structure and texture preservation.
  • Clean training path with 4 scripts: teacher, stage-1, stage-2, inference.
  • Public CLI with explicit paths and safety checks.

Framework

[Figure: InstantRetouch framework]

Training follows the paper's progressive design:

  1. Train a multi-step diffusion teacher (tools/ft_ip2p.py).
  2. Distill a one-step low-resolution diffusion branch (train_joint_distill_vsd.py, stage-1).
  3. Add bilateral branch and run joint distillation (train_joint_distill_vsd.py, stage-2).
  4. Run validation / inference with distilled checkpoints (train_joint_distill_vsd.py --only_val).

Visual Comparison

[Figure: InstantRetouch visual comparison]

Installation

Environment

conda create -n instantretouch python=3.10 -y
conda activate instantretouch
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

Accelerate setup

accelerate config

Data Preparation

JSON format

Teacher and distillation use the same JSON schema:

[
  {
    "input": "input/0001.jpg",
    "output": "target/0001.jpg",
    "request": "Increase exposure slightly and warm up white balance."
  }
]
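
Before launching training, it can help to confirm that every entry carries the three keys shown above. The following is a minimal sketch and not the repository's own loading code: the `validate_entries` helper and the second sample entry are hypothetical, and the only assumption is that `input`, `output`, and `request` must each be a non-empty string.

```python
import json

REQUIRED_KEYS = ("input", "output", "request")


def validate_entries(entries):
    """Return (index, problem) tuples; an empty list means no problems were found."""
    if not isinstance(entries, list):
        return [(-1, "top-level JSON value must be a list")]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append((i, "entry is not an object"))
            continue
        for key in REQUIRED_KEYS:
            value = entry.get(key)
            # Each field is assumed to be a non-empty string.
            if not isinstance(value, str) or not value.strip():
                problems.append((i, f"missing or empty '{key}'"))
    return problems


# Illustrative data: the first entry follows the schema, the second is deliberately broken.
sample = json.loads("""
[
  {"input": "input/0001.jpg", "output": "target/0001.jpg",
   "request": "Increase exposure slightly and warm up white balance."},
  {"input": "input/0002.jpg", "request": ""}
]
""")

problems = validate_entries(sample)
for index, problem in problems:
    print(f"entry {index}: {problem}")
```

To check a real file, replace the inline sample with `json.load(open(path))` on your own train, val, or inference JSON.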

Folder convention

<DATASET_ROOT>/
├─ train/
├─ val/
├─ images_all/
├─ <TRAIN_JSON_FILE>.json
├─ <VAL_JSON_FILE>.json
└─ <INFER_JSON_FILE>.json

For train_joint_distill_vsd.py, image paths are resolved via:

  • --dataset_dir/<train_image_dir>/<input_or_output_filename>
  • --dataset_dir/<val_image_dir>/<input_or_output_filename>
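
The resolution rule above amounts to a plain path join. The sketch below assumes that the `input` and `output` values from the JSON are appended as-is under `<dataset_dir>/<image_dir>`; the actual loader may normalize them differently. `resolve_pair`, `missing_files`, and the example paths are hypothetical names used only for illustration.

```python
from pathlib import Path


def resolve_pair(dataset_dir, image_dir, entry):
    """Join the dataset root, the split's image directory, and the JSON-relative paths."""
    root = Path(dataset_dir) / image_dir
    return root / entry["input"], root / entry["output"]


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Example values only; substitute your own --dataset_dir and --train_image_dir.
entry = {"input": "input/0001.jpg", "output": "target/0001.jpg"}
src, dst = resolve_pair("/data/retouch", "train", entry)

print(src.as_posix())
print(dst.as_posix())
print(len(missing_files([src, dst])), "file(s) missing")
```

Running a check like this over the whole JSON before training surfaces path mismatches early instead of partway through an epoch.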

Training and Inference

All scripts are in runs/, and each script wraps a single command.

1) Multi-step teacher training

bash runs/train_teacher_multistep.sh

Fill in the placeholders in the script:

  • <PATH_TO_IP2P_BASE_MODEL>
  • <PATH_TO_DATASET_ROOT>
  • <TRAIN_JSON_FILE>
  • <OUTPUT_DIR_TEACHER>
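
A placeholder that is left unfilled is easy to miss. As a hedged convenience, the sketch below scans script text for tokens that look like the `<UPPER_CASE>` placeholders listed above. The `unfilled_placeholders` helper and the sample script text are hypothetical and are not part of the repository; the real script's contents may differ.

```python
import re

# Matches tokens such as <PATH_TO_DATASET_ROOT>: angle brackets around
# an upper-case identifier made of letters, digits, and underscores.
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9_]*>")


def unfilled_placeholders(script_text):
    """Return the distinct placeholder tokens still present, sorted alphabetically."""
    return sorted(set(PLACEHOLDER.findall(script_text)))


# Invented sample text standing in for a partially edited script.
sample_script = """
BASE_MODEL=<PATH_TO_IP2P_BASE_MODEL>
DATASET_ROOT=/data/retouch
OUTPUT_DIR=<OUTPUT_DIR_TEACHER>
"""

remaining = unfilled_placeholders(sample_script)
print(remaining)
```

To check a real script, pass `open("runs/train_teacher_multistep.sh").read()` instead of the sample text; an empty list means no placeholder-shaped tokens remain.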

2) Stage-1: low-resolution one-step diffusion distillation

bash runs/train_stage1_lowres_diffusion.sh

This stage optimizes the one-step low-resolution diffusion branch on its own, before the bilateral branch is added for joint distillation in stage-2.

3) Stage-2: bilateral joint distillation

bash runs/train_stage2_joint_bilateral.sh

This stage resumes from the stage-1 checkpoint and trains the bilateral branch with joint objectives.

4) Inference / validation

bash runs/inference.sh

This runs the validation/inference path with the --only_val and --val_fullres flags set.


Public CLI Additions

train_joint_distill_vsd.py

| Argument | Why it is exposed | When to set |
| --- | --- | --- |
| --scheduler_config_path | Loads DDPM scheduler config for one-step denoising and latent decode behavior. | Always. Default points to configs/ft_ip2p_scheduler.json. |
| --clip_model_name_or_path | Backbone for l_clip_txt and l_clip_cont losses. | Set when enabling CLIP-based objectives. |
| --attr_mapping_path | Attribute-template mapping used by l_clip_cont contrastive loss. | Required when --l_clip_cont > 0. |
| --iclip_model_path | Local checkpoint path for InstructCLIP guidance. | Required when --l_iclip > 0. |
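
The "required when the loss weight is positive" rule for --attr_mapping_path and --iclip_model_path can be illustrated with a small argparse check. This is a sketch under stated assumptions, not the parser used in train_joint_distill_vsd.py: the flag names come from the arguments listed above, while the types, the defaults (0.0 and None), and the `check_conditional_args` helper are hypothetical.

```python
import argparse


def build_parser():
    """Parser with only the four flags relevant here; defaults are assumed."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--l_clip_cont", type=float, default=0.0)
    parser.add_argument("--l_iclip", type=float, default=0.0)
    parser.add_argument("--attr_mapping_path", type=str, default=None)
    parser.add_argument("--iclip_model_path", type=str, default=None)
    return parser


def check_conditional_args(args):
    """Collect an error for each path that an enabled loss needs but is missing."""
    errors = []
    if args.l_clip_cont > 0 and not args.attr_mapping_path:
        errors.append("--attr_mapping_path is required when --l_clip_cont > 0")
    if args.l_iclip > 0 and not args.iclip_model_path:
        errors.append("--iclip_model_path is required when --l_iclip > 0")
    return errors


# The InstructCLIP loss is enabled, but no checkpoint path is supplied.
args = build_parser().parse_args(["--l_iclip", "0.5"])
errors = check_conditional_args(args)
print(errors)
```

Collecting all errors before exiting, rather than failing on the first one, lets a user fix every missing path in one pass.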

tools/ft_ip2p.py

| Argument | Why it is exposed | When to set |
| --- | --- | --- |
| --scheduler_config_path | Uses an explicit scheduler JSON instead of hidden hardcoded scheduler settings. | Always recommended for reproducibility. |
| --clip_model_name_or_path | CLIP model for RGB-side attribute contrastive regularization. | Set when --l_rgb > 0. |
| --train_dataset_dir / --json_dir / --image_dir | Explicit dataset roots and splits for teacher training. | Always for public data loading. |

configs/attr_mapping_template.json is a lightweight placeholder mapping.
Replace it with your own mapping file if l_clip_cont is enabled.


Checkpoints

  • Teacher output (runs/train_teacher_multistep.sh): saved under <OUTPUT_DIR_TEACHER>.
  • Stage-1 distillation output: saved under <OUTPUT_DIR_STAGE1>/ckpts/.
  • Stage-2 distillation output: saved under <OUTPUT_DIR_STAGE2>/ckpts/.
  • Inference reads --resume_from_checkpoint and writes images to <OUTPUT_DIR_INFERENCE>/val_images/.

Recommended usage:

  1. Point stage-1 --pipeline_path to the teacher pipeline.
  2. Point stage-2 --resume_from_checkpoint to the stage-1 checkpoint.
  3. Point inference --resume_from_checkpoint to the stage-2 checkpoint.

Troubleshooting

  • FileNotFoundError on JSON/image paths: check --dataset_dir, --train_json_dir, --val_json_dir, --train_image_dir, and --val_image_dir.
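  • --iclip_model_path is required: this error appears when --l_iclip > 0 and no path is given. Supply the InstructCLIP checkpoint path; the flag is not needed when --l_iclip is 0.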
  • OOM during stage-2: reduce --batch_size and/or increase --gradient_accumulation_steps.
  • Empty CLIP-attribute supervision: ensure your mapping JSON matches your file naming convention.
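
For the OOM case, lowering --batch_size while raising --gradient_accumulation_steps keeps the effective batch size unchanged, so the optimizer still sees the same number of samples per update. The arithmetic is sketched below, assuming --batch_size is counted per process (not confirmed by this README); the helper names and the numbers are illustrative only.

```python
def effective_batch_size(per_process_batch, grad_accum_steps, num_processes=1):
    """Samples contributing to one optimizer step across all processes."""
    return per_process_batch * grad_accum_steps * num_processes


def accumulation_for(target_effective, per_process_batch, num_processes=1):
    """Accumulation steps needed to reach a target effective batch size."""
    per_step = per_process_batch * num_processes
    if target_effective % per_step != 0:
        raise ValueError("target is not a multiple of per_process_batch * num_processes")
    return target_effective // per_step


# Illustrative numbers: 4 processes, batch size halved from 8 to 4 after an OOM.
before = effective_batch_size(per_process_batch=8, grad_accum_steps=1, num_processes=4)
steps = accumulation_for(target_effective=before, per_process_batch=4, num_processes=4)
after = effective_batch_size(per_process_batch=4, grad_accum_steps=steps, num_processes=4)

print(before, steps, after)  # 32 2 32
```

Each accumulation step adds a forward and backward pass per update, so the trade is lower peak memory for longer wall-clock time per optimizer step.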

Repository Structure

.
├─ configs/
│  ├─ ft_ip2p_scheduler.json
│  └─ attr_mapping_template.json
├─ dataset/
│  └─ dataset_5kreq.py
├─ latex/
│  └─ paper_fig_png/
├─ models/
│  ├─ adapter_diffuser.py
│  ├─ iclip.py
│  └─ loss.py
├─ runs/
│  ├─ train_teacher_multistep.sh
│  ├─ train_stage1_lowres_diffusion.sh
│  ├─ train_stage2_joint_bilateral.sh
│  └─ inference.sh
├─ tools/
│  └─ ft_ip2p.py
├─ utils/
│  ├─ hist_loss.py
│  ├─ prompt_attrs.py
│  ├─ train_retrieval.py
│  └─ utils.py
├─ torch_layers.py
└─ train_joint_distill_vsd.py

Acknowledgements

This project builds on open-source diffusion and vision libraries, including Diffusers, Transformers, Accelerate, and PyTorch.


Citation

@inproceedings{wu2026instantretouch,
  title={InstantRetouch: Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space},
  author={Wu, Jiarui and Wang, Yujin and Li, Ruikang and Zhang, Fan and Yao, Mingde and Xue, Tianfan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

License

This project is released under Apache-2.0. See LICENSE.

About

[CVPR 2026] InstantRetouch: Efficient and High-Fidelity Instruction-Guided Image Retouching with Bilateral Space
