HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

Yichen Liu¹,*, Donghao Zhou²,*, Jie Wang³, Xin Gao³, Guisheng Liu³, Jiatong Li³,†, Quanwei Zhang⁴,
Qiang Lyu¹, Lanqing Guo⁵, Shilei Wen³,§, Weiqiang Wang¹,§, Pheng-Ann Heng²,§

¹University of Chinese Academy of Sciences, ²The Chinese University of Hong Kong, ³ByteDance,
⁴Zhejiang University, ⁵UT Austin

*Equal contribution, †Project Lead, §Corresponding Author

🔥 Updates

2026.06: Inference code, training code, and model weights are released!
2026.02: Our paper has been accepted to CVPR 2026!

✅ Open-Source Plan

HP-Image-40K Dataset
HiFi-Inpaint Inference Code
HiFi-Inpaint Training Code
HiFi-Inpaint Model Weights

🌍 Abstract

Human-product images, which showcase the integration of humans and products, play a vital role in advertising, e-commerce, and digital marketing. The essential challenge of generating such images lies in ensuring the high-fidelity preservation of product details. Among existing paradigms, reference-based inpainting offers a targeted solution by leveraging product reference images to guide the inpainting process. However, limitations remain in three key aspects: the lack of diverse large-scale training data, the difficulty current models have in focusing on product detail preservation, and the inability of coarse supervision to provide precise guidance. To address these issues, we propose HiFi-Inpaint, a novel high-fidelity reference-based inpainting framework tailored for generating human-product images. HiFi-Inpaint introduces Shared Enhancement Attention (SEA) to refine fine-grained product features and Detail-Aware Loss (DAL) to enforce precise pixel-level supervision using high-frequency maps. Additionally, we construct a new dataset, HP-Image-40K, with samples curated from self-synthesized data and processed with automatic filtering. Experimental results show that HiFi-Inpaint achieves state-of-the-art performance, delivering detail-preserving human-product images.

We propose HiFi-Inpaint, a DiT-based framework that can seamlessly integrate product reference images into masked human images, generating high-quality human-product images with high-fidelity detail preservation.
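
To give a rough intuition for supervising on high-frequency maps, as the abstract describes for DAL, here is a toy sketch in plain Python. It is not the paper's formulation: the 4-neighbour Laplacian filter, the L1 distance, the restriction to the masked region, and the function names `high_frequency_map` and `masked_high_frequency_l1` are all assumptions made for illustration. Refer to the paper and the training code for the actual loss.

```python
def high_frequency_map(image):
    """Absolute response of a 4-neighbour Laplacian on a 2D grayscale image.

    `image` is a list of rows of numbers. Border pixels reuse the nearest
    valid neighbour (replicate padding). The Laplacian is an assumed choice
    of high-pass filter, not necessarily the one used by HiFi-Inpaint.
    """
    height, width = len(image), len(image[0])

    def pixel(y, x):
        # Clamp coordinates so that borders replicate the edge value.
        return image[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    return [
        [
            abs(4 * pixel(y, x) - pixel(y - 1, x) - pixel(y + 1, x)
                - pixel(y, x - 1) - pixel(y, x + 1))
            for x in range(width)
        ]
        for y in range(height)
    ]


def masked_high_frequency_l1(pred, target, mask):
    """Mean absolute difference between high-frequency maps inside the mask.

    `mask` has the same shape as the images; truthy values mark the region
    to supervise (assumed here to be the inpainted product region).
    """
    hf_pred = high_frequency_map(pred)
    hf_target = high_frequency_map(target)
    total, count = 0.0, 0
    for y, row in enumerate(mask):
        for x, inside in enumerate(row):
            if inside:
                total += abs(hf_pred[y][x] - hf_target[y][x])
                count += 1
    return total / count if count else 0.0
```

In this sketch, a prediction that smooths away a fine detail (for instance, a logo edge) is penalized even if its average intensity matches the ground truth, because the difference shows up in the high-frequency maps rather than in the raw pixels.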

🛠️ Environment Setup

We recommend using a clean Conda environment with Python 3.11:

git clone https://github.com/Correr-Zhou/HiFi-Inpaint.git
cd HiFi-Inpaint

conda create -n hifi-inpaint python=3.11 -y
conda activate hifi-inpaint

pip install -r requirements.txt

⚡ Inference

Download the base model FLUX.1-dev and our LoRA weights.
Update the paths in run_inference.sh:
- FLUX_PATH: path to FLUX.1-dev
- LORA_PATH: path to LoRA checkpoint
Run inference:

bash run_inference.sh

Results will be saved to ./output/.

📦 Training

Our training data format is compatible with HP-Image-40K. Each JSON entry should contain:

{
    "ref_image_path": "path/to/reference_image.png",
    "gt_image_path": "path/to/ground_truth_image.png",
    "condition_image_path": "path/to/masked_condition_image.png",
    "mask_path": "path/to/binary_mask.png",
    "caption": "text description of the image"
}
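
Before launching a run, it can help to check a data file against the entry format above. The following is a minimal sketch and not part of the repository: the function names `find_entry_problems` and `check_data_file` are hypothetical, and it assumes that the file holds a JSON list of entries and that relative paths resolve against a `root` directory of your choosing.

```python
import json
from pathlib import Path

# Keys listed in the entry format above.
REQUIRED_KEYS = (
    "ref_image_path",
    "gt_image_path",
    "condition_image_path",
    "mask_path",
    "caption",
)


def find_entry_problems(entry, root="."):
    """Return a list of human-readable problems for one entry (empty if none)."""
    if not isinstance(entry, dict):
        return ["entry is not a JSON object"]
    problems = []
    for key in REQUIRED_KEYS:
        value = entry.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {key}")
        elif key.endswith("_path") and not (Path(root) / value).is_file():
            problems.append(f"file not found for {key}: {value}")
    return problems


def check_data_file(json_path, root="."):
    """Map entry index -> problems for every invalid entry in a JSON file.

    Assumption: the file holds a JSON list of entries; a single JSON object
    is treated as a one-entry list.
    """
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    entries = data if isinstance(data, list) else [data]
    report = {}
    for index, entry in enumerate(entries):
        problems = find_entry_problems(entry, root)
        if problems:
            report[index] = problems
    return report
```

Calling `check_data_file` on your data file returns an empty dict when every entry has all five fields and every referenced image exists; otherwise it lists the problems per entry index.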

Update the config file train/config/hifi_inpaint.yaml:
- flux_path: path to FLUX.1-dev
- data_path: path to your training data JSON file(s)
Run training:

# Single GPU
bash train/scripts/train.sh

# Multi-GPU with torchrun
torchrun --nproc_per_node=8 -m src.train.train

Training logs are tracked via Weights & Biases. Set your API key before training:

export WANDB_API_KEY="your_wandb_api_key"

See train/config/ for more training configurations.

🤝 Acknowledgements

Our codebase is built upon OminiControl. We thank the authors for their excellent work.

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

🔗 Citation

If you find HiFi-Inpaint useful for your research and applications, please cite:

@article{liu2026hifiinpaint,
  title={HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images},
  author={Liu, Yichen and Zhou, Donghao and Wang, Jie and Gao, Xin and Liu, Guisheng and Li, Jiatong and Zhang, Quanwei and Lyu, Qiang and Guo, Lanqing and Wen, Shilei and Wang, Weiqiang and Heng, Pheng-Ann},
  journal={arXiv preprint arXiv:2603.02210},
  year={2026}
}
