@@ -17,8 +17,8 @@ aliases:
 Docker Model Runner (DMR) makes it easy to manage, run, and
 deploy AI models using Docker. Designed for developers,
 Docker Model Runner streamlines the process of pulling, running, and serving
-large language models (LLMs) and other AI models directly from Docker Hub or any
-OCI-compliant registry.
+large language models (LLMs) and other AI models directly from Docker Hub,
+any OCI-compliant registry, or [Hugging Face](https://huggingface.co/).
 
 With seamless integration into Docker Desktop and Docker
 Engine, you can serve models via OpenAI and Ollama-compatible APIs, package GGUF files as
@@ -32,7 +32,8 @@ with AI models locally.
 
 ## Key features
 
-- [Pull and push models to and from Docker Hub](https://hub.docker.com/u/ai)
+- [Pull and push models to and from Docker Hub or any OCI-compliant registry](https://hub.docker.com/u/ai)
+- [Pull models from Hugging Face](https://huggingface.co/)
 - Serve models on [OpenAI and Ollama-compatible APIs](api-reference.md) for easy integration with existing apps
 - Support for [llama.cpp, vLLM, and Diffusers inference engines](inference-engines.md) (vLLM and Diffusers on Linux with NVIDIA GPUs)
 - [Generate images from text prompts](inference-engines.md#diffusers) using Stable Diffusion models with the Diffusers backend
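
To illustrate the OpenAI-compatible API feature listed above, here is a minimal sketch using the official `openai` Python SDK. It assumes host-side TCP access to Docker Model Runner is enabled on the default port 12434 and that a model such as `ai/smollm2` has already been pulled; the model name and endpoint are illustrative, not prescriptive.

```python
# Minimal sketch: chat with a locally served model through Docker Model
# Runner's OpenAI-compatible endpoint. Assumes host-side TCP access is
# enabled (default port 12434) and the model has already been pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # assumed default local endpoint
    api_key="not-required",  # the local runner does not require an API key
)

response = client.chat.completions.create(
    model="ai/smollm2",  # illustrative model from Docker Hub's ai/ namespace
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Because the API is OpenAI-compatible, an existing application typically only needs its base URL and model name changed to target the local runner.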
@@ -81,11 +82,12 @@ Docker Engine only:
 
 ## How Docker Model Runner works
 
-Models are pulled from Docker Hub the first time you use them and are stored
-locally. They load into memory only at runtime when a request is made, and
-unload when not in use to optimize resources. Because models can be large, the
-initial pull may take some time. After that, they're cached locally for faster
-access. You can interact with the model using
+Models are pulled from Docker Hub, an OCI-compliant registry, or
+[Hugging Face](https://huggingface.co/) the first time you use them and are
+stored locally. They load into memory only at runtime when a request is made,
+and unload when not in use to optimize resources. Because models can be large,
+the initial pull may take some time. After that, they're cached locally for
+faster access. You can interact with the model using
 [OpenAI and Ollama-compatible APIs](api-reference.md).
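
As a companion to the pull-on-first-use behavior described above, the sketch below pre-pulls a model through the runner's model-management REST API so the first inference request does not pay the download cost. The `POST /models/create` endpoint and `{"from": ...}` body follow the DMR API reference, but treat the exact shapes as assumptions to verify against your version.

```python
# Minimal sketch: pre-pull a model via Docker Model Runner's management API
# so it is cached locally before the first inference request arrives.
# Assumes host-side TCP access on the default port 12434; the endpoint shape
# is taken from the DMR API reference and should be verified.
import requests

resp = requests.post(
    "http://localhost:12434/models/create",
    json={"from": "ai/smollm2"},  # illustrative; an hf.co/... reference also works
    timeout=600,  # first pulls of large models can take a while
)
resp.raise_for_status()
print("Model pulled and cached locally.")
```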
 
 ### Inference engines