diff --git a/content/patterns/maas-quickstart/_index.adoc b/content/patterns/maas-quickstart/_index.adoc new file mode 100644 index 000000000..b7c93d630 --- /dev/null +++ b/content/patterns/maas-quickstart/_index.adoc @@ -0,0 +1,39 @@ +--- +title: MaaS Code Assistant AI Quickstart +date: 2026-06-03 +tier: sandbox +summary: This pattern deploys a multi-tenant AI code assistant with NVIDIA Nemotron models, tiered rate limiting, and IDE integration on OpenShift. +rh_products: + - Red Hat OpenShift Container Platform + - Red Hat OpenShift AI + - Red Hat OpenShift DevSpaces + - Red Hat Connectivity Link +industries: + - General +focus_areas: + - AI + - Code + - AI Quickstart +aliases: /maas-quickstart/ +links: + github: https://github.com/validatedpatterns-sandbox/ai-quickstart-maas-code-assistant + install: getting-started + bugs: https://github.com/validatedpatterns-sandbox/ai-quickstart-maas-code-assistant/issues + feedback: https://docs.google.com/forms/d/e/1FAIpQLScI76b6tD1WyPu2-d_9CCVDr3Fu5jYERthqLKJDUGwqBg7Vcg/viewform +--- +:toc: +:imagesdir: /images +:_content-type: ASSEMBLY +include::modules/comm-attributes.adoc[] + +include::modules/maas-quickstart-about.adoc[leveloffset=+1] + +include::modules/maas-quickstart-architecture.adoc[leveloffset=+1] + +[id="next-steps-maas-quickstart"] +== Next steps + +* link:getting-started[Install this pattern] +* link:cluster-sizing[Cluster sizing] +* link:customizing-this-pattern[Customizing this pattern] +* link:troubleshooting[Troubleshooting] diff --git a/content/patterns/maas-quickstart/cluster-sizing.adoc b/content/patterns/maas-quickstart/cluster-sizing.adoc new file mode 100644 index 000000000..0bac33f86 --- /dev/null +++ b/content/patterns/maas-quickstart/cluster-sizing.adoc @@ -0,0 +1,29 @@ +--- +title: Cluster sizing +weight: 30 +aliases: /maas-quickstart/cluster-sizing/ +--- + +:toc: +:imagesdir: /images +:_content-type: ASSEMBLY +include::modules/comm-attributes.adoc[] 
+include::modules/ai-quickstart-maas-code-assistant/metadata-ai-quickstart-maas-code-assistant.adoc[] + +include::modules/cluster-sizing-template.adoc[] + +[id="maas-quickstart-gpu-node-requirements"] +== GPU node requirements + +In addition to the worker nodes listed above, this pattern requires at least 2 GPU-equipped nodes for model inference. On AWS, the pattern automatically provisions `g6e.2xlarge` instances with NVIDIA L40S GPUs. On other providers and bare metal, GPU nodes must already be part of the cluster before deploying the pattern. + +.GPU node minimum requirements +[cols="<,^,<,<"] +|=== +| Cloud provider | Node type | Number of nodes | Instance type + +| Amazon Web Services +| GPU Worker +| 2 +| g6e.2xlarge +|=== diff --git a/content/patterns/maas-quickstart/customizing-this-pattern.adoc b/content/patterns/maas-quickstart/customizing-this-pattern.adoc new file mode 100644 index 000000000..35680f53d --- /dev/null +++ b/content/patterns/maas-quickstart/customizing-this-pattern.adoc @@ -0,0 +1,143 @@ +--- +title: Customizing this pattern +weight: 20 +aliases: /maas-quickstart/customizing/ +--- + +:toc: +:imagesdir: /images +:_content-type: ASSEMBLY +include::modules/comm-attributes.adoc[] + +[id="customizing-maas-quickstart"] +== Customizing the MaaS Code Assistant AI Quickstart pattern + +This pattern deploys an AI code assistant with tiered user access, rate limiting, and NVIDIA Nemotron model serving. You can customize the models, rate limit policies, user tiers, and IDE configuration. + +[id="changing-models-maas"] +=== Changing models + +The pattern serves two models by default: + +* `nemotron-3-nano-30b-a3b-fp8` -- Available to premium and enterprise tier users. +* `gpt-oss-20b` -- Available to all user tiers. + +To change or add models, edit the `models` list in `overrides/maas-quickstart.yaml`. The pattern pulls models from OCI registries and does not require a HuggingFace API token. 
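Before you push a change to the `models` list, you can list the model names in the overrides file to catch a typo or a missing entry. The following POSIX shell sketch does that. It assumes the file keeps the layout of the example in this section: a top-level `models:` key whose entries start with two spaces and `- name:`. `list_models` is a hypothetical helper, not part of the pattern. For files with a different layout, a YAML-aware tool is more reliable.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pattern): print the model names defined
# under the top-level `models:` key of an overrides file.
# Assumes the documented layout: `models:` at column 0 and each entry starting
# with exactly two spaces followed by "- name:".
list_models() {
  awk '
    /^models:/ { in_models = 1; next }      # start of the models list
    /^[^ \t#]/ { in_models = 0 }            # any other top-level key ends it
    in_models && /^  - name:/ {
      sub(/^  - name:[ \t]*/, "")           # keep only the value
      print
    }
  ' "${1:-overrides/maas-quickstart.yaml}"
}
```

For example, if the file follows the assumed layout, `list_models overrides/maas-quickstart.yaml` prints one model name per line.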
+ +The model definitions specify the model URI, resource requirements, GPU tolerations, and vLLM arguments. For example: + +[source,yaml] +---- +models: + - name: gpt-oss-20b + displayName: OpenAI gpt-oss-20b + uri: oci://registry.redhat.io/rhelai1/modelcar-gpt-oss-20b:1.5 + resources: + limits: + cpu: "4" + memory: 24Gi + nvidia.com/gpu: "1" + requests: + cpu: "2" + memory: 16Gi + nvidia.com/gpu: "1" + extraArgs: + - --enable-force-include-usage + tolerations: + - effect: NoSchedule + key: nvidia.com/gpu + operator: Exists +---- + +[NOTE] +==== +Each model requires a GPU with at least 48 GB of VRAM. Adding models beyond the default two requires additional GPU nodes. +==== + +[id="adjusting-rate-limits-maas"] +=== Adjusting rate limits and user tiers + +The pattern uses Kuadrant (Red Hat Connectivity Link) to enforce per-tier rate limits on inference requests. The default tiers and limits are: + +[cols="1,1,2",options="header"] +|=== +| Tier | Rate limit | Description + +| Free +| 5 requests per 2 minutes +| Basic access for evaluation + +| Premium +| 20 requests per 2 minutes +| Standard production usage + +| Enterprise +| 50 requests per 2 minutes +| High-throughput workloads +|=== + +To adjust rate limits, modify the `tiers` section in `overrides/maas-quickstart.yaml`. The following example increases the premium tier request limit to 40 and the token limit to 20000: + +[source,yaml] +---- +tiers: + premium: + users: + - premium-user + requestRates: + - limit: 40 + window: 2m + tokenRates: + - limit: 20000 + window: 1m +---- + +Push your changes to your forked repository so the GitOps framework applies the updated configuration. + +[id="managing-users-maas"] +=== Managing users + +htpasswd with OpenShift OAuth handles user authentication. 
The default users are: + + * `admin` -- Full administrative access (enterprise tier) + * `free-user` -- Free tier access + * `premium-user` -- Premium tier access + * `enterprise-user` -- Enterprise tier access + + You set the user passwords in your `values-secret.yaml` file. During deployment, the pattern loads them into {hashicorp-vault}, and the {eso-op} synchronizes them to the cluster. To change a user password after initial deployment, update the value in your `values-secret.yaml` file and redeploy the pattern. + + To assign users to different tiers, modify the `tiers` section in `overrides/maas-quickstart.yaml`: + + [source,yaml] + ---- + tiers: + free: + users: + - free-user + premium: + users: + - premium-user + - user1 + enterprise: + users: + - admin + - enterprise-user + ---- + + [id="configuring-devspaces-maas"] + === Configuring OpenShift DevSpaces + + The pattern integrates the Continue AI extension in OpenShift DevSpaces to provide code assistance directly in the IDE. DevSpaces is preconfigured to clone the AI Quickstart repository and connect to the vLLM inference endpoints. + + To customize the DevSpaces configuration, you can adjust: + + * Default IDE settings and extensions + * Resource limits for developer workspaces + * The inference endpoint URL used by the Continue extension + + [id="gpu-node-provisioning-maas"] + === Provisioning GPU nodes + + This pattern requires at least 2 NVIDIA GPU nodes with 48 GB or more of VRAM each. On AWS, the pattern automatically provisions `g6e.2xlarge` GPU machine sets with NVIDIA L40S GPUs. + + On other providers and bare metal, the pattern does not provision GPU nodes, so you must add them to the cluster before you deploy the pattern. On all providers, the pattern installs the required operators, including the NVIDIA GPU Operator, automatically during deployment.
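After the NVIDIA GPU Operator has initialized, you can check from the command line whether the cluster exposes enough GPUs for the default models. The following POSIX shell sketch sums the allocatable `nvidia.com/gpu` resources on nodes labeled `nvidia.com/gpu.present=true`. It then compares the total with `REQUIRED_GPUS`, assuming that each of the two default models requests one whole GPU, as in the default model definitions. The function names and the `REQUIRED_GPUS` variable are illustrative and are not part of the pattern.

```shell
#!/bin/sh
# Minimal sketch: confirm the cluster exposes enough GPUs for the default models.
# Assumption: each default model requests one whole GPU (nvidia.com/gpu: "1"),
# so two models need two allocatable GPUs.
REQUIRED_GPUS=2

# Sum the allocatable nvidia.com/gpu resources on nodes labeled as having a GPU.
allocatable_gpus() {
  oc get nodes -l nvidia.com/gpu.present=true \
    -o jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' |
    awk '{ total += $1 } END { print total + 0 }'
}

# Succeed if the total meets REQUIRED_GPUS; otherwise report on stderr and fail.
check_gpu_capacity() {
  have=$(allocatable_gpus)
  if [ "$have" -ge "$REQUIRED_GPUS" ]; then
    echo "OK: $have allocatable GPU(s), $REQUIRED_GPUS required"
  else
    echo "Not enough GPUs: $have allocatable, $REQUIRED_GPUS required" >&2
    return 1
  fi
}
```

Run `check_gpu_capacity` while you are logged in with `oc`. The helper counts GPUs rather than nodes, so treat it as a quick capacity check; the documented minimum of 2 GPU nodes still applies. The label and the allocatable GPU resource typically appear only after the GPU Operator is running on the nodes. A result of 0 shortly after deployment can therefore mean that the operator is still initializing.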
diff --git a/content/patterns/maas-quickstart/getting-started.adoc b/content/patterns/maas-quickstart/getting-started.adoc new file mode 100644 index 000000000..e20878355 --- /dev/null +++ b/content/patterns/maas-quickstart/getting-started.adoc @@ -0,0 +1,193 @@ +--- +title: Getting started +weight: 10 +aliases: /maas-quickstart/getting-started/ +--- + +:toc: +:imagesdir: /images +:_content-type: ASSEMBLY +include::modules/comm-attributes.adoc[] + +[id="deploying-maas-quickstart-pattern"] +== Deploying the MaaS Code Assistant AI Quickstart pattern + +.Prerequisites + +* An OpenShift cluster (version 4.20 or later). This pattern requires at least 2 NVIDIA GPU nodes with 48 GB or more of VRAM each. + ** *AWS*: The pattern automatically provisions 2 `g6e.2xlarge` GPU worker nodes (NVIDIA L40S) during installation. No GPU nodes need to be present before you deploy. + ** *Other providers and bare metal*: GPU nodes must already be part of the OpenShift cluster before you deploy this pattern. The pattern installs all required operators automatically. + ** To create an OpenShift cluster, go to the https://console.redhat.com/[Red Hat Hybrid Cloud console]. + ** Select *OpenShift \-> Red Hat OpenShift Container Platform \-> Create cluster*. +* The Helm binary. For instructions, see link:https://helm.sh/docs/intro/install/[Installing Helm]. +* The `oc` CLI tool. For instructions, see link:https://docs.openshift.com/container-platform/latest/cli_reference/openshift_cli/getting-started-cli.html[Getting started with the OpenShift CLI]. +* Additional installation tool dependencies. For details, see link:https://validatedpatterns.io/learn/quickstart/[Patterns quick start]. + +[id="preparing-for-deployment-maas"] +== Preparing for deployment +.Procedure + +. Fork the link:https://github.com/validatedpatterns-sandbox/ai-quickstart-maas-code-assistant[ai-quickstart-maas-code-assistant] repository on GitHub. You must fork the repository to customize this pattern. + +. 
Clone the forked copy of this repository. ++ +[source,terminal] +---- +$ git clone git@github.com:your-username/ai-quickstart-maas-code-assistant.git +---- + +. Go to the root directory of your Git repository: ++ +[source,terminal] +---- +$ cd ai-quickstart-maas-code-assistant +---- + +. Run the following command to set the upstream repository: ++ +[source,terminal] +---- +$ git remote add -f upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-maas-code-assistant.git +---- + +. Verify the setup of your remote repositories by running the following command: ++ +[source,terminal] +---- +$ git remote -v +---- ++ +.Example output ++ +[source,terminal] +---- +origin git@github.com:your-username/ai-quickstart-maas-code-assistant.git (fetch) +origin git@github.com:your-username/ai-quickstart-maas-code-assistant.git (push) +upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-maas-code-assistant.git (fetch) +upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-maas-code-assistant.git (push) +---- + +. Make a local copy of the secrets template outside of your repository to hold credentials for the pattern. ++ +[WARNING] +==== +Do not add, commit, or push this file to your repository. Doing so might expose personal credentials to GitHub. +==== ++ +Run the following command: ++ +[source,terminal] +---- +$ cp values-secret.yaml.template ~/values-secret-ai-quickstart-maas-code-assistant.yaml +---- + +. Populate this file with the user passwords needed for the pattern: ++ +[source,terminal] +---- +$ vim ~/values-secret-ai-quickstart-maas-code-assistant.yaml +---- + +.. Edit the `htpasswd` section to set passwords for each user tier: ++ +[source,yaml] +---- + - name: htpasswd + fields: + - name: admin + value: + - name: free-user + value: + - name: premium-user + value: + - name: enterprise-user + value: +---- + +. 
Optional: To customize the deployment, create and switch to a new branch by running the following command: + + + [source,terminal] + ---- + $ git checkout -b my-branch + ---- + + + Make your changes, then stage and commit them: + + + [source,terminal] + ---- + $ git add + $ git commit -m "Customize deployment" + ---- + + + Push the changes to your forked repository: + + + [source,terminal] + ---- + $ git push origin my-branch + ---- + + [id="deploying-cluster-using-patternsh-file-maas"] + == Deploying the pattern by using the pattern.sh file + + To deploy the pattern by using the `pattern.sh` file, complete the following steps: + + . Log in to your cluster by following this procedure: + + .. Obtain an API token by visiting link:https://oauth-openshift.apps../oauth/token/request[https://oauth-openshift.apps../oauth/token/request]. + + .. Log in to the cluster by running the following command: + + + [source,terminal] + ---- + $ oc login --token= --server=https://api..:6443 + ---- + + + Alternatively, use an existing kubeconfig file by running the following command: + + + [source,terminal] + ---- + $ export KUBECONFIG=~/ + ---- + + . Deploy the pattern to your cluster. Run the following command: + + + [source,terminal] + ---- + $ ./pattern.sh make install + ---- + + .Verification + + To verify a successful installation, check the health of the ArgoCD applications: + + . Run the following command: + + + [source,terminal] + ---- + $ ./pattern.sh make argo-healthcheck + ---- + + + It might take 15 to 30 minutes for all applications to synchronize and reach a healthy state. This time includes downloading the models and configuring the inference endpoints. + + . Verify that the Operators are installed by navigating to *Operators -> Installed Operators* in the {ocp} web console. Confirm the following Operators are present: + + + * NVIDIA GPU Operator + * {rhoai} + * Red Hat OpenShift DevSpaces + * Red Hat Connectivity Link + + .
After all applications are healthy, verify the inference endpoints are serving by running: ++ +[source,terminal] +---- +$ oc get inferenceservice -A +---- + +. Access the OpenShift DevSpaces dashboard to confirm the IDE environment is available. Navigate to *Networking -> Routes* in the DevSpaces namespace and open the route URL. + +[id="next-steps-getting-started-maas"] +== Next steps + +* link:customizing-this-pattern[Customizing this pattern] +* link:cluster-sizing[Cluster sizing] +* link:troubleshooting[Troubleshooting] diff --git a/content/patterns/maas-quickstart/troubleshooting.adoc b/content/patterns/maas-quickstart/troubleshooting.adoc new file mode 100644 index 000000000..2dac7dff1 --- /dev/null +++ b/content/patterns/maas-quickstart/troubleshooting.adoc @@ -0,0 +1,264 @@ +--- +title: Troubleshooting +weight: 40 +aliases: /maas-quickstart/troubleshooting/ +--- + +:toc: +:imagesdir: /images +:_content-type: ASSEMBLY +include::modules/comm-attributes.adoc[] + +[id="troubleshooting-maas-quickstart"] +== Troubleshooting the MaaS Code Assistant AI Quickstart pattern + +Use this page to diagnose and resolve common issues when deploying or operating this pattern. + +[id="troubleshooting-prereqs-maas"] +== Prerequisite and tooling issues + +[id="troubleshooting-podman-version"] +=== Podman version not supported + +The `pattern.sh` script requires Podman 4.3.0 or later. Earlier versions do not support the `--userns=keep-id` flag required for correct UID/GID mapping inside the container. + +.Symptom + +The script exits with an error referencing the Podman version or `keep-id`. + +.Resolution + +. Check your Podman version: ++ +[source,terminal] +---- +$ podman --version +---- + +. If the version is earlier than 4.3.0, upgrade Podman. For instructions, see the link:https://podman.io/docs/installation[Podman installation documentation]. 
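If you run `pattern.sh` from scripts or CI, you can automate this check. The following POSIX shell sketch assumes that `podman --version` prints output in the form `podman version X.Y.Z`. It compares only the first three numeric components and ignores suffixes such as `-dev`. `version_ge` and `podman_version_ok` are illustrative names.

```shell
#!/bin/sh
# Minimal sketch: check that Podman meets the 4.3.0 minimum that pattern.sh
# needs for --userns=keep-id. Function names are illustrative.
MIN_PODMAN_VERSION=4.3.0

# version_ge A B: succeed if dotted version A >= B (first three numeric parts).
version_ge() {
  a=$1 b=$2
  for i in 1 2 3; do
    x=${a%%.*} y=${b%%.*}             # current leading component
    x=${x%%[!0-9]*} y=${y%%[!0-9]*}   # drop suffixes such as "-dev"
    x=${x:-0} y=${y:-0}               # treat missing parts as 0
    [ "$x" -gt "$y" ] && return 0
    [ "$x" -lt "$y" ] && return 1
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

# podman_version_ok [VERSION_OUTPUT]: check the given `podman --version`
# output, or run `podman --version` when no argument is supplied.
podman_version_ok() {
  out=${1:-$(podman --version 2>/dev/null)}
  ver=${out##* }                      # last word, e.g. "4.9.4"
  [ -n "$ver" ] || return 1
  version_ge "$ver" "$MIN_PODMAN_VERSION"
}
```

For example, `podman_version_ok || echo "Upgrade Podman to $MIN_PODMAN_VERSION or later"` prints the message on hosts with an older Podman. A component-by-component comparison avoids a flaw in plain string comparison, which would rank `4.10.0` below `4.3.0`.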
+ + [id="troubleshooting-kubeconfig"] + === KUBECONFIG path is outside the HOME directory + + The `pattern.sh` script runs inside a container and mounts your `$HOME` directory. If your `KUBECONFIG` file is located outside `$HOME`, the container cannot access it. + + .Symptom + + The script fails to connect to the cluster or reports that the kubeconfig file cannot be found. + + .Resolution + + Copy your kubeconfig file to a path inside your home directory and export the updated path: + + [source,terminal] + ---- + $ cp ~/kubeconfig + $ export KUBECONFIG=~/kubeconfig + ---- + + [id="troubleshooting-deployment-maas"] + == Deployment issues + + [id="troubleshooting-argocd-sync"] + === ArgoCD applications are not syncing or are unhealthy + + After running `./pattern.sh make install`, ArgoCD applications can take 15–30 minutes to reach a healthy state. Model downloads and GPU Operator initialization can extend this time. + + .Symptom + + Running `./pattern.sh make argo-healthcheck` reports applications in `Progressing` or `Degraded` state. + + .Resolution + + . Check which applications are not healthy: + + + [source,terminal] + ---- + $ oc get applications -n openshift-gitops + ---- + + . Inspect the failing application for error details: + + + [source,terminal] + ---- + $ oc describe application -n openshift-gitops + ---- + + . Check the logs of the ArgoCD application controller, which runs as a stateful set: + + + [source,terminal] + ---- + $ oc logs -n openshift-gitops statefulset/openshift-gitops-application-controller + ---- + + . If applications are stuck in `Progressing`, wait an additional 10 minutes and re-run the health check. Model downloads from OCI registries can take significant time depending on network conditions. + + [id="troubleshooting-schema-validation"] + === Values file schema validation fails + + The pattern validates `values-*.yaml` files against a schema before deployment. + + .Symptom + + Running `./pattern.sh make install` fails with a schema validation error. + + .Resolution + + .
Run the validation step independently to see the full error output: ++ +[source,terminal] +---- +$ ./pattern.sh make validate-schema +---- + +. Review the error message to identify the malformed field and correct the value in your `values-secret.yaml` or `overrides/maas-quickstart.yaml` file. + +[id="troubleshooting-gpu-maas"] +== GPU and inference issues + +[id="troubleshooting-gpu-nodes"] +=== GPU nodes are not ready + +The NVIDIA GPU Operator must successfully initialize on each GPU node before model serving can start. + +.Symptom + +Inference service pods remain in `Pending` state, or `oc get inferenceservice -A` shows services not ready. + +.Resolution + +. Check the status of GPU nodes: ++ +[source,terminal] +---- +$ oc get nodes -l nvidia.com/gpu.present=true +---- + +. Check the NVIDIA GPU Operator pods: ++ +[source,terminal] +---- +$ oc get pods -n nvidia-gpu-operator +---- + +. Check for driver initialization errors: ++ +[source,terminal] +---- +$ oc logs -n nvidia-gpu-operator -l app=nvidia-driver-daemonset +---- + +. If you are using a provider other than AWS, confirm that GPU nodes were present in the cluster before you deployed the pattern. The pattern does not provision GPU nodes on providers other than AWS. + +[id="troubleshooting-inference-endpoints"] +=== Inference endpoints are not serving + +.Symptom + +`oc get inferenceservice -A` shows inference services in a non-ready state, or the Continue AI extension in DevSpaces returns connection errors. + +.Resolution + +. Check the status of inference services: ++ +[source,terminal] +---- +$ oc get inferenceservice -A +---- + +. Check the vLLM model server pod logs for a specific model: ++ +[source,terminal] +---- +$ oc logs -n redhat-ods-applications -l serving.kserve.io/inferenceservice= +---- + +. Confirm that the GPU nodes have sufficient available VRAM. Each model requires a GPU with at least 48 GB of VRAM. 
+ With the default model definitions, each model requests one whole GPU (`nvidia.com/gpu: "1"`). Running both models on the same node therefore requires a node with at least two GPUs that each have 48 GB or more of VRAM; otherwise, use two separate GPU nodes. + + [id="troubleshooting-rate-limiting-maas"] + == Rate limiting and authentication issues + + [id="troubleshooting-rate-limits"] + === Rate limiting is not enforced + + .Symptom + + Requests from all users succeed regardless of the configured rate limits, or requests are blocked for all users. + + .Resolution + + . Check the status of the Kuadrant operator and Limitador pod: + + + [source,terminal] + ---- + $ oc get pods -n kuadrant-system + ---- + + . Check the Limitador logs for policy errors: + + + [source,terminal] + ---- + $ oc logs -n kuadrant-system deployment/limitador + ---- + + . Confirm that rate limit policies are applied correctly: + + + [source,terminal] + ---- + $ oc get ratelimitpolicy -A + ---- + + [id="troubleshooting-auth-maas"] + === Users cannot authenticate + + .Symptom + + Users receive authentication errors when accessing the inference API or DevSpaces. + + .Resolution + + . Confirm that the htpasswd secret was correctly provisioned by the External Secrets Operator: + + + [source,terminal] + ---- + $ oc get externalsecret -A + $ oc get secret htpasswd-secret -n openshift-config + ---- + + . If the secret is missing or incorrect, verify that your `values-secret.yaml` file contains the correct passwords for all four users (`admin`, `free-user`, `premium-user`, `enterprise-user`) and redeploy the pattern. + + [id="troubleshooting-devspaces-maas"] + == OpenShift DevSpaces issues + + [id="troubleshooting-devspaces-connection"] + === Continue AI extension cannot connect to inference endpoints + + .Symptom + + Code suggestions are not returned in DevSpaces, or the Continue extension reports a connection error. + + .Resolution + + . Confirm that the inference services are healthy: + + + [source,terminal] + ---- + $ oc get inferenceservice -A + ---- + + .
Navigate to *Networking -> Routes* in the namespace where the inference services are running and confirm the routes are accessible. + +. In DevSpaces, open the Continue extension settings and verify that the endpoint URL matches the route URL for the vLLM service. + +[id="troubleshooting-get-help-maas"] +== Getting help + +If you cannot resolve an issue using this guide: + +* Check the link:https://github.com/validatedpatterns-sandbox/ai-quickstart-maas-code-assistant/issues[GitHub issues] for known problems and workarounds. +* Open a new issue with the output of the following command to help diagnose the problem: ++ +[source,terminal] +---- +$ oc get pods -A | grep -v Running | grep -v Completed +---- diff --git a/content/patterns/rag-quickstart/_index.adoc b/content/patterns/rag-quickstart/_index.adoc index 037bd5524..183cb814e 100644 --- a/content/patterns/rag-quickstart/_index.adoc +++ b/content/patterns/rag-quickstart/_index.adoc @@ -9,6 +9,10 @@ rh_products: - Red Hat OpenShift AI industries: - General +focus_areas: + - AI + - Data + - AI Quickstart aliases: /rag-quickstart/ links: github: https://github.com/validatedpatterns-sandbox/ai-quickstart-rag diff --git a/modules/ai-quickstart-maas-code-assistant/metadata-ai-quickstart-maas-code-assistant.adoc b/modules/ai-quickstart-maas-code-assistant/metadata-ai-quickstart-maas-code-assistant.adoc new file mode 100644 index 000000000..367966319 --- /dev/null +++ b/modules/ai-quickstart-maas-code-assistant/metadata-ai-quickstart-maas-code-assistant.adoc @@ -0,0 +1,22 @@ +// This file defines cluster sizing attributes for the MaaS Code Assistant AI Quickstart pattern. +// The pattern-metadata.yaml has an empty requirements field, so these values are defined manually +// based on tested configurations. +:metadata_version: 1.0 +:name: ai-quickstart-maas-code-assistant +:pattern_version: 1.0 +:description: Deploy a multi-tenant AI code assistant with NVIDIA Nemotron models, tiered rate limiting, and IDE integration. 
+:display_name: MaaS Code Assistant AI Quickstart +:repo_url: https://github.com/validatedpatterns-sandbox/ai-quickstart-maas-code-assistant +:docs_repo_url: https://github.com/validatedpatterns/docs +:issues_url: https://github.com/validatedpatterns-sandbox/ai-quickstart-maas-code-assistant/issues +:docs_url: https://validatedpatterns.io/patterns/maas-quickstart/ +:ci_url: https://validatedpatterns.io/ci/?pattern=maas-quickstart +:tier: sandbox +:owners: dminnear-rh +:requirements_hub_controlPlane_platform_aws_replicas: 3 +:requirements_hub_controlPlane_platform_aws_type: m5.xlarge +:requirements_hub_compute_platform_aws_replicas: 3 +:requirements_hub_compute_platform_aws_type: m5.2xlarge +:extra_features_hypershift_support: false +:extra_features_spoke_support: false +:external_requirements: diff --git a/modules/maas-quickstart-about.adoc b/modules/maas-quickstart-about.adoc new file mode 100644 index 000000000..d39da8e5c --- /dev/null +++ b/modules/maas-quickstart-about.adoc @@ -0,0 +1,74 @@ +:_content-type: CONCEPT +:imagesdir: ../../images +include::comm-attributes.adoc[] + +[id="about-maas-quickstart"] += About the MaaS Code Assistant AI Quickstart pattern + +Deploy a governed, multi-tenant AI code assistant on OpenShift with tiered access control, rate limiting, and integrated IDE support. + +Use case:: + +* Deploy an AI-powered code assistant that provides intelligent code suggestions through an integrated development environment. +* Implement Model-as-a-Service (MaaS) governance with tiered user access, rate limiting, and chargeback capabilities. +* Use a GitOps approach to provision AI inference infrastructure including GPU-accelerated model serving, identity management, and API rate limiting. + +Background:: + +This pattern builds on the link:https://github.com/rh-ai-quickstart/maas-code-assistant[MaaS Code Assistant AI Quickstart]. 
It provisions the OpenShift cluster with link:https://www.redhat.com/en/products/ai/openshift-ai[{rhoai}] configured for GPU-accelerated inference using vLLM and llm-d. It deploys the NVIDIA GPU Operator for model serving on GPU nodes and manages secrets through the {solution-name-upstream} framework using HashiCorp Vault and the External Secrets Operator. This pattern generalizes one or more successful deployments of this use case. Implementation details might vary depending on your specific environment and requirements. + + Organizations can use the MaaS Code Assistant to offer AI code assistance as an internal service with differentiated access tiers. It demonstrates a production-ready approach to: + + - Serving multiple language models for code completion and generation, including an NVIDIA Nemotron model and an OpenAI gpt-oss model + - Enforcing per-user rate limits through Kuadrant (Red Hat Connectivity Link) to manage capacity and enable chargeback + - Authenticating users through htpasswd with OpenShift OAuth for tiered access (Free, Premium, Enterprise) + - Providing an integrated development experience through OpenShift DevSpaces with the Continue AI extension + - Monitoring usage and performance through Grafana dashboards and Prometheus metrics + + [id="about-maas-quickstart-solution"] + == About the solution + + This pattern deploys a complete MaaS code assistance platform on a single OpenShift cluster by using a GitOps approach. The {solution-name-upstream} framework handles infrastructure provisioning, including GPU operators, AI platform configuration, and secrets management. The MaaS Code Assistant AI Quickstart delivers the application layer: model serving, rate limiting, user authentication, and IDE integration. + + The solution uses vLLM with llm-d for high-performance inference of the NVIDIA Nemotron and OpenAI gpt-oss models. Kuadrant enforces rate limit policies per user tier, while htpasswd with OpenShift OAuth manages authentication and tier assignment.
OpenShift DevSpaces provides a browser-based IDE with the Continue AI extension preconfigured to connect to the inference endpoints. + + [id="about-maas-quickstart-technology"] + == About the technology + + This solution uses the following technologies: + + https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it[{rh-ocp}]:: + An enterprise-ready Kubernetes container platform built for an open hybrid cloud strategy. It provides a consistent application platform to manage hybrid cloud, public cloud, and edge deployments. + + https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it[{rh-gitops}]:: + A declarative application continuous delivery tool for Kubernetes based on the ArgoCD project. Application definitions, configurations, and environments are declarative and version controlled in Git. + + https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai[{rhoai}]:: + A flexible, scalable MLOps platform with tools to build, deploy, and manage AI-enabled applications. This pattern uses {rhoai} to manage GPU-accelerated model serving with vLLM. + + https://developers.redhat.com/products/openshift-dev-spaces/overview[Red{nbsp}Hat OpenShift DevSpaces]:: + A cloud-based developer workspace platform that provides preconfigured, containerized development environments. This pattern uses DevSpaces to deliver an integrated IDE with AI code assistance. + + https://docs.redhat.com/en/documentation/red_hat_connectivity_link[Red{nbsp}Hat Connectivity Link (Kuadrant)]:: + An API management and connectivity solution that provides rate limiting, authentication, and traffic policies. This pattern uses Kuadrant to enforce per-tier rate limits on inference requests. + + https://docs.vllm.ai/[vLLM]:: + A high-throughput, memory-efficient inference engine for large language models. In this pattern, vLLM serves both `nemotron-3-nano-30b-a3b-fp8` and `gpt-oss-20b` on GPU nodes.
+ + https://github.com/llm-d/llm-d[llm-d]:: + A Kubernetes-native distributed inference framework for LLMs that works with vLLM to provide scalable model serving. + + https://developer.nvidia.com/nemotron[NVIDIA Nemotron]:: + A family of language models from NVIDIA. The pattern serves `nemotron-3-nano-30b-a3b-fp8` from this family, alongside the OpenAI `gpt-oss-20b` model. + + https://grafana.com/[Grafana]:: + An open source analytics and monitoring platform. This pattern uses Grafana dashboards to visualize inference metrics and usage per tier. + + https://prometheus.io/[Prometheus]:: + An open source monitoring and alerting toolkit. This pattern uses Prometheus to collect inference and rate limiting metrics. + + https://cert-manager.io/[cert-manager]:: + A Kubernetes-native certificate management controller. This pattern uses cert-manager to provision and manage TLS certificates. + + https://github.com/continuedev/continue[Continue]:: + An open source AI code assistant extension for IDEs. This pattern integrates Continue in OpenShift DevSpaces to provide code suggestions powered by the served models. diff --git a/modules/maas-quickstart-architecture.adoc b/modules/maas-quickstart-architecture.adoc new file mode 100644 index 000000000..61ca9a4c9 --- /dev/null +++ b/modules/maas-quickstart-architecture.adoc @@ -0,0 +1,135 @@ +:_content-type: CONCEPT +:imagesdir: ../../images +include::comm-attributes.adoc[] + +[id="maas-quickstart-architecture"] += MaaS Code Assistant AI Quickstart architecture + +The following figure shows the MaaS Code Assistant architecture. + +.MaaS Code Assistant system architecture +image::maas-quickstart/code-assist-diagram.png[MaaS Code Assistant Architecture,link="/images/maas-quickstart/code-assist-diagram.png"] + +The architecture consists of three main layers: + +* *Inference Layer* -- Serves the NVIDIA Nemotron and OpenAI gpt-oss models through vLLM and llm-d with GPU acceleration for code completion and generation.
+ * *Governance Layer* -- Manages user authentication through htpasswd with OpenShift OAuth and enforces per-tier rate limits through Kuadrant. + * *Developer Experience Layer* -- Provides an integrated IDE through OpenShift DevSpaces with the Continue AI extension connected to the inference endpoints. + + [id="maas-quickstart-inference-layer"] + == Inference layer + + The inference layer serves language models and processes code completion requests: + + vLLM Model Servers:: + Serve the `nemotron-3-nano-30b-a3b-fp8` and `gpt-oss-20b` models with GPU acceleration. Each model runs as a vLLM instance managed by {rhoai}, optimized for high-throughput inference with features like continuous batching and PagedAttention. + + llm-d:: + Provides Kubernetes-native distributed inference orchestration. llm-d manages model placement, scaling, and request routing across GPU nodes using the LeaderWorkerSet (LWS) operator. + + NVIDIA GPU Operator:: + Manages NVIDIA GPU drivers, device plugins, and monitoring on worker nodes. Ensures GPUs are configured and available for model serving workloads. + + [id="maas-quickstart-governance-layer"] + == Governance layer + + The governance layer controls access and enforces usage policies: + + OpenShift OAuth with htpasswd:: + Provides identity and access management using the built-in OAuth server in OpenShift with htpasswd credentials. The solution assigns users to tiers (Free, Premium, Enterprise) that determine their rate limits and model access. + + Kuadrant (Red Hat Connectivity Link):: + Enforces rate limit policies on inference API requests. Each user tier has a configured request quota (Free: 5, Premium: 20, Enterprise: 50 requests per 2 minutes) to manage capacity and enable usage-based chargeback. + + {hashicorp-vault} and External Secrets Operator:: + Manages sensitive credentials including htpasswd user passwords. The {solution-name-upstream} framework provisions {hashicorp-vault-short} and ESO to securely synchronize secrets to the cluster.
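To make quotas such as "5 requests per 2 minutes" concrete, the following POSIX shell sketch models a simple fixed-window counter. It groups requests into consecutive 120-second windows, and each window accepts at most the tier's limit. This illustration rests on that simplifying assumption and does not describe how Limitador actually counts requests. Kuadrant rejects over-limit requests, typically with HTTP 429. `accepted_requests` is a hypothetical helper.

```shell
#!/bin/sh
# Simplified illustration of a fixed-window rate limit such as
# "5 requests per 2 minutes". This is not Limitador's implementation.
#
# accepted_requests LIMIT WINDOW_SECONDS [T1 T2 ...]
# T1.. are request times in seconds, in ascending order. Prints how many
# requests a fixed-window counter would accept.
accepted_requests() {
  limit=$1 window=$2
  shift 2
  accepted=0 current=-1 count=0
  for t in "$@"; do
    w=$(( t / window ))               # window this request falls into
    if [ "$w" -ne "$current" ]; then
      current=$w count=0              # a new window starts with a fresh counter
    fi
    if [ "$count" -lt "$limit" ]; then
      count=$(( count + 1 ))
      accepted=$(( accepted + 1 ))
    fi                                # otherwise the request would be rejected
  done
  echo "$accepted"
}
```

For example, `accepted_requests 5 120 0 1 2 3 4 5 6` prints `5`, because the limiter rejects a Free tier user's sixth and seventh requests in the same window. The same model shows a known property of fixed windows: `accepted_requests 5 120 115 116 117 118 119 120 121 122 123 124` prints `10`, because those requests straddle a window boundary. Whether the real limiter behaves this way depends on its windowing implementation.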
+ + [id="maas-quickstart-developer-experience"] + == Developer experience layer + + The developer experience layer provides the end-user interface: + + OpenShift DevSpaces:: + Delivers browser-based developer workspaces with preconfigured IDE environments. Developers access DevSpaces to write code with AI assistance without local setup. + + Continue AI extension:: + An open source AI code assistant extension integrated into DevSpaces. Continue connects to the vLLM inference endpoints to provide inline code suggestions, completions, and chat-based code assistance. + + [id="maas-quickstart-deployment"] + == Deployment architecture + + The following table describes the pod structure when you deploy on OpenShift: + + [cols="1,2,3",options="header"] + |=== + | Pod | Purpose | Characteristics + + | vLLM Model Server (nemotron-3-nano-30b) + | Code generation inference + | GPU-accelerated, serves premium and enterprise tier users, managed by llm-d and {rhoai} + + | vLLM Model Server (gpt-oss-20b) + | Code generation inference + | GPU-accelerated, serves all user tiers, managed by llm-d and {rhoai} + + | Kuadrant / Limitador + | API rate limiting + | Enforces per-tier rate limits on inference endpoints, provides usage metrics + + | DevSpaces + | Developer IDE + | Browser-based workspaces with Continue AI extension, connects to inference endpoints + + | Grafana + | Monitoring dashboards + | Visualizes inference metrics, request rates, and per-tier usage + + | Prometheus + | Metrics collection + | Collects inference latency, throughput, GPU utilization, and rate limiting metrics + + | Vault + | Secrets management + | Stores htpasswd credentials and other sensitive configuration, synced by ESO + |=== + + [id="maas-quickstart-technologies"] + == Implementation technologies + + [cols="1,2",options="header"] + |=== + | Component | Technology + + | Inference Engine + | vLLM with llm-d + + | Language Models + | NVIDIA Nemotron (nemotron-3-nano-30b-a3b-fp8) and OpenAI gpt-oss (gpt-oss-20b) + + | Container Orchestration + | {rh-ocp} + {rhoai} + + |
IDE Platform +| Red Hat OpenShift DevSpaces + Continue + +| API Gateway / Rate Limiting +| Red Hat Connectivity Link (Kuadrant) + +| Identity Management +| OpenShift OAuth with htpasswd + +| GPU Management +| NVIDIA GPU Operator + +| Monitoring +| Grafana + Prometheus + +| Certificate Management +| cert-manager + +| Secrets Management +| HashiCorp Vault + External Secrets Operator + +| Inference Orchestration +| LeaderWorkerSet (LWS) Operator +|=== diff --git a/static/images/maas-quickstart/code-assist-diagram.png b/static/images/maas-quickstart/code-assist-diagram.png new file mode 100644 index 000000000..3cfbf1b8b Binary files /dev/null and b/static/images/maas-quickstart/code-assist-diagram.png differ