Concerns about TensorRT 11 mixed-precision workflow #4807

@pwuertz

I understand the motivation behind strong typing, but the TensorRT 11 FP16 migration raises a few concerns.

1. Loss of hardware-aware precision optimization

In TensorRT 10, enabling FP16 was an optimization opportunity, not a hard requirement. TensorRT could choose FP16 or FP32 implementations based on the target GPU, available tactics, and measured performance.

With TensorRT 11, the recommended workflow is to transform the ONNX graph beforehand (e.g. with ModelOpt AutoCast), making FP32/FP16 decisions before TensorRT sees the model.

This seems to move precision assignment away from the component that actually knows the target hardware and may therefore reduce optimization opportunities that previously existed in TensorRT.
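To make the distinction concrete, here is a toy Python sketch of the two models. It is not TensorRT code: the `Tactic` class, the `pick_precision` helper, the two GPUs, and all timings are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tactic:
    precision: str   # "fp32" or "fp16"
    time_ms: float   # hypothetical measured latency on the target GPU


def pick_precision(tactics, allowed):
    """Return the precision of the fastest tactic among the allowed precisions."""
    candidates = [t for t in tactics if t.precision in allowed]
    if not candidates:
        raise ValueError("no tactic available for the allowed precisions")
    return min(candidates, key=lambda t: t.time_ms).precision


# Made-up timings for the same layer on two hypothetical GPUs.
gpu_a = [Tactic("fp32", 1.0), Tactic("fp16", 0.4)]
gpu_b = [Tactic("fp32", 1.0), Tactic("fp16", 1.3)]

# FP16 as a permission: the builder may use FP16 but is not forced to.
weak_a = pick_precision(gpu_a, {"fp32", "fp16"})
weak_b = pick_precision(gpu_b, {"fp32", "fp16"})

# FP16 fixed in the graph: only FP16 tactics are eligible on every target.
strong_a = pick_precision(gpu_a, {"fp16"})
strong_b = pick_precision(gpu_b, {"fp16"})

print(weak_a, weak_b, strong_a, strong_b)
```

In this sketch the permissive mode picks FP16 on the first GPU and falls back to FP32 on the second, while the pre-typed graph is tied to FP16 on both, even where it is the slower choice.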

2. Reduced ONNX portability

Previously, a single FP32 ONNX model could be optimized differently depending on the deployment target.

Now, precision policy is embedded into the ONNX graph itself through explicit FP16 types and casts. This feels at odds with the idea of ONNX as a hardware-agnostic model representation, especially when the eventual deployment target may not be known at export time (the model may even end up running on a CPU).
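As a rough illustration of what "embedded into the graph" means, the following toy Python sketch inserts explicit casts into a linear chain of ops. It is an assumption-laden simplification, not how ModelOpt AutoCast is implemented: `embed_precision`, the op names, and the string-based "graph" are all hypothetical.

```python
def embed_precision(ops, keep_fp32):
    """Return a new op list with explicit casts wherever precision changes.

    ops: op names in execution order (a linear chain, for simplicity).
    keep_fp32: names of ops that stay in FP32; all other ops run in FP16.
    The graph input and output are assumed to be FP32.
    """
    out = []
    current = "fp32"
    for name in ops:
        wanted = "fp32" if name in keep_fp32 else "fp16"
        if wanted != current:
            # Precision boundary: the decision becomes part of the graph.
            out.append(f"Cast(to={wanted})")
            current = wanted
        out.append(f"{name}[{wanted}]")
    if current != "fp32":
        # Restore the assumed FP32 output type.
        out.append("Cast(to=fp32)")
    return out


chain = ["Conv", "Relu", "Softmax"]
typed = embed_precision(chain, keep_fp32={"Softmax"})
print(typed)
```

Once the casts are written into the model, every consumer of that file sees the same FP16 regions, regardless of whether its hardware benefits from them.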

3. Significantly increased dependency footprint

In TensorRT 10, mixed-precision optimization was built into TensorRT.

In TensorRT 11, the recommended replacement requires installing ModelOpt, which in turn brings in substantial dependencies, including onnxruntime, onnxruntime-gpu, and torch.

As a result, users who only need inference must now install a full deep learning framework and an additional inference stack just to regain functionality that previously existed inside TensorRT itself.

Is there any plan to provide a lightweight TensorRT-native alternative for mixed-precision graph transformation, or to restore some form of hardware-aware precision optimization during engine building?
