Concerns about TensorRT 11 mixed-precision workflow

I understand the motivation behind strong typing, but the TensorRT 11 FP16 migration raises a few concerns.

## 1. Loss of hardware-aware precision optimization

In TensorRT 10, enabling FP16 was an optimization opportunity, not a hard requirement. TensorRT could choose FP16 or FP32 implementations based on the target GPU, available tactics, and measured performance.

With TensorRT 11, the recommended workflow is to transform the ONNX graph beforehand (e.g. with ModelOpt AutoCast), making FP32/FP16 decisions before TensorRT sees the model.

This seems to move precision assignment away from the component that actually knows the target hardware and may therefore reduce optimization opportunities that previously existed in TensorRT.
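To make the concern concrete, here is a toy sketch of the kind of per-layer decision I mean. It is not TensorRT code: the GPU names, layer names, and timings are all invented for illustration, and `pick_precisions` is a hypothetical helper. It only shows why the same FP32 model can end up with a different precision plan on different hardware when the choice is made from measurements on the target.

```python
# Toy model of hardware-aware precision selection.
# All timings are invented for illustration; they are NOT real measurements.
HYPOTHETICAL_TIMINGS_MS = {
    "gpu_a": {
        "conv1": {"fp32": 0.40, "fp16": 0.22},
        "softmax": {"fp32": 0.05, "fp16": 0.07},
    },
    "gpu_b": {
        "conv1": {"fp32": 0.30, "fp16": 0.31},
        "softmax": {"fp32": 0.04, "fp16": 0.03},
    },
}


def pick_precisions(gpu, allowed=("fp32", "fp16")):
    """For each layer, pick the fastest allowed precision on the given GPU."""
    plan = {}
    for layer, timings in HYPOTHETICAL_TIMINGS_MS[gpu].items():
        candidates = {p: t for p, t in timings.items() if p in allowed}
        plan[layer] = min(candidates, key=candidates.get)
    return plan


for gpu in HYPOTHETICAL_TIMINGS_MS:
    print(gpu, pick_precisions(gpu))
```

With these made-up numbers the two GPUs get opposite plans for the same model. A precision assignment fixed in the graph ahead of time cannot adapt in this way.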

## 2. Reduced ONNX portability

Previously, a single FP32 ONNX model could be optimized differently depending on the deployment target.

Now, precision policy is embedded into the ONNX graph itself through explicit FP16 types and casts. This feels at odds with the idea of ONNX as a hardware-agnostic model representation, especially when the eventual deployment target is not known at export time (for example, the same model may later need to run on a CPU).
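As a rough illustration of what "embedded into the graph" means, the sketch below uses a toy linear graph, not the ONNX API and not AutoCast's actual algorithm. `Node` and `bake_fp16` are hypothetical names, and I assume model inputs and outputs stay FP32. The point is only that the output graph carries explicit casts and FP16 ops chosen before any deployment target is known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str
    dtype: str  # "fp32" or "fp16"


def bake_fp16(graph, low_precision_ops):
    """Return a new node list with FP16 ops and explicit casts at dtype boundaries.

    Toy model: the graph is a simple chain, and inputs/outputs are assumed FP32.
    """
    out = []
    current = "fp32"
    for node in graph:
        target = "fp16" if node.op in low_precision_ops else "fp32"
        if target != current:
            out.append(Node(f"Cast->{target}", target))
            current = target
        out.append(Node(node.op, target))
    if current != "fp32":
        out.append(Node("Cast->fp32", "fp32"))  # keep the model output FP32
    return out


fp32_graph = [Node("Conv", "fp32"), Node("Relu", "fp32"), Node("Softmax", "fp32")]
baked = bake_fp16(fp32_graph, low_precision_ops={"Conv", "Relu"})
print([n.op for n in baked])
```

The original `fp32_graph` is target-neutral, while `baked` fixes one precision policy for every consumer of the file.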

## 3. Significantly increased dependency footprint

In TensorRT 10, mixed-precision optimization was built into TensorRT.

In TensorRT 11, the recommended replacement requires installing ModelOpt, which in turn brings in substantial dependencies, including `onnxruntime`, `onnxruntime-gpu`, and `torch`.

As a result, users who only need inference must now install a full deep learning framework and an additional inference stack just to regain functionality that previously existed inside TensorRT itself.
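For anyone who wants to check the footprint in their own environment, here is a small sketch using only the standard library's `importlib.metadata`. The helper names are mine, and I am assuming ModelOpt is installed under the distribution name `nvidia-modelopt`; adjust if your install differs. Note that it lists every declared requirement, including those gated behind optional extras, so it is an upper bound rather than the exact set pip would install.

```python
import re
from importlib import metadata


def requirement_name(spec):
    """Extract the distribution name from a requirement string like 'torch>=2.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
    return match.group(1).lower() if match else None


def direct_requirements(dist_name):
    """Sorted names of the declared requirements of an installed distribution.

    Returns an empty list if the distribution is not installed.
    """
    try:
        specs = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    names = {requirement_name(s) for s in specs}
    return sorted(n for n in names if n)


# Assumption: ModelOpt is published as "nvidia-modelopt".
print(direct_requirements("nvidia-modelopt"))
```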

Is there any plan to provide a lightweight TensorRT-native alternative for mixed-precision graph transformation, or to restore some form of hardware-aware precision optimization during engine building?

Concerns about TensorRT 11 mixed-precision workflow #4807