utilizing concurrency #8

@Sentdex

With pipeline parallelism, and especially tensor parallelism, a lot of throughput is being left on the table: tasks that could be broken down into multiple pieces and solved in parallel are currently handled one piece at a time.

I want to come up with a good way to use this extra performance, probably with some sort of toggle for max concurrency (default 1) that lets the model divide up tasks and work on the pieces in parallel.
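
To make the idea concrete, here is a rough sketch of what a max-concurrency toggle could look like. Nothing here reflects existing code in this repo: `solve_subtask`, `run_subtasks`, and the `max_concurrency` parameter name are all hypothetical, and the model call is replaced by a short sleep. It only shows one possible way to cap the number of in-flight subtasks, using `asyncio.Semaphore`.

```python
import asyncio


async def solve_subtask(subtask: str) -> str:
    # Hypothetical stand-in for one model/inference call on a single subtask.
    await asyncio.sleep(0.01)
    return f"result for {subtask}"


async def run_subtasks(subtasks: list[str], max_concurrency: int = 1) -> list[str]:
    # max_concurrency=1 (the default) means only one subtask runs at a time.
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")

    # The semaphore caps how many subtasks are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(subtask: str) -> str:
        async with semaphore:
            return await solve_subtask(subtask)

    # gather returns results in the same order as the input subtasks.
    return await asyncio.gather(*(bounded(s) for s in subtasks))


if __name__ == "__main__":
    pieces = ["part 1", "part 2", "part 3"]
    print(asyncio.run(run_subtasks(pieces, max_concurrency=2)))
```

With the default of 1 this behaves like the current sequential flow, so the toggle would be opt-in. How the model decides which pieces are actually independent is the open part and is not covered by this sketch.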

Labels: enhancement
