gpu-benchmarking

Here are 10 public repositories matching this topic...

Cre4T3Tiv3 / jetson-orin-matmul-analysis

CUDA matrix multiplication benchmarking on Jetson Orin Nano. Four implementations, three power modes, five matrix sizes. 99.5% mathematical validation. C++/CUDA and Python.

Updated Apr 2, 2026
Python
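
Matrix-multiplication benchmarks like the Jetson Orin entry above are commonly reported as throughput in GFLOP/s: an n x n by n x n product takes 2·n³ floating-point operations (n² dot products, each with n multiplies and n adds). The following is a minimal CPU sketch using NumPy, not the repository's CUDA or cuBLAS code. It times a square float32 matmul, converts the best time to GFLOP/s, and checks the result against a float64 reference. The function name `benchmark_matmul` and the 1e-3 tolerances are illustrative assumptions.

```python
import time

import numpy as np


def benchmark_matmul(n: int, repeats: int = 5, seed: int = 0) -> tuple[float, bool]:
    """Time an n x n float32 matmul; return (best GFLOP/s, matches float64 reference).

    Hypothetical CPU sketch for illustration; it does not reproduce any listed repository.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n), dtype=np.float32)
    b = rng.standard_normal((n, n), dtype=np.float32)

    # Keep the fastest of several runs to reduce timer and scheduling noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        c = a @ b
        best = min(best, time.perf_counter() - start)

    # n^2 output elements, each a length-n dot product: n multiplies + n adds.
    gflops = 2.0 * n**3 / best / 1e9

    # Compare the float32 result with a float64 reference under a loose tolerance.
    reference = a.astype(np.float64) @ b.astype(np.float64)
    valid = bool(np.allclose(c, reference, rtol=1e-3, atol=1e-3))
    return gflops, valid


if __name__ == "__main__":
    for size in (256, 512, 1024):
        gflops, valid = benchmark_matmul(size)
        print(f"n={size:5d}  {gflops:8.1f} GFLOP/s  valid={valid}")
```

Taking the minimum over repeats is a common choice for microbenchmarks because it discards runs slowed by unrelated system activity. On a GPU, a sketch like this would also need explicit device synchronization before reading the timer.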

lokeshpuma / Deep_Learning

Hands-on Jupyter notebooks for deep learning with TensorFlow, covering fundamental concepts, model training, and applied tabular projects.

machine-learning deep-learning tensorflow jupyter-notebook neural-networks tensorboard gradient-descent gpu-benchmarking

Updated May 29, 2026
Jupyter Notebook

kevinbazira / llm-rocm-benchmarks

Standalone LLM inference benchmarking pipelines on AMD GPUs using ROCm, vLLM, MAD, and data visualization scripts.

performance-engineering machine-learning rocm model-serving amd-gpu mlops inference-optimization llm vllm llm-inference llm-benchmarking gpu-benchmarking

Updated Feb 21, 2026
Python

ZrobMiloudaa / jetson-orin-matmul-analysis

🔍 Analyze CUDA matrix multiplication performance and power consumption on NVIDIA Jetson Orin Nano across multiple implementations and settings.

machine-learning robotics cuda cublas matrix-multiplication high-performance-computing gpu-computing performance-optimization autonomous-systems edge-computing nvidia-jetson embeded-systems tensor-cores ml-deployment jetson-orin-nano gpu-benchmarking power-efficiency-benchmark cuda-optimization

Updated May 30, 2026
Python

FluidNumerics / gpu-microbenchmarks

gpu benchmarks gpu-benchmarking

Updated Jan 19, 2022
C++

tdiprima / run_system_checks

One-shot script to audit GPU, CUDA, PyTorch, CPU, and disk performance before debugging a slow or broken ML environment.

machine-learning cuda pytorch system-diagnostics gpu-benchmarking

Updated Apr 3, 2026
Shell

saminkhan1 / llm-serving-benchmark-lab

Artifact-backed LLM serving performance lab for vLLM baselines, official metrics, GuideLLM checks, and SGLang/PD scaffolding.

python performance-engineering modal prometheus artifact-evaluation llm llm-serving vllm llm-inference sglang llm-performance gpu-benchmarking guidellm inference-benchmarking serving-metrics

Updated May 21, 2026
Python

kadamrahul18 / GPT2-Optimization

GPT-2 (124M) fixed-work distributed training benchmark on NYU BigPurple (Slurm), scaling from 1 to 8 V100 GPUs across 2 nodes with DeepSpeed ZeRO-1 and FP16/AMP. Includes a reproducible harness that writes training_metrics.json, RUN_COMPLETE.txt, and launcher metadata for each run, plus NCCL topology/log artifacts and Nsight Systems traces and summaries (NVTX and NCCL ranges).

performance hpc amp slurm pytorch reproducibility distributed-training mixed-precision gpt2 deepspeed zero-1 gpu-benchmarking

Updated Apr 17, 2026
Python
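
A fixed-work (strong-scaling) benchmark such as the GPT-2 run above is usually summarized by speedup S_N = T_1 / T_N and parallel efficiency E_N = S_N / N, where T_N is the wall-clock time for the same total work on N GPUs. The following minimal sketch shows only that arithmetic. The function name `scaling_summary` is hypothetical, and the timings in the usage block are made-up placeholders, not measurements from the repository.

```python
def scaling_summary(times_s: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map GPU count N -> (speedup T_1 / T_N, efficiency speedup / N) for fixed total work."""
    if 1 not in times_s:
        raise ValueError("a 1-GPU baseline time is required")
    baseline = times_s[1]
    summary: dict[int, tuple[float, float]] = {}
    for n_gpus, t in sorted(times_s.items()):
        speedup = baseline / t
        summary[n_gpus] = (speedup, speedup / n_gpus)
    return summary


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, for illustration only.
    example = {1: 800.0, 2: 420.0, 4: 230.0, 8: 130.0}
    for n, (s, e) in scaling_summary(example).items():
        print(f"{n} GPU(s): speedup {s:.2f}x, efficiency {e:.0%}")
```

Efficiency below 100% at higher GPU counts typically reflects communication and synchronization overhead. Across two nodes, inter-node NCCL traffic is one likely contributor, which is why NCCL topology logs and Nsight traces are useful companions to the raw timings.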

Tennisee-data / benchHUB

benchHUB is a Python-based project to parse, aggregate, and visualize system and performance benchmarks. It includes a Streamlit dashboard to display and compare results.

mac benchmarking data-science machine-learning hardware leaderboard gpu-computing leaderboards performance-testing gpu-benchmark fastapi streamlit benchmarking-utility apple-silicon cpu-benchmarks gpu-benchmarking

Updated May 26, 2026
Python

Kretski / ScalePredict

Run a 2-minute local benchmark → predict how long your AI job will take on a cloud GPU (T4/V100/A100). No guessing, no wasted money.

machine-learning cloud ai-runtime cloud-cost-optimization gpu-benchmarking

Updated Mar 19, 2026
Python