Traverse-Research · jasperdewinther · Mar 13, 2025 · Mar 17, 2025 · Mar 17, 2025 · Mar 17, 2025
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,4 @@
+scripts/.ipynb_checkpoints/
+scripts/output_analysis/
+scripts/output_per_frame/
+scripts/output_scores/
diff --git a/README.md b/README.md
@@ -8,26 +8,55 @@
 
 </div>
 
-## 👷‍♀️ Requirements
+## 📊 Available tools
 
-Please ensure the following dependencies are installed before running scripts from this repository:
+This repository contains multiple scripts that can be run as-is for reporting, or serve as a baseline for custom analysis scripts. All tools are listed below with example outputs, along with the command line argument that must be passed to Evolve to generate the data each tool needs.
+
+- **`run_on_android.py`** Launches the Evolve benchmark on a connected Android device over ADB and pulls the result files back to your machine. This script requires ADB (Android Debug Bridge) to be installed.
+- **`compare_deep_analysis.py`** CLI tool that compares two deep-analysis exports and writes the top 20 passes with the largest mean / standard-deviation differences to CSV. Requires `--export-deep-analysis`.
+- **`scores_plotter.ipynb`** Bar chart comparing the Evolve scores per metric across benchmark runs or devices. Requires `--export-scores`.
+
+  <img src="./docs/images/tools/scores_plotter.png" width="700">
+
+- **`per_frame_plotter.ipynb`** Line graph of the execution time or metric, bucketed per score (frame time, ray tracing, rasterization, compute, driver, …), over the benchmark timeline. Can be used to compare runs. Requires `--export-per-frame`.
 
-- Python 3.7
-- ADB (Android Debug Bridge)
+  <img src="./docs/images/tools/per_frame_plotter.png" width="700">
+  <img src="./docs/images/tools/per_frame_metric.png" width="700">
 
+- **`deep_analysis_plotter.ipynb`** Line graph of the execution time per render/compute/rt pass. Can be used to compare between runs. Requires `--export-deep-analysis`.
 
-## 📊 Comparing deep analysis output
+  <img src="./docs/images/tools/deep_analysis_plotter.png" width="700">
 
-Using the `compare_deep_analysis.py` script located in the `scripts` directory, you can compare the results of two separate deep analysis output files in multiple ways. For the analysis methods, the scripts will first do an attempt to average over all loop iterations of the output. If you ran Evolve with `--looping 5`, each frame in the output will use the mean from each frame from each of the Evolve benchmark iterations.
+- **`frame_breakdown_stackplot.ipynb`** Stacked area chart breaking down each frame's GPU time by render/compute/rt pass (top 20 passes + other). Requires `--export-deep-analysis`.
 
-## Additional Requirements
-In addition to the requirements listed in [Requirements](#👷‍♀️-requirements), this script also requires the use of several python packages. These can be installed as follows:
+  <img src="./docs/images/tools/frame_breakdown_stackplot.png" width="700">
+
+## 📏 Capturing data
+
+These scripts require specific custom Evolve outputs, which can be generated by passing the following command line arguments when launching Evolve.
+
+ - `run-custom --export-scores scores.csv`
+ - `run-custom --export-per-frame per_frame.csv`
+ - `run-custom --export-deep-analysis deep_analysis.json`
+
+## 👷‍♀️ Requirements
+
+Please ensure the following dependencies are installed before running scripts from this repository:
+
+- Python 3.11+
+- ADB (Android Debug Bridge), only required for `run_on_android.py`
+
+The scripts and notebooks depend on several Python packages, which can be installed with:
 
 ```sh
 python -m pip install -r requirements.txt
 ```
 
-## Usage
+## 📊 Using `compare_deep_analysis.py`
+
+Using the `compare_deep_analysis.py` script, you can compare the results of two separate deep analysis output files in multiple ways. Before running any analysis, the script first attempts to average over all loop iterations in the output. If you ran Evolve with `--looping 5`, each frame in the output uses the mean of that frame across all five benchmark iterations.
+
+### Usage
 ```sh
 usage: Evolve Deep Analysis Comparison [-h] [--pass_mean_comparison PASS_MEAN_COMPARISON] [--pass_stdev_comparison PASS_STDEV_COMPARISON] deep_analysis_file deep_analysis_file
 ```
@@ -43,4 +72,4 @@ At least one of `--pass_mean_comparison` or `--pass_stdev_comparison` arguments
 ### Example usage
 ```sh
 python compare_deep_analysis.py --pass_mean_comparison mean_comparison.csv --pass_stdev_comparison stdev_comparison.csv deep_analysis_gpu_1.json deep_analysis_gpu_2.json
-```
+```
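Note on the loop averaging described in the README: the sketch below illustrates the idea on a simplified, hypothetical data layout (a list of loops, each a list of frames mapping a pass name to `(start_ns, end_ns)` tuples). It is not the real deep-analysis schema and not the code in `compare_deep_analysis.py`; the function name `average_over_loops` is made up for illustration. It follows the same approach as `aggregate_loops_passes` in the notebook: sum every occurrence of a pass within a frame, then divide by the number of loops.

```python
from collections import defaultdict


def average_over_loops(loops):
    """Average each pass's summed duration per frame over all loops, in ms.

    `loops` is a hypothetical simplified layout: a list of loops, each a list
    of frames, each frame a dict of pass name -> list of (start_ns, end_ns).
    """
    num_loops = len(loops)
    frames = []
    for loop in loops:
        for frame_index, frame in enumerate(loop):
            if frame_index >= len(frames):
                frames.append(defaultdict(float))
            for pass_name, timings in frame.items():
                # A pass may run several times in one frame; sum all runs.
                for start_ns, end_ns in timings:
                    frames[frame_index][pass_name] += (
                        (end_ns - start_ns) / num_loops / 1_000_000
                    )
    return [dict(frame) for frame in frames]


# Two loops of two frames each; "gbuffer" is a made-up pass name.
loops = [
    [{"gbuffer": [(0, 2_000_000)]}, {"gbuffer": [(0, 4_000_000)]}],
    [{"gbuffer": [(0, 4_000_000)]}, {"gbuffer": [(0, 2_000_000), (0, 2_000_000)]}],
]
averaged = average_over_loops(loops)
print(averaged)
```

Because every contribution is divided by the total loop count, this approach assumes each loop contains every frame; a frame missing from some loops would be averaged too low.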
diff --git a/docs/images/tools/deep_analysis_plotter.png b/docs/images/tools/deep_analysis_plotter.png
diff --git a/docs/images/tools/frame_breakdown_stackplot.png b/docs/images/tools/frame_breakdown_stackplot.png
diff --git a/docs/images/tools/per_frame_metric.png b/docs/images/tools/per_frame_metric.png
diff --git a/docs/images/tools/per_frame_plotter.png b/docs/images/tools/per_frame_plotter.png
diff --git a/docs/images/tools/scores_plotter.png b/docs/images/tools/scores_plotter.png
diff --git a/scripts/deep_analysis_plotter.ipynb b/scripts/deep_analysis_plotter.ipynb
@@ -0,0 +1,169 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aadf8bdc-2b8a-4737-992f-0ebf7e66cd8f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Install dependencies into the active kernel\n",
+    "%pip install flatten_json pandas matplotlib"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bbdc923a-44fd-4250-990b-80771446bbf0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import json\n",
+    "from collections import defaultdict\n",
+    "from flatten_json import flatten\n",
+    "import matplotlib.pyplot as plt\n",
+    "import pathlib"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "59b6a968-ad86-48a0-8948-d1f42bf0c3c6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "files = [\n",
+    "    'a_deep_analysis.json',\n",
+    "    'b_deep_analysis.json',\n",
+    "]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a60a4c63-b884-4558-98bd-ee3d14bbc569",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def aggregate_loops_passes(loops):\n",
+    "    results_per_frame = []\n",
+    "    num_loops = len(loops)\n",
+    "    for loop_results in loops:\n",
+    "        for frame_index, frame_results in enumerate(loop_results[\"per_frame_results\"]):\n",
+    "            if frame_index >= len(results_per_frame):\n",
+    "                results_per_frame.append(defaultdict(int))\n",
+    "            results_per_frame[frame_index]['sequence_time_ns'] = frame_results['sequence_time_ns']\n",
+    "            for command_buffer_timings in frame_results[\"command_buffer_timings\"].values():\n",
+    "                for scope_name, scope_timings in command_buffer_timings[\"scope_timings\"].items():\n",
+    "                    # A pass can run multiple times per frame; sum its durations, average over loops.\n",
+    "                    for scope_timing in scope_timings:\n",
+    "                        results_per_frame[frame_index][scope_name] += (\n",
+    "                            scope_timing[\"end\"] - scope_timing[\"start\"]\n",
+    "                        ) / num_loops / 1_000_000  # in ms\n",
+    "            if frame_results[\"metrics\"] is not None:\n",
+    "                for metric_name, metric in frame_results[\"metrics\"].items():\n",
+    "                    # TODO: Flatten this in rust to fan_speed_rpm\n",
+    "                    if metric is not None and metric_name != \"timestamp\":\n",
+    "                        results_per_frame[frame_index][metric_name] += metric / num_loops\n",
+    "    # TODO: Aggregate CPU timings\n",
+    "    return pd.DataFrame([flatten(x) for x in results_per_frame])\n",
+    "\n",
+    "\n",
+    "# Load every input file, aggregating its loops/passes into one number per pass per frame.\n",
+    "results = {}\n",
+    "for path in files:\n",
+    "    with open(path, \"r\") as json_file:\n",
+    "        results[path] = aggregate_loops_passes(json.load(json_file))\n",
+    "\n",
+    "# Concat into one frame: (input file, frame) per row, metric per column\n",
+    "full_dataset = pd.concat(results)\n",
+    "full_dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7de3e2b9-0470-40a2-9ed7-c2d931aaf153",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "# Print all possible metrics\n",
+    "full_dataset.columns.tolist()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6046a029-f421-4a9e-a9d6-9889488ff874",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "metrics = full_dataset\n",
+    "\n",
+    "# Reshape into sequence time + metric type per row, input file per column\n",
+    "metrics = metrics.reset_index().set_index(['sequence_time_ns', 'level_0']).drop('level_1', axis=1)\n",
+    "metrics = metrics.stack().unstack(1).reset_index()\n",
+    "\n",
+    "# From ns to s\n",
+    "metrics['sequence_time_s'] = metrics['sequence_time_ns'] / 1_000_000_000\n",
+    "metrics = metrics.drop('sequence_time_ns', axis=1)\n",
+    "\n",
+    "# Drop the leftover index name so it doesn't show up as a \"level_0\" legend title\n",
+    "metrics.columns.name = None\n",
+    "metrics"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "e6740136-668a-44a6-9e35-53f80d586ea5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pathlib.Path('output_analysis').mkdir(parents=True, exist_ok=True)\n",
+    "\n",
+    "# One image per metric/pass: compares all input files over the benchmark timeline.\n",
+    "for graph_name in metrics['level_1'].unique():\n",
+    "    # Grab just this metric, leaving one column per input file plus the timeline\n",
+    "    selected_metric = metrics[metrics['level_1'] == graph_name].drop('level_1', axis=1)\n",
+    "\n",
+    "    ax = selected_metric.infer_objects(copy=False).interpolate(method='linear').plot(\n",
+    "        x='sequence_time_s',\n",
+    "        xlabel='benchmark timeline in seconds',\n",
+    "        ylabel='shader execution time in ms',\n",
+    "        title=graph_name,\n",
+    "        figsize=(20, 10),\n",
+    "        colormap='Dark2',\n",
+    "        grid=True,\n",
+    "        legend=True,\n",
+    "    )\n",
+    "    ax.figure.savefig(f'output_analysis/{graph_name}.png')\n",
+    "    plt.close(ax.figure)  # free the figure so we don't keep ~94 open at once\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
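Note on the reshape cell in `deep_analysis_plotter.ipynb`: the `stack().unstack(1)` step can be hard to follow, so here is a minimal sketch of the same chain on two tiny hand-made frames. The file names, the `gbuffer` column, and the values are invented for illustration; only the pandas calls mirror the notebook.

```python
import pandas as pd

# Two tiny per-run frames standing in for the aggregated deep-analysis data.
run_a = pd.DataFrame({"sequence_time_ns": [0, 1_000_000_000], "gbuffer": [1.0, 2.0]})
run_b = pd.DataFrame({"sequence_time_ns": [0, 1_000_000_000], "gbuffer": [3.0, 4.0]})

# Concatenating a dict yields a two-level index: (run name, row number).
# After reset_index() these levels become the columns level_0 and level_1.
full = pd.concat({"a.json": run_a, "b.json": run_b})

# Index by (timeline, run name) and drop the old row number.
wide = full.reset_index().set_index(["sequence_time_ns", "level_0"]).drop("level_1", axis=1)

# stack() moves the metric names into the index; unstack(1) pivots the run
# name into columns, leaving one row per (timeline, metric).
wide = wide.stack().unstack(1).reset_index()
wide.columns.name = None
print(wide)
```

After the final `reset_index()`, the unnamed metric level is exposed as a new `level_1` column, which is why the plotting cell filters on `metrics['level_1']` even though the earlier `level_1` (the row number) was dropped.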