Replace `github.com` with `graphhub.dev` in any GitHub repo URL to get an interactive knowledge graph of that codebase. Zero setup, zero auth required.
```
https://github.com/vercel/next.js → https://graphhub.dev/vercel/next.js
```
GraphHub fetches a GitHub repository, parses every supported source file using language-specific AST-based parsers, and renders the entire codebase as an interactive force-directed graph, where nodes are folders, files, functions, and classes, and edges are imports, exports, and containment relationships.
The result is a live, draggable, zoomable knowledge graph that lets you understand an unfamiliar codebase in minutes instead of hours.
```mermaid
flowchart TD
Browser["Browser - Next.js Client"]
Landing["Landing Page /"]
GraphView["Graph View /owner/repo"]
API_Graph["/api/graph/owner/repo"]
Pipeline["Pipeline - getParsedData()"]
GH["GitHub REST API v3"]
Cache["Server Cache - node-cache"]
LStore["Client Cache - localStorage 30min"]
Sim["D3 Force Simulation"]
Canvas["Canvas Renderer - 60fps rAF"]
Browser -->|URL input| Landing
Landing -->|navigate| GraphView
GraphView -->|cache miss| API_Graph
GraphView -->|cache hit| LStore
API_Graph --> Pipeline
Pipeline -->|tree key| Cache
Pipeline -->|file key| Cache
Pipeline -->|parsed key| Cache
Cache -->|miss| GH
GH -->|file tree + blobs| Pipeline
Pipeline -->|ParsedFile array| API_Graph
API_Graph -->|GraphData JSON| GraphView
GraphView --> Sim
Sim -->|tick positions| Canvas
```
```mermaid
flowchart LR
A["GET /api/graph\n/:owner/:repo"] --> B["Resolve HEAD SHA\nGitHub commits API"]
B --> C["Fetch file tree\nGitHub trees API\nrecursive"]
C --> D["Filter files\nshouldIncludeFile()"]
D --> E["Batch fetch blobs\nGitHub contents API"]
E --> F["Language parsers\nJS · TS · Python · Go"]
F --> G["buildGraph()\nNodes + Edges + Clusters"]
G --> H["GraphData JSON\nto client"]
H --> I["D3 forceSimulation\nphysics layout"]
I --> J["Canvas 2D render\nrAF loop"]
```
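The "Filter files" step names `shouldIncludeFile()` but does not spell out its rules. Below is a minimal sketch, assuming the filter keeps blobs whose extensions appear in the supported-languages table and skips vendored or build output. The skipped directory names and the 200 KB size cap are illustrative assumptions, not GraphHub's actual rules.

```typescript
// Hypothetical sketch in the spirit of shouldIncludeFile().
// Extensions come from the supported-languages table; SKIPPED_DIRS and
// MAX_FILE_BYTES are assumptions for illustration only.
const SUPPORTED_EXTENSIONS = new Set([
  ".js", ".jsx", ".mjs", ".cjs", ".ts", ".tsx", ".py", ".go",
]);
const SKIPPED_DIRS = new Set(["node_modules", "vendor", "dist", "build", ".git"]);
const MAX_FILE_BYTES = 200_000; // assumption: very large files are often generated

interface TreeEntry {
  path: string;  // repo-relative path, e.g. "src/lib/cache.ts"
  type: string;  // "blob" for files, "tree" for directories
  size?: number; // bytes, present for blobs
}

function shouldIncludeFile(entry: TreeEntry): boolean {
  if (entry.type !== "blob") return false;
  const segments = entry.path.split("/");
  // Reject anything nested under a skipped directory at any depth.
  if (segments.slice(0, -1).some((dir) => SKIPPED_DIRS.has(dir))) return false;
  const name = segments[segments.length - 1];
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a bare dotfile like ".env"
  if (!SUPPORTED_EXTENSIONS.has(name.slice(dot))) return false;
  return entry.size === undefined || entry.size <= MAX_FILE_BYTES;
}
```

Filtering before the blob fetch matters because every skipped file is one fewer GitHub API request against the rate limit.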
```mermaid
flowchart TD
Req["Incoming request\n/api/graph/:owner/:repo"] --> L1{"Server cache\nnode-cache"}
L1 -->|HIT| Resp["Return cached GraphData"]
L1 -->|MISS| L2{"Parsed cache\nparsedKey"}
L2 -->|HIT| BG["buildGraph() only"]
L2 -->|MISS| L3{"Tree + file cache\ntreeKey · fileKey"}
L3 -->|partial HIT| Fetch["Fetch only uncached files"]
L3 -->|MISS| GH["GitHub REST API"]
GH --> Parse["Parse + cache each layer"]
Fetch --> Parse
BG --> Resp
Parse --> Resp
Resp --> Client["Client\nlocalStorage · 30 min TTL"]
```
| Layer | Technology |
|---|---|
| Framework | Next.js 14 (App Router) |
| Language | TypeScript (strict) |
| Styling | Tailwind CSS + CSS custom properties |
| Graph physics | D3.js v7 force simulation |
| Rendering | Canvas 2D API |
| Server cache | node-cache (in-process, 500-key cap) |
| Client cache | localStorage with 30-min TTL |
| Data source | GitHub REST API v3 |
| Icons | Lucide React |
| Fonts | Geist |
| Deployment | Vercel |
Rendering a large codebase graph — potentially thousands of nodes and edges — in SVG means thousands of DOM elements, each triggering layout recalculations on every simulation tick. At that scale, the browser cannot maintain 60fps.
GraphHub uses the Canvas 2D API with a requestAnimationFrame loop instead. The renderer redraws the entire scene each frame from scratch: edges first, then nodes, then labels. This keeps the DOM to a single <canvas> element regardless of graph size, and frame time stays sub-16ms even for repos with 2000+ nodes.
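The full-redraw approach can be sketched as a single `drawFrame` function that clears the canvas and paints edges, then nodes, then labels. This is an illustration under assumptions, not GraphHub's `renderer.ts`: `DrawContext` is a hypothetical subset of `CanvasRenderingContext2D` (so the draw order can be exercised outside a browser), and the node and edge shapes are invented for the example.

```typescript
// Hypothetical subset of CanvasRenderingContext2D; a real 2D context satisfies it.
interface DrawContext {
  clearRect(x: number, y: number, w: number, h: number): void;
  beginPath(): void;
  moveTo(x: number, y: number): void;
  lineTo(x: number, y: number): void;
  arc(x: number, y: number, r: number, start: number, end: number): void;
  stroke(): void;
  fill(): void;
  fillText(text: string, x: number, y: number): void;
}

interface RenderNode { id: string; x: number; y: number; radius: number; label: string }
interface RenderEdge { source: RenderNode; target: RenderNode }

function drawFrame(
  ctx: DrawContext, width: number, height: number,
  nodes: RenderNode[], edges: RenderEdge[],
): void {
  ctx.clearRect(0, 0, width, height); // redraw the whole scene every frame
  // 1. Edges: batch every segment into one path and stroke it once.
  ctx.beginPath();
  for (const e of edges) {
    ctx.moveTo(e.source.x, e.source.y);
    ctx.lineTo(e.target.x, e.target.y);
  }
  ctx.stroke();
  // 2. Nodes: one filled circle each, painted over the edges.
  for (const n of nodes) {
    ctx.beginPath();
    ctx.arc(n.x, n.y, n.radius, 0, 2 * Math.PI);
    ctx.fill();
  }
  // 3. Labels last, so they stay readable on top of nodes and edges.
  for (const n of nodes) ctx.fillText(n.label, n.x + n.radius + 2, n.y);
}
```

In a browser, a `requestAnimationFrame` callback would call `drawFrame(canvas.getContext("2d")!, w, h, nodes, edges)` and then schedule itself again. Stroking all edges as one path keeps the per-frame draw-call count low, which is where Canvas gains most over one DOM element per edge.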
The tradeoff is that hit-testing (hover, click) requires a quadtree built over node positions rather than native DOM events. A D3 quadtree is rebuilt on every tick and queried on every mouse move — the overhead is negligible compared to DOM event bubbling at this scale.
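To show why a quadtree makes per-mousemove hit-testing cheap, here is a small hand-rolled point quadtree. It is a standalone illustration, not GraphHub's code: the README says GraphHub uses D3's quadtree, where the equivalent would be `d3.quadtree(nodes, d => d.x, d => d.y).find(mouseX, mouseY, radius)`. The `Pt` shape, the leaf capacity of 4, and `MAX_DEPTH` are assumptions.

```typescript
interface Pt { id: string; x: number; y: number }

const MAX_DEPTH = 12; // stops endless splitting when many nodes share one position

class QuadTree {
  private points: Pt[] = [];
  private children: QuadTree[] | null = null;
  private x0: number; private y0: number; private x1: number; private y1: number;
  private depth: number;

  constructor(x0: number, y0: number, x1: number, y1: number, depth = 0) {
    this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1; this.depth = depth;
  }

  insert(p: Pt): boolean {
    if (p.x < this.x0 || p.x > this.x1 || p.y < this.y0 || p.y > this.y1) return false;
    if (this.children === null && (this.points.length < 4 || this.depth >= MAX_DEPTH)) {
      this.points.push(p); // leaf with spare capacity (or at max depth)
      return true;
    }
    const kids = this.children ?? this.split();
    return kids.some((c) => c.insert(p));
  }

  private split(): QuadTree[] {
    const mx = (this.x0 + this.x1) / 2;
    const my = (this.y0 + this.y1) / 2;
    const d = this.depth + 1;
    const kids = [
      new QuadTree(this.x0, this.y0, mx, my, d), new QuadTree(mx, this.y0, this.x1, my, d),
      new QuadTree(this.x0, my, mx, this.y1, d), new QuadTree(mx, my, this.x1, this.y1, d),
    ];
    for (const p of this.points) kids.some((c) => c.insert(p)); // only leaves hold points
    this.points = [];
    this.children = kids;
    return kids;
  }

  // Nearest point within `radius` of (x, y); cells outside the search box are skipped.
  find(x: number, y: number, radius: number): Pt | undefined {
    let best: Pt | undefined;
    let bestD2 = radius * radius;
    const visit = (node: QuadTree): void => {
      if (x + radius < node.x0 || x - radius > node.x1 ||
          y + radius < node.y0 || y - radius > node.y1) return;
      for (const p of node.points) {
        const d2 = (p.x - x) ** 2 + (p.y - y) ** 2;
        if (d2 <= bestD2) { best = p; bestD2 = d2; }
      }
      if (node.children) node.children.forEach(visit);
    };
    visit(this);
    return best;
  }
}

// Rebuilt from current node positions, e.g. once per simulation tick.
function buildQuadTree(nodes: Pt[]): QuadTree {
  const xs = nodes.map((n) => n.x);
  const ys = nodes.map((n) => n.y);
  const tree = nodes.length === 0
    ? new QuadTree(0, 0, 0, 0)
    : new QuadTree(Math.min(...xs), Math.min(...ys), Math.max(...xs), Math.max(...ys));
  for (const n of nodes) tree.insert(n);
  return tree;
}
```

A lookup only descends into cells that overlap the cursor's search box, so hover cost grows roughly with tree depth rather than with the total node count.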
D3's force simulation provides physically-plausible layouts without requiring manual coordinate assignment. Three forces drive the layout:
- Many-body (charge): repulsion between nodes, scaled by node type (folders repel hardest at -1400, files moderately at -800, symbols lightly at -500). This naturally separates clusters without explicit grouping logic.
- Link: spring attraction along edges, tuned per edge type. `contains` edges are shorter (120px, strength 0.45) and `import` edges are longer (180px, strength 0.2).
- Cluster centroid: a custom radial force pulls each node toward its cluster's computed centroid, grouping files from the same top-level folder without hard positional constraints.
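The per-type parameters above can be expressed as lookup tables, and the cluster centroid force can be written in the shape D3 expects for a custom force: a function of `alpha` with an `initialize(nodes)` hook. This is a sketch under assumptions: the `SimNode` fields, the treatment of `export` edges like `import` edges, and `CLUSTER_STRENGTH = 0.1` are not stated in the source.

```typescript
type NodeType = "folder" | "file" | "function" | "class";
type EdgeType = "contains" | "import" | "export";

interface SimNode {
  id: string; type: NodeType; cluster: string; // cluster = top-level folder (assumed field)
  x: number; y: number; vx: number; vy: number;
}

// Values from the list above; function/class share the "symbol" charge.
const CHARGE: Record<NodeType, number> = { folder: -1400, file: -800, function: -500, class: -500 };
const LINK: Record<EdgeType, { distance: number; strength: number }> = {
  contains: { distance: 120, strength: 0.45 },
  import: { distance: 180, strength: 0.2 },
  export: { distance: 180, strength: 0.2 }, // assumption: not specified in the source
};

const CLUSTER_STRENGTH = 0.1; // hypothetical value

function forceClusterCentroid(strength = CLUSTER_STRENGTH) {
  let nodes: SimNode[] = [];
  function force(alpha: number): void {
    // Recompute each cluster's centroid from the current positions.
    const sums = new Map<string, { x: number; y: number; n: number }>();
    for (const d of nodes) {
      const s = sums.get(d.cluster) ?? { x: 0, y: 0, n: 0 };
      s.x += d.x; s.y += d.y; s.n += 1;
      sums.set(d.cluster, s);
    }
    // Nudge velocities toward the centroid; scaling by alpha fades the pull as the layout cools.
    for (const d of nodes) {
      const s = sums.get(d.cluster)!;
      d.vx += (s.x / s.n - d.x) * strength * alpha;
      d.vy += (s.y / s.n - d.y) * strength * alpha;
    }
  }
  force.initialize = (ns: SimNode[]) => { nodes = ns; };
  return force;
}
```

With d3-force this would plug in roughly as `forceSimulation(nodes).force("charge", forceManyBody<SimNode>().strength((d) => CHARGE[d.type])).force("link", forceLink(links).distance((l) => LINK[l.type].distance).strength((l) => LINK[l.type].strength)).force("cluster", forceClusterCentroid())`. Because the pull is soft (a velocity nudge, not a position constraint), charge and link forces can still push nodes out of a cluster when edges demand it.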
The simulation runs until alpha drops below 0.001, then idles — no wasted CPU between interactions. Reheating on node drag keeps the experience responsive.
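The idle threshold follows from D3's cooling schedule. Each tick applies `alpha += (alphaTarget - alpha) * alphaDecay`, and D3's default `alphaDecay` is `1 - alphaMin^(1/300)`, chosen so alpha falls from 1 to the default `alphaMin` of 0.001 in about 300 ticks. The worked arithmetic below assumes GraphHub keeps these D3 defaults; the function name is hypothetical.

```typescript
const ALPHA_MIN = 0.001;                               // D3 default alphaMin
const ALPHA_DECAY = 1 - Math.pow(ALPHA_MIN, 1 / 300);  // D3 default, about 0.0228

// Count ticks until the simulation would stop (alpha < alphaMin).
function ticksUntilIdle(startAlpha: number, alphaTarget = 0): number {
  let alpha = startAlpha;
  let ticks = 0;
  while (alpha >= ALPHA_MIN) {
    alpha += (alphaTarget - alpha) * ALPHA_DECAY;
    ticks += 1;
    if (ticks > 100_000) return Infinity; // alphaTarget >= alphaMin never cools down
  }
  return ticks;
}
```

`ticksUntilIdle(1)` comes out at about 300, while a partial reheat such as `ticksUntilIdle(0.3)` settles in fewer ticks. The common D3 drag pattern calls `simulation.alphaTarget(0.3).restart()` on drag start, which keeps the layout live while dragging, and resets the target to 0 on drag end so the decay above resumes. Whether GraphHub reheats exactly this way is not stated.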
GitHub API rate limits (5000 req/hr authenticated, 60/hr unauthenticated) are the primary constraint. The pipeline caches at three independent layers, each keyed by owner + repo + SHA:
- Tree layer: the recursive file tree. TTL: 1 hour. Changes only when new commits land.
- File layer: individual file blobs. TTL: 24 hours. File content rarely changes within a day, and the SHA in the key ensures correctness when it does.
- Parsed layer: the full `ParsedFile[]` output. TTL: 24 hours. Avoids re-running AST parsing on cache-warm requests.
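A read-through cache with one TTL per layer captures the idea. GraphHub uses node-cache; the Map-based `TtlCache` below and its `getOrCompute` method are hypothetical stand-ins, with an injectable clock so expiry can be exercised without waiting.

```typescript
// TTLs from the list above, in seconds.
const TTL_SECONDS = { tree: 60 * 60, file: 24 * 60 * 60, parsed: 24 * 60 * 60 } as const;
type Layer = keyof typeof TTL_SECONDS;

class TtlCache {
  private store = new Map<string, { value: unknown; expiresAt: number }>();
  private now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Return the cached value for (layer, key) if still fresh; otherwise compute and store it.
  getOrCompute<T>(layer: Layer, key: string, compute: () => T): T {
    const fullKey = layer + "\x00" + key;
    const hit = this.store.get(fullKey);
    if (hit && hit.expiresAt > this.now()) return hit.value as T;
    const value = compute();
    this.store.set(fullKey, { value, expiresAt: this.now() + TTL_SECONDS[layer] * 1000 });
    return value;
  }
}
```

A pipeline would check the `parsed` layer first and fall back to `tree` and `file` only on a miss, so a warm request never touches GitHub or the parsers.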
A fourth client-side cache in localStorage (30-min TTL) means navigating back to a repo you already loaded renders instantly from local state — zero network round trips.
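On the client, a 30-minute cache only needs a timestamp stored next to the payload. The sketch below works against any localStorage-compatible object (the browser's `localStorage` satisfies `KeyValueStore` structurally); the function names and the `graphhub:` key prefix are hypothetical.

```typescript
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const CLIENT_TTL_MS = 30 * 60 * 1000; // 30 minutes

function saveGraph(store: KeyValueStore, repo: string, data: unknown, now = Date.now()): void {
  try {
    store.setItem("graphhub:" + repo, JSON.stringify({ savedAt: now, data }));
  } catch {
    // setItem throws when the storage quota is exceeded; skip caching instead of failing.
  }
}

function loadGraph<T>(store: KeyValueStore, repo: string, now = Date.now()): T | null {
  const raw = store.getItem("graphhub:" + repo);
  if (raw === null) return null;
  try {
    const entry = JSON.parse(raw) as { savedAt: number; data: T };
    if (now - entry.savedAt < CLIENT_TTL_MS) return entry.data; // still fresh
  } catch {
    // Corrupt entry: fall through and drop it.
  }
  store.removeItem("graphhub:" + repo); // expired or unreadable
  return null;
}
```

Catching the quota error matters for large repos: a graph with thousands of nodes can exceed the few megabytes browsers typically allow for localStorage, and the app should still render from the network response.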
Cache keys use `\x00` as a separator to prevent collisions across key components. With a `:` separator, for example, owner `a:b` with repo `c` and owner `a` with repo `b:c` would both produce the key `a:b:c`.
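A NUL separator works because it cannot appear in any component: GitHub owner and repo names, commit SHAs, and Git paths never contain a NUL byte, so a joined key always splits back unambiguously. The builder names below echo the `treeKey`/`fileKey`/`parsedKey` labels in the caching diagram, but their signatures are hypothetical.

```typescript
// Join key components with NUL, which never occurs inside the components themselves.
function cacheKey(...parts: string[]): string {
  return parts.join("\x00");
}

// Hypothetical per-layer key builders; the layer name prefix keeps layers apart too.
const treeKey = (owner: string, repo: string, sha: string) =>
  cacheKey("tree", owner, repo, sha);
const fileKey = (owner: string, repo: string, sha: string, path: string) =>
  cacheKey("file", owner, repo, sha, path);
const parsedKey = (owner: string, repo: string, sha: string) =>
  cacheKey("parsed", owner, repo, sha);
```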
Each language has a dedicated parser that walks the source file and extracts imports, exports, functions, and classes. Using structural patterns rather than naive regex means edge cases such as multi-line imports, re-exports, string literals containing `import`, and decorator-annotated classes are handled correctly. The Go parser handles `import ( ... )` blocks; the Python parser handles `from x import y, z` imports and aliasing; the JS/TS parser handles both CommonJS `require()` and ESM `import`.
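As one concrete case, Go import extraction has to handle single-line imports, aliased imports (`str "strings"`, `_ "..."`, `. "..."`), and parenthesized blocks. The line-based sketch below is a simplified illustration, not GraphHub's `golang.ts`. It assumes gofmt-style layout (one spec per line inside the block) and ignores `/* */` comments and backtick-quoted paths.

```typescript
// Hypothetical Go import extractor; returns import paths in source order.
function extractGoImports(source: string): string[] {
  const imports: string[] = [];
  const specRe = /^(?:[A-Za-z_.]\w*\s+)?"([^"]+)"/; // optional alias, then "quoted/path"
  let inBlock = false;
  for (const rawLine of source.split("\n")) {
    const line = rawLine.replace(/\/\/.*$/, "").trim(); // drop line comments and whitespace
    if (inBlock) {
      if (line.startsWith(")")) { inBlock = false; continue; } // end of import ( ... )
      const m = specRe.exec(line);
      if (m) imports.push(m[1]);
    } else if (/^import\s*\(/.test(line)) {
      inBlock = true; // start of a parenthesized import block
    } else if (line.startsWith("import ")) {
      const m = specRe.exec(line.slice("import ".length).trim());
      if (m) imports.push(m[1]);
    }
  }
  return imports;
}
```

Stripping `//` comments before matching is safe here because Go import paths never contain `//`; a general-purpose version would need a real tokenizer to tell comments from string contents.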
Files that fail to parse degrade gracefully — they appear as file nodes in the graph without symbol children, and parse errors are surfaced in the API response metadata.
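Graceful degradation amounts to catching per-file failures and substituting a bare file entry. A minimal sketch, assuming a pluggable `parse` function and hypothetical `ParsedFileSketch`/`ParseError` shapes (these are not GraphHub's actual types):

```typescript
interface ParsedFileSketch {
  path: string;
  imports: string[];
  symbols: { name: string; kind: "function" | "class" }[];
}

interface ParseError { path: string; message: string }

function parseAll(
  files: { path: string; content: string }[],
  parse: (path: string, content: string) => ParsedFileSketch,
): { parsed: ParsedFileSketch[]; errors: ParseError[] } {
  const parsed: ParsedFileSketch[] = [];
  const errors: ParseError[] = [];
  for (const f of files) {
    try {
      parsed.push(parse(f.path, f.content));
    } catch (err) {
      // Keep the file as a node in the graph, just without imports or symbol children.
      parsed.push({ path: f.path, imports: [], symbols: [] });
      // Record the failure so the API can surface it in response metadata.
      errors.push({ path: f.path, message: err instanceof Error ? err.message : String(err) });
    }
  }
  return { parsed, errors };
}
```

One malformed file therefore costs only its own symbols, never the whole request.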
There is no explicit "navigate to repo" UI beyond a text input. The URL scheme /[owner]/[repo] mirrors GitHub exactly — which is intentional. Users who discover the site naturally substitute graphhub.dev for github.com and it just works. This makes the tool linkable and shareable without any state serialization.
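Because the path scheme mirrors GitHub, the landing-page input only has to extract `owner/repo` from whatever the user pastes. The helper below is hypothetical (the README does not describe the input handling); it accepts full GitHub or GraphHub URLs, deep links such as `/tree/<branch>`, a trailing `.git`, or a bare `owner/repo`.

```typescript
// Hypothetical: map a pasted URL or "owner/repo" to the /[owner]/[repo] route.
function toGraphPath(input: string): string | null {
  const trimmed = input.trim().replace(/\/+$/, "").replace(/\.git$/, "");
  const match =
    /^(?:https?:\/\/)?(?:www\.)?(?:github\.com|graphhub\.dev)\/([^/\s]+)\/([^/\s]+)/i.exec(trimmed) ??
    /^([^/\s]+)\/([^/\s]+)$/.exec(trimmed);
  if (!match) return null;
  const [, owner, repo] = match;
  return `/${owner}/${repo}`;
}
```

For example, `toGraphPath("https://github.com/vercel/next.js/tree/canary")` yields `/vercel/next.js`, the same route a user reaches by editing the hostname by hand.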
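The server rotates between `GITHUB_TOKEN` and `GITHUB_TOKEN_2` on 403/429 responses, maximizing effective rate-limit headroom. GitHub counts the 5000 req/hr limit per account rather than per token, so extra tokens only add headroom when they belong to different accounts; with that caveat, adding tokens to `.env.local` scales throughput roughly linearly, with no other changes required.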
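Token rotation can be sketched as a pool that remembers which token last worked and advances on 403/429. This is illustrative, not GraphHub's `github.ts`: `TokenPool`, `ApiResponse`, and the `send` callback (standing in for a `fetch` call with an `Authorization` header) are hypothetical names.

```typescript
interface ApiResponse { status: number; body?: unknown }

class TokenPool {
  private tokens: string[];
  private index = 0; // token to try first on the next request

  constructor(tokens: (string | undefined)[]) {
    // Unset env vars arrive as undefined or ""; drop them.
    this.tokens = tokens.filter((t): t is string => typeof t === "string" && t.length > 0);
  }

  async request(send: (token: string) => Promise<ApiResponse>): Promise<ApiResponse> {
    if (this.tokens.length === 0) throw new Error("no GitHub tokens configured");
    let last!: ApiResponse;
    for (let attempt = 0; attempt < this.tokens.length; attempt++) {
      last = await send(this.tokens[this.index]);
      // 403 and 429 are the statuses the README treats as rate limiting.
      if (last.status !== 403 && last.status !== 429) return last;
      this.index = (this.index + 1) % this.tokens.length; // rotate to the next token
    }
    return last; // every token was limited; surface the final response
  }
}
```

Typical construction would be `new TokenPool([process.env.GITHUB_TOKEN, process.env.GITHUB_TOKEN_2])`. Keeping the cursor on the last working token avoids re-hitting an exhausted token on every request.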
```
src/
├── app/
│ ├── [owner]/[repo]/ # Graph view — canvas + sidebar + controls
│ ├── api/
│ │ ├── graph/[owner]/[repo]/ # Main graph endpoint
│ │ ├── parse/[owner]/[repo]/ # Parse-only endpoint
│ │ ├── tree/[owner]/[repo]/ # File tree endpoint
│ │ └── files/[owner]/[repo]/ # File content endpoint
│ ├── showcase/ # Engineering case study page
│ └── page.tsx # Landing page
├── components/ # Shared UI components
├── hooks/
│ ├── useGraph.ts # Simulation + render + interaction logic
│ ├── useSearch.ts # Node search filtering
│ ├── useTheme.ts # Dark/light mode
│ └── useZoom.ts # D3 zoom + pan
├── lib/
│ ├── cache.ts # node-cache wrapper + TTLs + key builders
│ ├── github.ts # GitHub REST API client with token rotation
│ ├── pipeline.ts # Three-layer fetch + parse pipeline
│ ├── graph/
│ │ ├── buildGraph.ts # ParsedFile[] → GraphData (nodes, edges, clusters)
│ │ ├── simulation.ts # D3 force simulation factory
│ │ └── renderer.ts # Canvas 2D renderer + quadtree hit-testing
│ └── parser/
│ ├── javascript.ts # JS + TS parser
│ ├── python.ts # Python parser
│ └── golang.ts # Go parser
└── types/index.ts # All shared types
```
```bash
git clone https://github.com/dhananjay6561/GraphHub
cd GraphHub
cp .env.example .env.local
```

Edit `.env.local`. Tokens are optional but recommended (without them you hit the 60 req/hr unauthenticated limit):

```
GITHUB_TOKEN=ghp_your_token_here
# optional, for higher throughput
GITHUB_TOKEN_2=ghp_your_second_token_here
```

```bash
npm install
npm run dev
```

Open http://localhost:3000 and paste any GitHub repo URL.
| Language | Extensions | Parsed constructs |
|---|---|---|
| JavaScript | `.js`, `.jsx`, `.mjs`, `.cjs` | imports, exports, functions, classes |
| TypeScript | `.ts`, `.tsx` | imports, exports, functions, classes, interfaces |
| Python | `.py` | imports, functions, classes |
| Go | `.go` | imports, functions, types |
| Node type | Represents |
|---|---|
| `folder` | Top-level directory |
| `file` | Source file |
| `function` | Function or method definition |
| `class` | Class or interface definition |
| Edge type | Represents |
|---|---|
| `import` | File A imports from file B |
| `contains` | File contains a function or class node |
| `export` | Re-export relationship |
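The node and edge types listed above suggest how `buildGraph()` turns `ParsedFile[]` into graph data. The sketch below is hypothetical, not GraphHub's `buildGraph.ts`: the input field names, the ID formats (`dir:<name>`, `<path>#<symbol>`), the folder-to-file `contains` edges, and the assumption that imports arrive already resolved to repo-relative paths are all illustrative. Re-export edges are omitted for brevity.

```typescript
type GraphNodeType = "folder" | "file" | "function" | "class";
type GraphEdgeType = "import" | "contains" | "export";

interface ParsedFileInput {
  path: string;      // e.g. "src/lib/cache.ts"
  imports: string[]; // assumed already resolved to repo-relative paths
  symbols: { name: string; kind: "function" | "class" }[];
}
interface GraphNode { id: string; type: GraphNodeType; label: string }
interface GraphEdge { source: string; target: string; type: GraphEdgeType }

function buildGraphSketch(files: ParsedFileInput[]): { nodes: GraphNode[]; edges: GraphEdge[] } {
  const nodes = new Map<string, GraphNode>();
  const edges: GraphEdge[] = [];
  const known = new Set(files.map((f) => f.path));

  for (const f of files) {
    nodes.set(f.path, { id: f.path, type: "file", label: f.path.split("/").pop()! });
    // Top-level folder node, created once and linked to each file inside it (assumed rule).
    if (f.path.includes("/")) {
      const top = f.path.split("/")[0];
      const folderId = "dir:" + top;
      if (!nodes.has(folderId)) nodes.set(folderId, { id: folderId, type: "folder", label: top });
      edges.push({ source: folderId, target: f.path, type: "contains" });
    }
    // Function and class children hang off their file via "contains" edges.
    for (const s of f.symbols) {
      const symId = f.path + "#" + s.name;
      nodes.set(symId, { id: symId, type: s.kind, label: s.name });
      edges.push({ source: f.path, target: symId, type: "contains" });
    }
    // Import edges only to files inside the repo; external packages are skipped.
    for (const imp of f.imports) {
      if (known.has(imp)) edges.push({ source: f.path, target: imp, type: "import" });
    }
  }
  return { nodes: Array.from(nodes.values()), edges };
}
```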
License: MIT