Commit fd8f05c

Expand README for src/metrics (#58886)
1 parent 40b1b6c commit fd8f05c

1 file changed

Lines changed: 189 additions & 55 deletions

src/metrics/README.md

@@ -1,76 +1,210 @@
1-
# Kusto tooling
1+
# Metrics
22

3-
CLI tools to fetch data from the Kusto API.
3+
The metrics subject provides CLI tools for fetching analytics data from Kusto (Azure Data Explorer) about GitHub Docs usage. These tools help content strategists, writers, and engineers understand page performance, user behavior, and content effectiveness.
44

5-
## Installation and authentication
5+
## Purpose & Scope
66

7-
1. Install the Azure CLI with `brew install azure-cli`.
8-
* If you have the option to **not** update all your brew packages, choose that, or it will take a really long time.
9-
1. Run `az login`.
10-
* You'll have to run `az login` whenever your session expires. The sessions are fairly long lasting.
11-
1. Enter your `<username>@githubazure.com` credentials.
12-
* These will get cached for future logins.
13-
1. At the prompt in Terminal asking which subscription you want to use, just press Enter to choose the default.
14-
1. Open or create an `.env` file in the root directory of your checkout (this file is already in `.gitignore` so it won't be tracked by Git).
15-
1. Add the `KUSTO_CLUSTER` and `KUSTO_DATABASE` values to the `.env` (_these values are pinned in slack_):
16-
```
17-
KUSTO_CLUSTER='<value>'
18-
KUSTO_DATABASE='<value>'
19-
```
7+
This subject is responsible for:
8+
- Providing CLI tools to query Kusto for docs analytics
9+
- `docstat` - Get metrics for a single URL (views, users, bounces, etc.)
10+
- `docsaudit` - Get metrics for an entire content directory
11+
- Kusto query abstractions for common metrics
12+
- Authentication and connection to Azure Kusto
13+
- Date range calculations for time-series queries
2014

21-
## docstat usage
15+
## Architecture & Key Assets
2216

23-
Run `npm run docstat -- <URL>` on any GitHub Docs URL to gather a set of default metrics about it, including 30d views, users, view duration, bounces, helpfulness score, and exits to support.
17+
### Key capabilities and their locations
2418

25-
Notes:
26-
* If the URL doesn't include a version, `docstat` will return data that includes **all versions** (so FPT, Cloud, Server, etc.).
27-
* If you want data for FPT only, pass the `--fptOnly` option.
28-
* `docstat` only accepts URLs with an `en` language code or no language code, and it only fetches English data.
19+
- `lib/kusto-client.ts` - `getKustoClient()`: Creates authenticated Kusto client using Azure CLI
20+
- `lib/kusto-client.ts` - `runQuery()`: Executes Kusto queries and returns results
21+
- `scripts/docstat.ts` - CLI tool: Fetches metrics for a single docs URL
22+
- `scripts/docsaudit.ts` - CLI tool: Audits entire content directories with CSV output
23+
- `queries/*.ts` - Pre-defined Kusto queries for specific metrics
2924

30-
To see all the options:
31-
```
32-
npm run docstat -- --help
33-
```
34-
You can combine options like this:
35-
```
36-
npm run docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code --compare --range 60
37-
```
38-
Use `--redirects` to include `redirect_from` frontmatter paths in the queries (this is helpful if the article may have moved recently):
39-
```
40-
npm run docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code --redirects
41-
```
42-
Use the `--json` (or `-j`) option to output JSON:
43-
```
44-
npm run docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code --json
45-
```
46-
If you want to pass the results of the JSON to `jq`, you need to use `silent` mode:
47-
```
48-
npm run --silent docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code --json | jq .data.users
49-
```
25+
## Setup & Usage
26+
27+
### Installation and authentication
28+
29+
1. Install Azure CLI:
30+
```bash
31+
brew install azure-cli
32+
```
33+
34+
2. Login with Azure credentials:
35+
```bash
36+
az login
37+
```
38+
Use your `<username>@githubazure.com` credentials.
5039

51-
## docsaudit usage
40+
3. Add the Kusto configuration to the `.env` file in the root directory of your checkout (the values are pinned in Slack):
41+
```
42+
KUSTO_CLUSTER='<value>'
43+
KUSTO_DATABASE='<value>'
44+
```
5245

53-
Run `npm run docsaudit` on a top-level content directory to gather data about its files—including title, path, versions, 30d views, and 30d users—and output it to a CSV file.
46+
### docstat usage
5447

55-
To see all the options:
48+
Get metrics for a single URL:
49+
50+
```bash
51+
npm run docstat -- <URL>
5652
```
57-
npm run docsaudit -- --help
53+
54+
Example:
55+
```bash
56+
npm run docstat -- https://docs.github.com/copilot/tutorials/modernize-legacy-code
5857
```
59-
Run the script on any top-level content directory:
58+
59+
Default metrics returned:
60+
- 30-day views
61+
- 30-day unique users
62+
- Average view duration
63+
- Bounce rate
64+
- Helpfulness score (survey data)
65+
- Exits to support
66+
67+
#### Options
68+
69+
```bash
70+
# Compare with previous period
71+
npm run docstat -- <URL> --compare
72+
73+
# Custom date range (60 days)
74+
npm run docstat -- <URL> --range 60
75+
76+
# Include redirects from frontmatter
77+
npm run docstat -- <URL> --redirects
78+
79+
# FPT data only (default includes all versions)
80+
npm run docstat -- <URL> --fptOnly
81+
82+
# JSON output
83+
npm run docstat -- <URL> --json
84+
85+
# Combine options
86+
npm run docstat -- <URL> --compare --range 60 --redirects
6087
```
61-
npm run docsaudit -- <content directory name>
88+
89+
#### JSON output with jq
90+
91+
```bash
92+
npm run --silent docstat -- <URL> --json | jq .data.users
6293
```
63-
For example:
94+
95+
### docsaudit usage
96+
97+
Audit an entire content directory:
98+
99+
```bash
100+
npm run docsaudit -- <content-directory>
64101
```
102+
103+
Example:
104+
```bash
65105
npm run docsaudit -- actions
66106
```
67107

68-
## Future development
108+
Output includes:
109+
- Title
110+
- Path
111+
- Versions
112+
- 30-day views
113+
- 30-day unique users
114+
115+
Results are saved to a CSV file in the project root.
116+
117+
## Data & External Dependencies
118+
119+
### Data sources
120+
- Kusto (Azure Data Explorer) - GitHub's data warehouse for analytics
121+
- Docs event data - Page views, user interactions, surveys
122+
- Content frontmatter - For path resolution and redirect detection
123+
124+
### Dependencies
125+
- `azure-kusto-data` - Official Azure Kusto SDK
126+
- Azure CLI - For authentication (`az login`)
127+
- Environment variables: `KUSTO_CLUSTER`, `KUSTO_DATABASE`
128+
129+
### Authentication
130+
- Uses Azure CLI identity via `withAzLoginIdentity()`
131+
- Sessions are long-lasting but expire periodically
132+
- Re-run `az login` when session expires
133+
134+
### Queries
135+
Pre-defined queries in `queries/` directory:
136+
- `views.ts` - Total page views
137+
- `users.ts` - Unique users
138+
- `view-duration.ts` - Average session duration
139+
- `bounces.ts` - Percentage of single-page sessions
140+
- `survey-score.ts` - Helpfulness rating from surveys
141+
- `exits-to-support.ts` - Clicks on support links
142+
143+
## Cross-links & Ownership
144+
145+
### Related subjects
146+
- [`src/events`](../events/README.md) - Source of analytics event data
147+
- [`src/frame`](../frame/README.md) - Frontmatter reading for path resolution
148+
- Kusto database - Contains aggregated event data
149+
150+
### Internal documentation
151+
For Kusto cluster details and database schema, see internal Docs Engineering documentation. Credentials are pinned in the #docs-engineering Slack channel.
152+
153+
### Ownership
154+
- Team: Docs Content (with engineering support and reviews)
155+
- Data questions: #docs-data
156+
157+
## Current State & Next Steps
158+
159+
### Known limitations
160+
- The date range option only accepts a start offset (`--range <number>`, meaning `<number>` days ago); the end date is always the current date
161+
- Only English (`en`) language data is supported
162+
- Queries are hardcoded in `queries/` directory
163+
- URLs without version include all versions (FPT, GHEC, GHES combined)
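
The date range limitation above can be illustrated with a minimal sketch. This is not the code in `src/metrics`; `getDateRange` and `DateRange` are hypothetical names, and the sketch assumes dates are passed to queries as `YYYY-MM-DD` strings in UTC.

```typescript
// Hypothetical helper (not taken from src/metrics): turns a "days ago" range
// into a start/end pair, with the end date fixed to the current date.
interface DateRange {
  startDate: string // YYYY-MM-DD (UTC)
  endDate: string // YYYY-MM-DD (UTC)
}

function getDateRange(rangeInDays: number, now: Date = new Date()): DateRange {
  if (!Number.isInteger(rangeInDays) || rangeInDays <= 0) {
    throw new Error(`range must be a positive whole number of days, got ${rangeInDays}`)
  }
  const msPerDay = 24 * 60 * 60 * 1000
  const start = new Date(now.getTime() - rangeInDays * msPerDay)
  // toISOString() always reports UTC, so both dates are expressed in UTC.
  return {
    startDate: start.toISOString().slice(0, 10),
    endDate: now.toISOString().slice(0, 10),
  }
}
```

Accepting `now` as a parameter keeps the helper testable; supporting a custom end date would then only require passing a different `now`.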
164+
165+
### Metrics available
166+
Current metrics:
167+
- Views (page view count)
168+
- Users (unique user count)
169+
- View duration (average time on page)
170+
- Bounces (single-page sessions)
171+
- Survey score (helpfulness rating)
172+
- Exits to support (support link clicks)
173+
174+
### Adding a new query
175+
176+
1. Create new file in `src/metrics/queries/`
177+
2. Export a function that returns a Kusto query string
178+
3. Import and call in `docstat.ts` or `docsaudit.ts`
179+
4. Update CLI options if needed
180+
181+
Example:
182+
```typescript
// queries/my-metric.ts
export function getMyMetric(path: string, startDate: string, endDate: string): string {
  return `
    PageViews
    | where Timestamp between (datetime(${startDate}) .. datetime(${endDate}))
    | where Path == "${path}"
    | summarize Count = count()
  `
}
```
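
Because the example interpolates `path` and the dates directly into the query text, a path containing a double quote would break the query. A hedged variant is sketched below. `escapeKustoString`, `assertIsoDate`, and `getMyMetricSafe` are hypothetical names that do not appear in this commit, and the `PageViews` table and its columns are carried over from the example as-is. The sketch assumes Kusto's backslash escaping for double-quoted string literals.

```typescript
// Hypothetical helpers; the names are illustrative and not part of src/metrics.
function escapeKustoString(value: string): string {
  // Escape backslashes first so the backslashes added for quotes are not doubled.
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"')
}

function assertIsoDate(value: string): string {
  // Only the YYYY-MM-DD shape is checked, not whether the date exists.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) {
    throw new Error(`Expected a YYYY-MM-DD date, got "${value}"`)
  }
  return value
}

function getMyMetricSafe(path: string, startDate: string, endDate: string): string {
  const start = assertIsoDate(startDate)
  const end = assertIsoDate(endDate)
  return `
    PageViews
    | where Timestamp between (datetime(${start}) .. datetime(${end}))
    | where Path == "${escapeKustoString(path)}"
    | summarize Count = count()
  `
}
```

Validating the dates and escaping the path keeps the query well-formed even when the URL comes straight from the command line.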
193+
194+
### Troubleshooting
195+
196+
**Azure login expired:**
197+
```bash
198+
az login
199+
```
69200

70-
Applies to all scripts:
201+
**Missing environment variables:**
202+
Check that your `.env` file defines both `KUSTO_CLUSTER` and `KUSTO_DATABASE` (the values are pinned in Slack).
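
A fail-fast check can make this error easier to spot. The sketch below is an assumption about how such a check could look, not the code in `lib/kusto-client.ts`; `readKustoConfig` and `KustoConfig` are hypothetical names.

```typescript
// Hypothetical helper: validates the two variables named in this README
// before any Kusto connection is attempted.
interface KustoConfig {
  cluster: string
  database: string
}

function readKustoConfig(env: Record<string, string | undefined>): KustoConfig {
  const required = ['KUSTO_CLUSTER', 'KUSTO_DATABASE']
  // Treat unset and empty values the same way.
  const missing = required.filter((name) => !env[name])
  if (missing.length > 0) {
    throw new Error(
      `Missing environment variables: ${missing.join(', ')}. Add them to your .env file.`,
    )
  }
  return {
    cluster: env.KUSTO_CLUSTER as string,
    database: env.KUSTO_DATABASE as string,
  }
}
```

In a script, this would typically be called as `readKustoConfig(process.env)` after the `.env` file has been loaded.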
71203

72-
* The date range option only accepts a start date (via `-r <number>`, where the number means "`<number>` days ago"). The end date will always be the current date.
73-
* In the future, we can add an option to set a custom end date.
204+
**No data found:**
205+
- Verify URL is correct and includes `https://docs.github.com`
206+
- Check date range (older content may have limited data)
207+
- Try `--redirects` if article was recently moved
74208

75-
* The only Kusto queries available are hardcoded in the `kusto/queries` directory.
76-
* In the future, we can hardcode more queries, add the ability to send custom queries, or perhaps create pre-defined sets of queries.
209+
**Permission errors:**
210+
Ensure your Azure account has read access to the Kusto database. Contact #docs-data if needed.
