
Commit bab6e85

heiskr and Copilot authored
Move llms.txt generation config to YAML (#59967)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 389ea2e commit bab6e85

3 files changed

Lines changed: 121 additions & 74 deletions


.github/workflows/sync-llms-txt-to-github.yml

Lines changed: 6 additions & 0 deletions
@@ -2,6 +2,12 @@ name: Sync llms.txt to github/github

 on:
   workflow_dispatch:
+  push:
+    branches:
+      - main
+    paths:
+      - 'data/llms-txt-config.yml'
+      - 'src/workflows/generate-llms-txt.ts'
   schedule:
     - cron: '20 16 * * 1-5' # Weekdays at ~9:20am Pacific

data/llms-txt-config.yml

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+# Configuration for the llms.txt generation script
+# (src/workflows/generate-llms-txt.ts).
+#
+# Writers can edit this file to change what appears in the generated
+# llms.txt for github.com without modifying TypeScript code.
+#
+# Fields are ordered to match the output in the generated file.
+
+# Header text at the top of llms.txt
+header: |
+  # GitHub
+
+  > GitHub is a developer platform for building, shipping, and maintaining software. It provides cloud-based Git repository hosting, CI/CD via GitHub Actions, project management with Issues and Projects, code review via pull requests, AI-powered development with GitHub Copilot, and APIs (REST and GraphQL) for automation and integration.
+
+  GitHub documentation is available at https://docs.github.com. The content covers GitHub.com (cloud), GitHub Enterprise Server, and GitHub Enterprise Cloud.
+
+# Programmatic access section
+api_section: |
+  ## Programmatic access (retrieve markdown via APIs instead of parsing HTML)
+
+  To retrieve full article content, page lists, or search results programmatically, please use the APIs below. These APIs return structured markdown and JSON and are the preferred way for LLMs and automated tools to access GitHub documentation.
+
+  - [Page List API](https://docs.github.com/api/pagelist/en/free-pro-team@latest): Returns every docs page path for a given language and version
+  - [Article API](https://docs.github.com/api/article): Returns the full rendered content of any docs page as markdown. Example: `curl "https://docs.github.com/api/article?pathname=/en/get-started/start-your-journey/about-github-and-git"`
+  - [Search API](https://docs.github.com/api/search): Search across all docs content. Example: `curl "https://docs.github.com/api/search?query=actions&language=en&version=free-pro-team@latest"`
+
+# Section heading for pinned pages
+pinned_section_heading: |
+  Building with GitHub (for coding agents and automation)
+
+# Pinned pages for coding agents and automation — always included
+# regardless of popularity. Edit this list to add or remove pages.
+pinned_pages:
+  - copilot/how-tos/provide-context/use-mcp/extend-copilot-chat-with-mcp
+  - copilot/how-tos/provide-context/use-mcp/use-the-github-mcp-server
+  - copilot/how-tos/provide-context/use-mcp/set-up-the-github-mcp-server
+  - copilot/how-tos/use-copilot-agents/coding-agent/about-coding-agent
+  - copilot/how-tos/use-copilot-agents/coding-agent/create-custom-agents
+  - github-cli/github-cli
+  - github-cli/github-cli/quickstart
+  - rest
+  - graphql
+  - actions
+
+# Number of top pages (by popularity) to include
+top_n: 100
+
+# Categories with fewer pages than this are grouped under "More pages"
+small_category_threshold: 3
+
+# Categories to exclude from llms.txt.
+# WARNING: early-access MUST stay excluded — never expose internal content.
+excluded_categories:
+  - communities
+  - contributing
+  - early-access
+  - education
+  - enterprise-onboarding
+  - integrations
+  - nonprofit
+  - search
+  - site-policy
+  - subscriptions-and-notifications
+  - support
+  - video-transcripts
+
+# Slugs to exclude (e.g. index pages)
+excluded_slugs:
+  - index
+
+# Section heading for small categories grouped together
+more_pages_heading: |
+  More pages
+
+# Comment added at the end of the generated file
+auto_update_comment: |
+  <!-- This file is automatically generated. Do not edit manually. -->
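
The comments on `excluded_categories`, `excluded_slugs`, and `small_category_threshold` describe filtering and grouping behavior, but the code that applies them is not part of this diff. The following is a minimal sketch of how such settings could be applied, not the script's actual logic. `PageEntry`, `GroupingOptions`, and `groupPages` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical page shape; the real script's page type is not shown in this diff.
interface PageEntry {
  category: string
  slug: string
}

// Hypothetical options object carrying the values read from the YAML config.
interface GroupingOptions {
  excludedCategories: Set<string>
  excludedSlugs: Set<string>
  smallCategoryThreshold: number
  morePagesHeading: string
}

// Drops excluded categories and slugs, groups the rest by category, and folds
// any category with fewer than `smallCategoryThreshold` pages into one bucket
// keyed by `morePagesHeading`.
function groupPages(pages: PageEntry[], opts: GroupingOptions): Map<string, PageEntry[]> {
  const byCategory = new Map<string, PageEntry[]>()
  for (const page of pages) {
    if (opts.excludedCategories.has(page.category)) continue
    if (opts.excludedSlugs.has(page.slug)) continue
    const bucket = byCategory.get(page.category) ?? []
    bucket.push(page)
    byCategory.set(page.category, bucket)
  }

  const grouped = new Map<string, PageEntry[]>()
  const leftovers: PageEntry[] = []
  byCategory.forEach((entries, category) => {
    if (entries.length < opts.smallCategoryThreshold) {
      leftovers.push(...entries)
    } else {
      grouped.set(category, entries)
    }
  })
  if (leftovers.length > 0) grouped.set(opts.morePagesHeading, leftovers)
  return grouped
}
```

With a threshold of 3, a category that contributes only one or two pages would land under the "More pages" heading rather than getting a section of its own.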

src/workflows/generate-llms-txt.ts

Lines changed: 38 additions & 74 deletions
@@ -7,6 +7,14 @@
 // Requires DOCS_BOT_PAT_BASE for fetching popularity data from
 // github/docs-internal-data.
 //
+// Configuration lives in data/llms-txt-config.yml — writers can edit
+// categories, pinned pages, thresholds, and text there without
+// touching this script.
+//
+
+import fs from 'fs'
+
+import yaml from 'js-yaml'

 import { loadPageMap } from '@/frame/lib/page-data'
 import { renderContent } from '@/content-render/index'
@@ -15,87 +23,43 @@ import { allVersions } from '@/versions/lib/all-versions'
 import type { Page, Context } from '@/types'

 // =====================================================================
-// Configuration — edit these to change what appears in llms.txt
+// Configuration — loaded from data/llms-txt-config.yml
 // =====================================================================

-export const ROLLUP_URL =
-  'https://raw.githubusercontent.com/github/docs-internal-data/main/hydro/rollups/pageviews/en/free-pro-team/rollup.json'
-export const TOP_N = 100
-export const SMALL_CATEGORY_THRESHOLD = 3
-const BASE_URL = 'https://docs.github.com'
-const BASE_API_URL = 'https://docs.github.com/api'
-
-// early-access MUST stay excluded — never expose internal content.
-export const EXCLUDED_CATEGORIES = new Set([
-  'communities',
-  'contributing',
-  'early-access',
-  'education',
-  'enterprise-onboarding',
-  'integrations',
-  'nonprofit',
-  'search',
-  'site-policy',
-  'subscriptions-and-notifications',
-  'support',
-  'video-transcripts',
-])
-
-export const EXCLUDED_SLUGS = new Set(['index'])
-
-// Pinned pages for coding agents and automation — always included
-// regardless of popularity. Edit this list to add or remove pages.
-export const PINNED_PAGES = [
-  'copilot/how-tos/provide-context/use-mcp/extend-copilot-chat-with-mcp',
-  'copilot/how-tos/provide-context/use-mcp/use-the-github-mcp-server',
-  'copilot/how-tos/provide-context/use-mcp/set-up-the-github-mcp-server',
-  'copilot/how-tos/use-copilot-agents/coding-agent/about-coding-agent',
-  'copilot/how-tos/use-copilot-agents/coding-agent/create-custom-agents',
-  'github-cli/github-cli',
-  'github-cli/github-cli/quickstart',
-  'rest',
-  'graphql',
-  'actions',
-]
-
-// =====================================================================
-// Header and API section text — easy to edit for writers
-// =====================================================================
-
-const HEADER = `\
-# GitHub
-
-> GitHub is a developer platform for building, shipping, and \
-maintaining software. It provides cloud-based Git repository hosting, \
-CI/CD via GitHub Actions, project management with Issues and Projects, \
-code review via pull requests, AI-powered development with GitHub \
-Copilot, and APIs (REST and GraphQL) for automation and integration.
-
-GitHub documentation is available at ${BASE_URL}. The content covers \
-GitHub.com (cloud), GitHub Enterprise Server, and GitHub Enterprise Cloud.`
+interface LlmsTxtConfig {
+  top_n: number
+  small_category_threshold: number
+  excluded_categories: string[]
+  excluded_slugs: string[]
+  pinned_pages: string[]
+  pinned_section_heading: string
+  more_pages_heading: string
+  auto_update_comment: string
+  header: string
+  api_section: string
+}

-const API_SECTION = `\
-## Programmatic access (retrieve markdown via APIs instead of parsing HTML)
+export function loadConfig(configPath = 'data/llms-txt-config.yml'): LlmsTxtConfig {
+  return yaml.load(fs.readFileSync(configPath, 'utf8')) as LlmsTxtConfig
+}

-To retrieve full article content, page lists, or search results \
-programmatically, please use the APIs below. These APIs return \
-structured markdown and JSON and are the preferred way for LLMs and \
-automated tools to access GitHub documentation.
+const config = loadConfig()

-- [Page List API](${BASE_API_URL}/pagelist/en/free-pro-team@latest): \
-Returns every docs page path for a given language and version
-- [Article API](${BASE_API_URL}/article): \
-Returns the full rendered content of any docs page as markdown. \
-Example: \`curl "${BASE_API_URL}/article?pathname=/en/get-started/start-your-journey/about-github-and-git"\`
-- [Search API](${BASE_API_URL}/search): \
-Search across all docs content. \
-Example: \`curl "${BASE_API_URL}/search?query=actions&language=en&version=free-pro-team@latest"\``
+export const ROLLUP_URL =
+  'https://raw.githubusercontent.com/github/docs-internal-data/main/hydro/rollups/pageviews/en/free-pro-team/rollup.json'
+export const TOP_N = config.top_n
+export const SMALL_CATEGORY_THRESHOLD = config.small_category_threshold
+export const EXCLUDED_CATEGORIES = new Set(config.excluded_categories)
+export const EXCLUDED_SLUGS = new Set(config.excluded_slugs)
+export const PINNED_PAGES = config.pinned_pages

-const PINNED_SECTION_HEADING = 'Building with GitHub (for coding agents and automation)'
-const MORE_PAGES_HEADING = 'More pages'
+const BASE_URL = 'https://docs.github.com'

-const AUTO_UPDATE_COMMENT = `\
-<!-- This file is automatically generated. Do not edit manually. -->`
+const HEADER = config.header.trimEnd()
+const API_SECTION = config.api_section.trimEnd()
+const PINNED_SECTION_HEADING = config.pinned_section_heading.trimEnd()
+const MORE_PAGES_HEADING = config.more_pages_heading.trimEnd()
+const AUTO_UPDATE_COMMENT = config.auto_update_comment.trimEnd()

 // =====================================================================
 // Types
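
Review note: `loadConfig` casts the parsed YAML straight to `LlmsTxtConfig`, so a typo made by a writer (for example `top_n: "100"`, or removing `early-access` from the exclusion list) would pass through unchecked. One possible guard is sketched below. `validateConfig` is a hypothetical helper that is not part of this commit; it takes the already-parsed value so it does not depend on any YAML library. The test strings end in `\n` because YAML `|` block scalars keep one trailing newline by default, which is why the script calls `.trimEnd()` on them.

```typescript
// Mirrors the LlmsTxtConfig interface from the diff.
interface LlmsTxtConfig {
  top_n: number
  small_category_threshold: number
  excluded_categories: string[]
  excluded_slugs: string[]
  pinned_pages: string[]
  pinned_section_heading: string
  more_pages_heading: string
  auto_update_comment: string
  header: string
  api_section: string
}

const NUMBER_KEYS = ['top_n', 'small_category_threshold'] as const
const LIST_KEYS = ['excluded_categories', 'excluded_slugs', 'pinned_pages'] as const
const STRING_KEYS = [
  'pinned_section_heading',
  'more_pages_heading',
  'auto_update_comment',
  'header',
  'api_section',
] as const

// Hypothetical helper: checks a parsed YAML value before it is used as config.
function validateConfig(raw: unknown): LlmsTxtConfig {
  if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) {
    throw new Error('llms.txt config must be a mapping')
  }
  const obj = raw as Record<string, unknown>

  for (const key of NUMBER_KEYS) {
    const value = obj[key]
    if (typeof value !== 'number' || Math.floor(value) !== value || value < 0) {
      throw new Error(`${key} must be a non-negative integer`)
    }
  }
  for (const key of LIST_KEYS) {
    const value = obj[key]
    if (!Array.isArray(value) || !value.every((item) => typeof item === 'string')) {
      throw new Error(`${key} must be a list of strings`)
    }
  }
  for (const key of STRING_KEYS) {
    if (typeof obj[key] !== 'string') {
      throw new Error(`${key} must be a string`)
    }
  }

  // The config comment says early-access must stay excluded, so enforce it.
  const excluded = obj.excluded_categories as string[]
  if (excluded.indexOf('early-access') === -1) {
    throw new Error('excluded_categories must contain early-access')
  }
  return obj as unknown as LlmsTxtConfig
}
```

Under this sketch, `loadConfig` could return `validateConfig(yaml.load(...))` in place of the bare cast, so that a bad edit fails the workflow run instead of silently changing the generated file.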
