|
1 | 1 | # Shielding |
2 | 2 |
|
3 | | -## Overview |
| 3 | +The shielding subject protects docs.github.com from junk requests, abuse, and unnecessary server load. It implements middleware that detects and handles suspicious traffic patterns and invalid requests, and that applies rate limiting. |
4 | 4 |
|
5 | | -Essentially code in our server that controls the prevention of "junk requests" is scripted HTTP requests to endpoints that are _not_ made by regular browser users. |
| 5 | +## Purpose & Scope |
6 | 6 |
|
7 | | -For example, there's middleware code that sees if a `GET` request |
8 | | -comes in with a bunch of random looking query strings keys. This would cause a PASS on the CDN but would not actually matter to the rendering. In this |
9 | | -case, we spot this early and return a redirect response to the same URL |
10 | | -without the unrecognized query string keys so that if the request follows |
11 | | -redirects, the eventual 200 would be normalized by a common URL so the CDN |
12 | | -can serve a HIT. |
| 7 | +This subject is responsible for: |
| 8 | +- Detecting and handling invalid or suspicious requests |
| 9 | +- Rate limiting suspicious traffic patterns |
| 10 | +- Normalizing URLs to improve CDN cache hit rates |
| 11 | +- Preventing abuse from scripted/bot traffic |
| 12 | +- Redirecting malformed requests |
| 13 | +- Protecting backend servers from unnecessary work |
13 | 14 |
|
14 | | -Here's an in-time discussion post that summaries the _need_ and much of the |
15 | | -recent things we've done to fortify our backend servers to avoid unnecessary |
16 | | -work loads: |
| 15 | +Shielding code blocks "junk requests": scripted HTTP requests that are not made by regular browser users. |
17 | 16 |
|
18 | | -**[How we have fortified Docs for better resiliency and availability (June 2023)](https://github.com/github/docs-engineering/discussions/3262)** |
| 17 | +## Architecture & Key Assets |
19 | 18 |
|
20 | | -## How it works |
| 19 | +### Key capabilities and their locations |
21 | 20 |
|
22 | | -At its root, the `src/shielding/frame/middleware/index.ts` is injected into our |
23 | | -Express server. From there, it loads all its individual middleware handlers. |
| 21 | +- `middleware/index.ts` - Main entry point that orchestrates all shielding middleware and rate limiting |
| 22 | +- Individual middleware files - Each focuses on a single abuse pattern identified from log analysis |
| 23 | +- Rate limiting logic - Uses `createRateLimiter()` for suspicious and API routes |
24 | 24 |
|
25 | | -Each middleware is one file that focuses on a single use-case. The |
26 | | -use-cases are borne from studying log files to |
27 | | -spot patterns of request abuse. |
| 25 | +## Setup & Usage |
28 | 26 |
|
29 | | -> [!NOTE] |
30 | | -> Some shielding "tricks" appear in other places throughout the code |
31 | | -> base such as controlling the 404 response for `/assets/*` URLs. |
| 27 | +### How it works |
32 | 28 |
|
33 | | -## Rate limiting |
| 29 | +1. `src/shielding/middleware/index.ts` is injected into the Express server |
| 30 | +2. Loads all individual middleware handlers |
| 31 | +3. Each middleware focuses on a single use-case/abuse pattern |
| 32 | +4. Abuse patterns are discovered by studying log files |
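
The "one handler per abuse pattern" flow described above can be sketched without any dependencies. This is only an illustration: `ShieldHandler`, `rejectNullBytes`, `collapseSlashes`, and `runShielding` are hypothetical names, the two patterns are invented examples, and the 400/302 status codes are assumptions. The real entry point is an Express middleware whose handlers are not shown in this diff.

```typescript
// Minimal stand-ins for Express request/response objects (assumption, not the real types).
interface ShieldRequest {
  path: string
}
interface ShieldResponse {
  status: number
  location?: string
}
// A handler returns a response when it wants to short-circuit, or null to pass.
type ShieldHandler = (req: ShieldRequest) => ShieldResponse | null

// Hypothetical handler: reject paths that contain an encoded null byte.
const rejectNullBytes: ShieldHandler = (req) =>
  req.path.includes('%00') ? { status: 400 } : null

// Hypothetical handler: redirect paths with repeated slashes to a single-slash form.
const collapseSlashes: ShieldHandler = (req) =>
  /\/{2,}/.test(req.path)
    ? { status: 302, location: req.path.replace(/\/{2,}/g, '/') }
    : null

// The orchestrator runs each single-purpose handler in order; the first match wins.
const handlers: ShieldHandler[] = [rejectNullBytes, collapseSlashes]

function runShielding(req: ShieldRequest): ShieldResponse | null {
  for (const handler of handlers) {
    const response = handler(req)
    if (response) return response
  }
  // No handler matched: the request continues to normal rendering.
  return null
}
```

Keeping each handler in its own function (or file) means a newly observed pattern can be added to the list without touching the others.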
34 | 33 |
|
35 | | -We rate limit at multiple levels: |
| 34 | +### Rate limiting |
| 35 | + |
| 36 | +Three levels of rate limiting: |
| 37 | + |
| 38 | +1. **CDN (Fastly)** - First line of defense |
| 39 | +2. **Suspicious routes** - Via shielding middleware |
| 40 | + - Only rate limited if deemed suspicious based on checked parameters |
| 41 | + - Implemented in `middleware/index.ts` with `createRateLimiter()` |
| 42 | +3. **API routes** - Via API declaration |
| 43 | + - Limited to a fixed number of requests per minute, regardless of request characteristics |
| 44 | + - Implemented in `src/frame/middleware/api.ts` |
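
To illustrate the difference between levels 2 and 3, here is a minimal fixed-window limiter that is only consulted when a request has already been flagged as suspicious. It is a sketch under stated assumptions: `createFixedWindowLimiter` and `statusFor` are hypothetical names, the fixed-window algorithm is a stand-in, and none of this is the repository's `createRateLimiter()`, whose signature and suspicion checks are not shown here.

```typescript
// Returns true when the request is allowed, false when it should be rate limited.
type Limiter = (key: string, now: number) => boolean

// Hypothetical fixed-window counter keyed by client (for example, an IP address).
function createFixedWindowLimiter(max: number, windowMs: number): Limiter {
  const windows = new Map<string, { count: number; start: number }>()
  return (key, now) => {
    const current = windows.get(key)
    // Start a fresh window on first sight or once the previous window has elapsed.
    if (!current || now - current.start >= windowMs) {
      windows.set(key, { count: 1, start: now })
      return true
    }
    current.count += 1
    return current.count <= max
  }
}

// Suspicious-route behavior: only flagged requests count against the limit.
// For API-route behavior, pass `suspicious = true` for every request.
function statusFor(limiter: Limiter, clientKey: string, suspicious: boolean, now: number): number {
  if (!suspicious) return 200
  return limiter(clientKey, now) ? 200 : 429
}
```

Passing `now` explicitly instead of calling `Date.now()` inside the limiter keeps the sketch deterministic and easy to test.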
| 45 | + |
| 46 | +### Common shielding patterns |
| 47 | + |
| 48 | +**Invalid query strings:** |
| 49 | +- Request: `GET /path?random=abc&weird=xyz` |
| 50 | +- Action: Redirect to `/path` (normalized URL) |
| 51 | +- Benefit: CDN can serve cached response for normalized URL |
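
The query string normalization above could look roughly like the following. This is a hedged sketch, not the actual middleware: `RECOGNIZED_QUERY_KEYS` and `normalizeQueryString` are hypothetical names, and the keys in the allowlist are placeholders because the real list of recognized keys does not appear in this README.

```typescript
// Hypothetical allowlist; the real set of recognized keys is an assumption here.
const RECOGNIZED_QUERY_KEYS = new Set(['query', 'page', 'version'])

// Returns the normalized path (plus any recognized query keys) to redirect to,
// or null when the URL is already normalized and no redirect is needed.
function normalizeQueryString(originalUrl: string): string | null {
  // Express-style URLs are relative, so a throwaway base is needed to parse them.
  const url = new URL(originalUrl, 'http://localhost')
  const kept = new URLSearchParams()
  let dropped = false
  url.searchParams.forEach((value, key) => {
    if (RECOGNIZED_QUERY_KEYS.has(key)) {
      kept.append(key, value)
    } else {
      dropped = true
    }
  })
  if (!dropped) return null
  const search = kept.toString()
  return search ? `${url.pathname}?${search}` : url.pathname
}
```

With this allowlist, `normalizeQueryString('/path?random=abc&weird=xyz')` returns `'/path'`, so a client that follows the redirect ends up on a URL the CDN can serve from cache.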
| 52 | + |
| 53 | +**Malformed URLs:** |
| 54 | +- Invalid characters or patterns in URL |
| 55 | +- Action: Return 400 or redirect to corrected URL |
| 56 | +- Benefit: Prevent errors propagating to application code |
| 57 | + |
| 58 | +**Invalid paths:** |
| 59 | +- Suspicious path patterns (probes, exploits) |
| 60 | +- Action: Reject with appropriate status code |
| 61 | +- Benefit: Prevent unnecessary processing |
| 62 | + |
| 63 | +### Running tests |
| 64 | + |
| 65 | +```bash |
| 66 | +npm run test -- src/shielding/tests |
| 67 | +``` |
| 68 | + |
| 69 | +## Data & External Dependencies |
| 70 | + |
| 71 | +### Data inputs |
| 72 | +- HTTP request metadata (path, query strings, headers) |
| 73 | +- Known good/bad patterns from log analysis |
| 74 | +- CDN cache behavior data |
| 75 | + |
| 76 | +### Dependencies |
| 77 | +- Express middleware |
| 78 | +- Rate limiting library (likely `express-rate-limit` or similar) |
| 79 | +- `@/frame` - Express server integration |
| 80 | +- CDN configuration (Fastly) |
| 81 | + |
| 82 | +### Data outputs |
| 83 | +- HTTP responses (redirects, 400s, 429s for rate limit) |
| 84 | +- Cache-friendly normalized URLs |
| 85 | +- Reduced backend server load |
| 86 | + |
| 87 | +## Cross-links & Ownership |
| 88 | + |
| 89 | +### Related subjects |
| 90 | +- [`src/frame`](../frame/README.md) - Express middleware pipeline integration |
| 91 | +- [`src/observability`](../observability/README.md) - Logging suspicious traffic patterns |
| 92 | +- CDN configuration - Fastly edge rules |
| 93 | + |
| 94 | +### Internal documentation |
| 95 | +For detailed discussion on resilience and availability improvements, see: |
| 96 | +- [How we have fortified Docs for better resiliency and availability (June 2023)](https://github.com/github/docs-engineering/discussions/3262) |
| 97 | + |
| 98 | +### Ownership |
| 99 | +- Team: Docs Engineering |
| 100 | + |
| 101 | +## Current State & Next Steps |
| 102 | + |
| 103 | +### Shielding strategies |
| 104 | + |
| 105 | +Each middleware implements a specific strategy based on observed abuse: |
| 106 | +- Query string normalization for CDN optimization |
| 107 | +- Path validation to reject probes/exploits |
| 108 | +- Header validation to detect bot traffic |
| 109 | +- Next.js path handling for framework-specific patterns |
| 110 | + |
| 111 | +### Known limitations |
| 112 | +- Shielding is reactive (based on observing abuse patterns) |
| 113 | +- Some legitimate traffic may be affected if patterns overlap with abuse |
| 114 | +- Rate limits are tuned based on historical data |
| 115 | +- Some shielding logic exists outside this subject (e.g., `/assets/*` 404 handling) |
| 116 | + |
| 117 | +### Adding new shielding middleware |
| 118 | + |
| 119 | +1. Identify abuse pattern from logs |
| 120 | +2. Create new middleware file in `src/shielding/middleware/` |
| 121 | +3. Implement detection and handling logic |
| 122 | +4. Add to orchestrator in `index.ts` |
| 123 | +5. Add tests in `tests/` |
| 124 | +6. Monitor impact on CDN cache hit rate and server load |
| 125 | + |
| 126 | +### Monitoring shielding effectiveness |
| 127 | + |
| 128 | +Key metrics: |
| 129 | +- CDN cache hit rate (should increase) |
| 130 | +- Backend server load (should decrease) |
| 131 | +- 4xx/5xx error rates (monitor for false positives) |
| 132 | +- Rate limit triggers (logged in observability) |
| 133 | + |
| 134 | +Check #docs-ops and monitoring dashboards for ongoing effectiveness. |
| 135 | + |
| 136 | +### Configuration |
| 137 | + |
| 138 | +Rate limit configuration: |
| 139 | +- Thresholds tuned based on traffic patterns |
| 140 | +- Different limits for different route types |
| 141 | +- Suspicious request detection parameters |
| 142 | + |
| 143 | +CDN integration: |
| 144 | +- Works with Fastly configuration |
| 145 | +- Ensures normalized URLs maximize cache hits |
| 146 | +- Some shielding happens at CDN edge |
| 147 | +- Dashboard for real-time shielding metrics |
| 148 | + |
| 149 | +### Troubleshooting |
| 150 | + |
| 151 | +**Legitimate traffic blocked:** |
| 152 | +- Check shielding logs in Splunk |
| 153 | +- Identify which middleware triggered |
| 154 | +- Adjust pattern matching or rate limits |
| 155 | +- Consider allowlist for specific use cases |
| 156 | + |
| 157 | +**Abuse still getting through:** |
| 158 | +- Analyze logs for new patterns |
| 159 | +- Add new middleware to handle pattern |
| 160 | +- Adjust existing middleware thresholds |
| 161 | +- Consider CDN-level blocking |
| 162 | + |
| 163 | +**CDN cache hit rate not improving:** |
| 164 | +- Verify URL normalization is working |
| 165 | +- Check that redirects are followed |
| 166 | +- Analyze cache miss patterns |
| 167 | +- Coordinate with CDN configuration |
36 | 168 |
|
37 | | -1. CDN (Fastly) |
38 | | -2. All routes via [src/shielding/frame/index.ts](./middleware/index.ts) and the `createRateLimiter()` middleware. |
39 | | - - These routes are _only_ rate limited if they are deemed suspicious based on parameters we check. |
40 | | -3. API routes via their declaration in [src/frame/middleware/api.ts](../frame/middleware/api.ts) using the `createRateLimiter()` middleware. |
41 | | - - These routes are limited to a certain # of requests per minute, regardless of what the request looks like. |
|