Skip to content

Surface partial coverage in byte-safe PR diff fetch#97

Merged
gontzess merged 3 commits into
mainfrom
gontzess/pr-review-decode-fix
Jun 15, 2026
Merged

Surface partial coverage in byte-safe PR diff fetch#97
gontzess merged 3 commits into
mainfrom
gontzess/pr-review-decode-fix

Conversation

@gontzess

@gontzess gontzess commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Why

The PR review context fetcher previously strict-decoded GitHub compare diffs, so a non-UTF-8 byte in a large vendor refresh could crash the required review job before the reviewer ran. Large compare diffs also need bounded review context, but filtering must not silently hide build-relevant paths from the reviewer.

What this changes

This keeps the byte-safe diff fetch, filters bulky vendored/generated/lockfile sections from the incremental artifact, and records partial-coverage metadata in pr-context.json. The incremental artifact now includes a visible notice when sections were dropped or truncated, go.mod and go.sum stay reviewable, and marker-only truncation falls back to full review mode.

The path parser now considers both sides of rename/copy sections so files moving between excluded and normal paths are retained for review. The review prompt also tells the reviewer to account for partial coverage before giving a no-blocking-issues verdict.

Validation: python3 -m unittest discover -s .github/actions/pr-review/scripts -p 'test_*.py'

gontzess and others added 2 commits June 12, 2026 20:13
… cap

The incremental compare diff was read under subprocess text=True, which
strict-UTF-8-decodes the entire gh api response. On large vendor-refresh
PRs the diff inlines a non-UTF-8 byte (git misclassifies a NUL-free
encrypted vendored file as text), so the decode raises UnicodeDecodeError
and the whole pr-review job dies before the review step. The diff is also
pathologically large in those cases (100s of MB), so even a clean decode
would hand an unusable artifact to the reviewer.

Read the diff as raw bytes via a new gh_api_bytes() helper, split it into
per-file diff --git sections, drop vendored and known generated/lockfile
sections, cap the retained diff at 20 MB with an explicit truncation
marker, and decode the remainder with errors="backslashreplace" (lossless
and never raises). When nothing reviewable remains, return None so the
existing full-review fallback engages.
@gontzess gontzess changed the title Harden pr-review diff fetch: byte-safe decode, vendor exclusion, size cap Surface partial coverage in byte-safe PR diff fetch Jun 13, 2026
@gontzess gontzess marked this pull request as ready for review June 15, 2026 13:28
@gontzess gontzess requested a review from a team June 15, 2026 13:28
@gontzess gontzess merged commit d01e228 into main Jun 15, 2026
3 checks passed
@gontzess gontzess deleted the gontzess/pr-review-decode-fix branch June 15, 2026 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants