Add snippet support to fossapi#30
Conversation
- no-inline-comments: its regex `^[ \t]*//` also matched `///` doc comments and `//!` module docs (idiomatic and used throughout, required by rustdoc). Rust's regex crate has no look-around, so rewrite lookaround-free as `^[ \t]*//([^/!]|$)` to flag only true inline `//` comments. - missing-error-context: suggested anyhow's `.context()?`, but this crate uses a custom thiserror `FossaError`/`Result` with no `.context()` method; plain `?` propagation is the established convention. Disabled. Validated with `nudge validate`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a Snippet entity so teams remediating snippet-scan findings can map third-party code matches to exact first-party file:line locations without leaving the terminal. Modeled as a child of Revision (locator in path, like Dependency). New read-only surface across library, CLI, and MCP: - get_snippets / get_snippet_paths / get_snippet_details / get_snippet_match / get_snippet_locations convenience functions - CLI: `list snippets|snippet-locations|snippet-paths`, `get snippet|snippet-match` (--with-lines resolves first-party line ranges; the drill-in renders a side-by-side of detected vs reference code with line numbers) - MCP: `snippet` entity on `list` (+ optional path/with_lines) and a dedicated `snippet_match` tool Structs were built from responses captured live against a real revision (tests/fixtures/snippets, see SCHEMA.md). Notable API quirks handled: snippet id is a string; the list endpoint caps pageSize at 50 (list_all overridden so pagination is correct); matchDetails.matchPercentage is 0-100 while other percentages are 0-1; and whole-file matches mark a trailing blank EOF line as highlighted (excluded from the reported range to avoid an off-by-one). Reject/unreject and cross-revision compare endpoints are out of scope for v1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
WalkthroughThis PR extends the fossapi library with full support for snippet detection and remediation workflows. The core change introduces a complete 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/mcp/server.rs (1)
166-244:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEnforce snippet pagination semantics instead of silently ignoring
page/count.Line 167 and Line 168 derive pagination, but Line 226-244 ignores it and fetches full snippet locations. This makes
list(entity: "snippet")unbounded (especially withwith_lines=true) and breaks the list tool’s pagination contract.💡 Minimal fail-fast mitigation (until true snippet pagination is implemented)
EntityType::Snippet => { + if params.page.is_some() || params.count.is_some() { + return Err(McpError::invalid_params( + "Snippet listing does not currently support page/count; use path to narrow scope.", + None, + )); + } let parent = params.parent.ok_or_else(|| { McpError::invalid_params( "parent is required for listing snippets (revision locator)", None, ) })?;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/mcp/server.rs` around lines 166 - 244, In handle_list, the EntityType::Snippet branch currently ignores pagination; add a fail-fast check that returns an McpError::invalid_params when pagination or unbounded requests are attempted: if params.page.is_some() || params.count.is_some() || params.with_lines.unwrap_or(false) then return McpError::invalid_params("snippet listing does not support pagination or with_lines=true yet", None). Keep the existing get_snippet_locations call only for the safe default case (no explicit page/count and with_lines=false) so we don't return an unbounded result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/models/snippet.rs`:
- Around line 817-853: Refactor the serial loop in get_snippet_locations to use
a bounded concurrent fan-out: create async tasks per snippet (calling
get_snippet_details) and for each task, concurrently process its details.matches
(calling get_snippet_match only when with_lines) using
futures::stream::iter(...).map(...).buffer_unordered(CONCURRENCY) (or
FuturesUnordered) with a configurable CONCURRENCY constant to limit parallelism;
collect results into SnippetLocation structs (preserving error handling/logging
for get_snippet_match and detected_line_range) and push them into the final
locations Vec (use a thread-safe collector like Mutex<Vec<_>> or gather into a
Vec of Results then flatten) so behavior of get_snippet_details,
get_snippet_match, details.matches, detected_line_range and SnippetLocation is
preserved while avoiding unbounded concurrency.
- Around line 536-545: The highlighted_range function currently filters out all
blank highlighted lines, shifting the reported start when only trailing
highlighted blanks should be ignored; update highlighted_range to consider
highlighted CodeLine entries (using CodeLine.is_highlighted, .line,
.line_number) without dropping internal or leading blank highlighted lines, then
trim only trailing blank highlighted lines: find the last highlighted line whose
.line.trim() is non-empty (or return None if none), and compute the min/max
line_number across highlighted lines up to that last non-empty highlighted index
to produce the (lo, hi) range.
In `@tests/fixtures/snippets/snippet-details.json`:
- Around line 16-17: Replace the real email value in the JSON fixture for the
"rejectedBy" field with an anonymized placeholder (e.g. "redacted@example.com"
or "<REDACTED_EMAIL>") and apply the same replacement consistently for the other
fixture occurrence referenced (lines 35-36) so no direct identifiers remain in
snippet-details.json; update any tests/assertions that expect the original
literal to use the new placeholder.
In `@tests/snippet.rs`:
- Around line 50-55: The current mock for snippet listing (Mock::given +
method("GET") + path(format!("/revisions/{ENC_REV}/snippets")) ...) is too
permissive; change the list mock to assert the exact contract path including the
trailing slash (e.g. path(format!("/revisions/{ENC_REV}/snippets/"))) and keep
expect(1) for the single list call; additionally add separate mocks for each
snippet detail endpoint (Mock::given + method("GET") +
path(format!("/revisions/{ENC_REV}/snippets/{SNIPPET_ID}")) responding with
ResponseTemplate::new(200).set_body_json(...)) and set expect to the exact
number of detail calls your code should make (based on SNIPPETS_JSON), so both
list and per-snippet fanout counts are validated.
---
Outside diff comments:
In `@src/mcp/server.rs`:
- Around line 166-244: In handle_list, the EntityType::Snippet branch currently
ignores pagination; add a fail-fast check that returns an
McpError::invalid_params when pagination or unbounded requests are attempted: if
params.page.is_some() || params.count.is_some() ||
params.with_lines.unwrap_or(false) then return McpError::invalid_params("snippet
listing does not support pagination or with_lines=true yet", None). Keep the
existing get_snippet_locations call only for the safe default case (no explicit
page/count and with_lines=false) so we don't return an unbounded result.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c8cb5e03-77e8-46c4-8eeb-586e92e92a4b
📒 Files selected for processing (17)
.nudge.yamlsrc/bin/fossapi.rssrc/cli/mod.rssrc/lib.rssrc/mcp/params.rssrc/mcp/server.rssrc/models/mod.rssrc/models/snippet.rssrc/output.rstests/fixtures/snippets/SCHEMA.mdtests/fixtures/snippets/match-details-partial.jsontests/fixtures/snippets/match-details-small.jsontests/fixtures/snippets/paths-sources.jsontests/fixtures/snippets/paths.jsontests/fixtures/snippets/snippet-details.jsontests/fixtures/snippets/snippets.jsontests/snippet.rs
| fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> { | ||
| let nums = lines | ||
| .iter() | ||
| .filter(|l| l.is_highlighted && !l.line.trim().is_empty()) | ||
| .map(|l| l.line_number) | ||
| .collect::<Vec<_>>(); | ||
| match (nums.iter().min(), nums.iter().max()) { | ||
| (Some(&lo), Some(&hi)) => Some((lo, hi)), | ||
| _ => None, | ||
| } |
There was a problem hiding this comment.
highlighted_range currently removes non-trailing blank highlighted lines, which can misreport start lines.
The filter l.is_highlighted && !l.line.trim().is_empty() drops all blank highlighted lines. That breaks the “ignore trailing blank EOF lines” behavior and can shift the reported start line forward.
Suggested fix
fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> {
- let nums = lines
+ let highlighted = lines
.iter()
- .filter(|l| l.is_highlighted && !l.line.trim().is_empty())
- .map(|l| l.line_number)
+ .filter(|l| l.is_highlighted)
.collect::<Vec<_>>();
- match (nums.iter().min(), nums.iter().max()) {
- (Some(&lo), Some(&hi)) => Some((lo, hi)),
- _ => None,
- }
+ let first = highlighted.first().map(|l| l.line_number)?;
+ let last = highlighted
+ .iter()
+ .rev()
+ .find(|l| !l.line.trim().is_empty())
+ .map(|l| l.line_number)?;
+ Some((first, last))
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> { | |
| let nums = lines | |
| .iter() | |
| .filter(|l| l.is_highlighted && !l.line.trim().is_empty()) | |
| .map(|l| l.line_number) | |
| .collect::<Vec<_>>(); | |
| match (nums.iter().min(), nums.iter().max()) { | |
| (Some(&lo), Some(&hi)) => Some((lo, hi)), | |
| _ => None, | |
| } | |
| fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> { | |
| let highlighted = lines | |
| .iter() | |
| .filter(|l| l.is_highlighted) | |
| .collect::<Vec<_>>(); | |
| let first = highlighted.first().map(|l| l.line_number)?; | |
| let last = highlighted | |
| .iter() | |
| .rev() | |
| .find(|l| !l.line.trim().is_empty()) | |
| .map(|l| l.line_number)?; | |
| Some((first, last)) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/models/snippet.rs` around lines 536 - 545, The highlighted_range function
currently filters out all blank highlighted lines, shifting the reported start
when only trailing highlighted blanks should be ignored; update
highlighted_range to consider highlighted CodeLine entries (using
CodeLine.is_highlighted, .line, .line_number) without dropping internal or
leading blank highlighted lines, then trim only trailing blank highlighted
lines: find the last highlighted line whose .line.trim() is non-empty (or return
None if none), and compute the min/max line_number across highlighted lines up
to that last non-empty highlighted index to produce the (lo, hi) range.
| let mut locations = Vec::new(); | ||
| for snippet in &snippets { | ||
| let details = get_snippet_details(client, revision_locator, &snippet.id).await?; | ||
| let licenses = details.license_ids(); | ||
| for m in &details.matches { | ||
| let (line_start, line_end) = if with_lines { | ||
| match get_snippet_match(client, revision_locator, &snippet.id, &m.path).await { | ||
| Ok(md) => match md.detected_line_range() { | ||
| Some((lo, hi)) => (Some(lo), Some(hi)), | ||
| None => (None, None), | ||
| }, | ||
| Err(e) => { | ||
| tracing::warn!( | ||
| "Failed to fetch match details for {} at {}: {e}", | ||
| snippet.id, | ||
| m.path | ||
| ); | ||
| (None, None) | ||
| } | ||
| } | ||
| } else { | ||
| (None, None) | ||
| }; | ||
|
|
||
| locations.push(SnippetLocation { | ||
| path: m.path.clone(), | ||
| snippet_id: snippet.id.clone(), | ||
| package: details.package.clone(), | ||
| version: details.version.clone(), | ||
| purl: details.purl.clone(), | ||
| match_percentage: m.match_percentage, | ||
| licenses: licenses.clone(), | ||
| line_start, | ||
| line_end, | ||
| }); | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 🏗️ Heavy lift
Consider bounded concurrency in get_snippet_locations to avoid high latency on large revisions.
This loop performs details and optional match drill-ins serially. For large N/M, wall-clock time scales with total request count and can become operationally expensive. A bounded concurrent fan-out (with backpressure) would materially improve throughput while controlling rate-limit pressure.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/models/snippet.rs` around lines 817 - 853, Refactor the serial loop in
get_snippet_locations to use a bounded concurrent fan-out: create async tasks
per snippet (calling get_snippet_details) and for each task, concurrently
process its details.matches (calling get_snippet_match only when with_lines)
using futures::stream::iter(...).map(...).buffer_unordered(CONCURRENCY) (or
FuturesUnordered) with a configurable CONCURRENCY constant to limit parallelism;
collect results into SnippetLocation structs (preserving error handling/logging
for get_snippet_match and detected_line_range) and push them into the final
locations Vec (use a thread-safe collector like Mutex<Vec<_>> or gather into a
Vec of Results then flatten) so behavior of get_snippet_details,
get_snippet_match, details.matches, detected_line_range and SnippetLocation is
preserved while avoiding unbounded concurrency.
| "rejectedBy": "sara@fossa.com" | ||
| } |
There was a problem hiding this comment.
Redact email identifiers in captured fixtures.
This fixture embeds a real-looking email address in rejectedBy. Storing direct identifiers in test artifacts creates unnecessary privacy/compliance exposure. Please replace with anonymized placeholders consistently across all snippet fixtures.
Also applies to: 35-36
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/fixtures/snippets/snippet-details.json` around lines 16 - 17, Replace
the real email value in the JSON fixture for the "rejectedBy" field with an
anonymized placeholder (e.g. "redacted@example.com" or "<REDACTED_EMAIL>") and
apply the same replacement consistently for the other fixture occurrence
referenced (lines 35-36) so no direct identifiers remain in
snippet-details.json; update any tests/assertions that expect the original
literal to use the new placeholder.
| Mock::given(method("GET")) | ||
| .and(path(format!("/revisions/{ENC_REV}/snippets"))) | ||
| .respond_with(ResponseTemplate::new(200).set_body_json(body(SNIPPETS_JSON))) | ||
| .expect(1) | ||
| .mount(&mock_server) | ||
| .await; |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Tighten mock expectations to catch snippet-query and fanout regressions.
These mocks currently don’t assert key contract details (path=/ on snippet list, expected number of details calls), so regressions can still pass.
✅ Suggested test hardening
Mock::given(method("GET"))
.and(path(format!("/revisions/{ENC_REV}/snippets")))
+ .and(query_param("path", "/"))
.respond_with(ResponseTemplate::new(200).set_body_json(body(SNIPPETS_JSON)))
.expect(1)
.mount(&mock_server)
.await;
@@
Mock::given(method("GET"))
.and(path(format!("/revisions/{ENC_REV}/snippets")))
+ .and(query_param("path", "/"))
.respond_with(ResponseTemplate::new(200).set_body_json(body(SNIPPETS_JSON)))
+ .expect(1)
.mount(&mock_server)
.await;
@@
Mock::given(method("GET"))
.and(path_regex(r"/snippets/[0-9]+$"))
.respond_with(ResponseTemplate::new(200).set_body_json(body(DETAILS_JSON)))
+ .expect(3)
.mount(&mock_server)
.await;Also applies to: 116-127
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/snippet.rs` around lines 50 - 55, The current mock for snippet listing
(Mock::given + method("GET") + path(format!("/revisions/{ENC_REV}/snippets"))
...) is too permissive; change the list mock to assert the exact contract path
including the trailing slash (e.g.
path(format!("/revisions/{ENC_REV}/snippets/"))) and keep expect(1) for the
single list call; additionally add separate mocks for each snippet detail
endpoint (Mock::given + method("GET") +
path(format!("/revisions/{ENC_REV}/snippets/{SNIPPET_ID}")) responding with
ResponseTemplate::new(200).set_body_json(...)) and set expect to the exact
number of detail calls your code should make (based on SNIPPETS_JSON), so both
list and per-snippet fanout counts are validated.
) Snippets shipped to main in #30 but the docs, version, and a broken CI config were never updated. This gets main release-ready for v0.2.0. Docs: - README: Snippets CLI section (list snippets/snippet-paths/ snippet-locations, get snippet/snippet-match), MCP tools table (snippet on `list`, the `snippet_match` tool, no get/update), and the Snippet locator - CLAUDE.md: ERD, architecture tree, the four snippet endpoints, the Snippet model entry with its API quirks, and Future Work Version: 0.1.0 -> 0.2.0 (Cargo.toml + Cargo.lock) CI runner fix: dist 0.27 defaults to ubuntu-20.04 and windows-2019, both retired by GitHub, so every Release-workflow job queued until the 24h timeout and the PR `plan` check sat "pending" indefinitely. Pin supported images via [workspace.metadata.dist.github-custom-runners] (ubuntu-22.04 / macos-13 / macos-14 / windows-2022) and regenerate release.yml. `dist generate --check` is clean; `dist plan` now assigns only hosted runners. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds a Snippet entity to fossapi so teams remediating snippet-scan findings can map third-party code matches to exact first-party
file:linelocations without leaving the terminal. Snippet scanning flags where (often AI-generated) first-party code matches open-source code — a license/compliance risk — and today the remediation loop forces a round-trip through the web app. This makes it scriptable.Modeled as a child of
Revision(locator in the URL path, same shape asDependency). v1 is read-only.Surface area
Library —
get_snippets,get_snippet_paths,get_snippet_details,get_snippet_match, and the aggregatingget_snippet_locations.CLI
fossapi list snippet-paths <rev>— hierarchical tree of files with snippet hits (--pathto drill)fossapi list snippet-locations <rev>— flat map: first-party file ↔ matched package/license/%, each row carryingsnippet+path;--with-linesresolves the line range per matchfossapi get snippet-match <rev> <snippet> <path>— side-by-side of your code (with line numbers + highlight gutter) vs the open-source referencelist snippets/get snippetMCP —
snippetentity onlist(optionalpath/with_lines) and a dedicatedsnippet_matchtool.API quirks handled (captured live; see
tests/fixtures/snippets/SCHEMA.md)idis a string, not a numberpageSizeat 50 →List::list_alloverridden so pagination is correctmatchDetails.matchPercentageis 0–100 while every other percentage is 0–1Reject/unreject and cross-revision
compared*endpoints are intentionally out of scope for v1.Test plan
cargo test— 103 lib unit tests + 5 snippet integration tests (wiremock against the captured fixtures) + existing suites, all green;cargo clippyclean for the new code.hack-house-docs, an Alamofire-based repo): every CLI command and both MCP tools run end-to-end. Cross-checked the results against a local checkout of that repo — matched files exist, line ranges equal the real file lengths, and the drill-in'sdetected_codeis byte-for-byte identical to the local file.🤖 Generated with Claude Code