Skip to content

Add snippet support to fossapi#30

Merged
saramaebee merged 2 commits into
mainfrom
feature/snippets
Jun 12, 2026
Merged

Add snippet support to fossapi#30
saramaebee merged 2 commits into
mainfrom
feature/snippets

Conversation

@saramaebee

Copy link
Copy Markdown
Collaborator

Summary

Adds a Snippet entity to fossapi so teams remediating snippet-scan findings can map third-party code matches to exact first-party file:line locations without leaving the terminal. Snippet scanning flags where (often AI-generated) first-party code matches open-source code — a license/compliance risk — and today the remediation loop forces a round-trip through the web app. This makes it scriptable.

Modeled as a child of Revision (locator in the URL path, same shape as Dependency). v1 is read-only.

Surface area

Libraryget_snippets, get_snippet_paths, get_snippet_details, get_snippet_match, and the aggregating get_snippet_locations.

CLI

  • fossapi list snippet-paths <rev> — hierarchical tree of files with snippet hits (--path to drill)
  • fossapi list snippet-locations <rev> — flat map: first-party file ↔ matched package/license/%, each row carrying snippet+path; --with-lines resolves the line range per match
  • fossapi get snippet-match <rev> <snippet> <path> — side-by-side of your code (with line numbers + highlight gutter) vs the open-source reference
  • plus list snippets / get snippet

MCPsnippet entity on list (optional path / with_lines) and a dedicated snippet_match tool.

API quirks handled (captured live; see tests/fixtures/snippets/SCHEMA.md)

  • snippet id is a string, not a number
  • the list endpoint caps pageSize at 50List::list_all overridden so pagination is correct
  • matchDetails.matchPercentage is 0–100 while every other percentage is 0–1
  • whole-file matches mark a trailing blank EOF line as highlighted → excluded from the reported range to avoid an off-by-one

Reject/unreject and cross-revision compared* endpoints are intentionally out of scope for v1.

The first commit (.nudge.yaml rule fixes) is independent of the feature and reviewable on its own — a regex bug that flagged /// doc comments, and an anyhow-specific error rule this thiserror crate can't satisfy.

Test plan

  • cargo test103 lib unit tests + 5 snippet integration tests (wiremock against the captured fixtures) + existing suites, all green; cargo clippy clean for the new code.
  • Verified live against a real revision (hack-house-docs, an Alamofire-based repo): every CLI command and both MCP tools run end-to-end. Cross-checked the results against a local checkout of that repo — matched files exist, line ranges equal the real file lengths, and the drill-in's detected_code is byte-for-byte identical to the local file.

🤖 Generated with Claude Code

saramaebee and others added 2 commits June 11, 2026 14:42
- no-inline-comments: its regex `^[ \t]*//` also matched `///` doc comments
  and `//!` module docs (idiomatic and used throughout, required by rustdoc).
  Rust's regex crate has no look-around, so rewrite lookaround-free as
  `^[ \t]*//([^/!]|$)` to flag only true inline `//` comments.
- missing-error-context: suggested anyhow's `.context()?`, but this crate uses
  a custom thiserror `FossaError`/`Result` with no `.context()` method; plain
  `?` propagation is the established convention. Disabled.

Validated with `nudge validate`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a Snippet entity so teams remediating snippet-scan findings can map
third-party code matches to exact first-party file:line locations without
leaving the terminal.

Modeled as a child of Revision (locator in path, like Dependency). New
read-only surface across library, CLI, and MCP:

- get_snippets / get_snippet_paths / get_snippet_details / get_snippet_match /
  get_snippet_locations convenience functions
- CLI: `list snippets|snippet-locations|snippet-paths`, `get snippet|snippet-match`
  (--with-lines resolves first-party line ranges; the drill-in renders a
  side-by-side of detected vs reference code with line numbers)
- MCP: `snippet` entity on `list` (+ optional path/with_lines) and a dedicated
  `snippet_match` tool

Structs were built from responses captured live against a real revision
(tests/fixtures/snippets, see SCHEMA.md). Notable API quirks handled: snippet
id is a string; the list endpoint caps pageSize at 50 (list_all overridden so
pagination is correct); matchDetails.matchPercentage is 0-100 while other
percentages are 0-1; and whole-file matches mark a trailing blank EOF line as
highlighted (excluded from the reported range to avoid an off-by-one).

Reject/unreject and cross-revision compare endpoints are out of scope for v1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This PR extends the fossapi library with full support for snippet detection and remediation workflows. The core change introduces a complete Snippet domain model with API types (Snippet, SnippetMatch, SnippetMatchDetails, SnippetLocation), implements the List trait for paginated querying, and provides convenience functions to list snippets, retrieve details, drill into matches, and aggregate flattened match locations. The library re-exports these new types and functions for public use. CLI commands are extended with get snippet, get snippet-match, list snippets, list snippet-locations, and list snippet-paths subcommands that construct queries, call the convenience functions, and render results as JSON or formatted tables. An MCP snippet_match tool and expanded list tool support snippet operations via structured parameters. Output formatting adds PrettyPrint implementations for Snippet and SnippetMatchDetails with code block rendering. Linting rules are refined to reduce false positives, and comprehensive fixtures and integration tests validate the domain model and API wiring.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add snippet support to fossapi' clearly and concisely describes the main change: adding a new Snippet entity to the fossapi library.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining the purpose, API surface (library, CLI, MCP), API quirks handled, scope limitations, and test coverage.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/mcp/server.rs (1)

166-244: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce snippet pagination semantics instead of silently ignoring page/count.

Line 167 and Line 168 derive pagination, but Line 226-244 ignores it and fetches full snippet locations. This makes list(entity: "snippet") unbounded (especially with with_lines=true) and breaks the list tool’s pagination contract.

💡 Minimal fail-fast mitigation (until true snippet pagination is implemented)
             EntityType::Snippet => {
+                if params.page.is_some() || params.count.is_some() {
+                    return Err(McpError::invalid_params(
+                        "Snippet listing does not currently support page/count; use path to narrow scope.",
+                        None,
+                    ));
+                }
                 let parent = params.parent.ok_or_else(|| {
                     McpError::invalid_params(
                         "parent is required for listing snippets (revision locator)",
                         None,
                     )
                 })?;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mcp/server.rs` around lines 166 - 244, In handle_list, the
EntityType::Snippet branch currently ignores pagination; add a fail-fast check
that returns an McpError::invalid_params when pagination or unbounded requests
are attempted: if params.page.is_some() || params.count.is_some() ||
params.with_lines.unwrap_or(false) then return McpError::invalid_params("snippet
listing does not support pagination or with_lines=true yet", None). Keep the
existing get_snippet_locations call only for the safe default case (no explicit
page/count and with_lines=false) so we don't return an unbounded result.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/models/snippet.rs`:
- Around line 817-853: Refactor the serial loop in get_snippet_locations to use
a bounded concurrent fan-out: create async tasks per snippet (calling
get_snippet_details) and for each task, concurrently process its details.matches
(calling get_snippet_match only when with_lines) using
futures::stream::iter(...).map(...).buffer_unordered(CONCURRENCY) (or
FuturesUnordered) with a configurable CONCURRENCY constant to limit parallelism;
collect results into SnippetLocation structs (preserving error handling/logging
for get_snippet_match and detected_line_range) and push them into the final
locations Vec (use a thread-safe collector like Mutex<Vec<_>> or gather into a
Vec of Results then flatten) so behavior of get_snippet_details,
get_snippet_match, details.matches, detected_line_range and SnippetLocation is
preserved while avoiding unbounded concurrency.
- Around line 536-545: The highlighted_range function currently filters out all
blank highlighted lines, shifting the reported start when only trailing
highlighted blanks should be ignored; update highlighted_range to consider
highlighted CodeLine entries (using CodeLine.is_highlighted, .line,
.line_number) without dropping internal or leading blank highlighted lines, then
trim only trailing blank highlighted lines: find the last highlighted line whose
.line.trim() is non-empty (or return None if none), and compute the min/max
line_number across highlighted lines up to that last non-empty highlighted index
to produce the (lo, hi) range.

In `@tests/fixtures/snippets/snippet-details.json`:
- Around line 16-17: Replace the real email value in the JSON fixture for the
"rejectedBy" field with an anonymized placeholder (e.g. "redacted@example.com"
or "<REDACTED_EMAIL>") and apply the same replacement consistently for the other
fixture occurrence referenced (lines 35-36) so no direct identifiers remain in
snippet-details.json; update any tests/assertions that expect the original
literal to use the new placeholder.

In `@tests/snippet.rs`:
- Around line 50-55: The current mock for snippet listing (Mock::given +
method("GET") + path(format!("/revisions/{ENC_REV}/snippets")) ...) is too
permissive; change the list mock to assert the exact contract path including the
trailing slash (e.g. path(format!("/revisions/{ENC_REV}/snippets/"))) and keep
expect(1) for the single list call; additionally add separate mocks for each
snippet detail endpoint (Mock::given + method("GET") +
path(format!("/revisions/{ENC_REV}/snippets/{SNIPPET_ID}")) responding with
ResponseTemplate::new(200).set_body_json(...)) and set expect to the exact
number of detail calls your code should make (based on SNIPPETS_JSON), so both
list and per-snippet fanout counts are validated.

---

Outside diff comments:
In `@src/mcp/server.rs`:
- Around line 166-244: In handle_list, the EntityType::Snippet branch currently
ignores pagination; add a fail-fast check that returns an
McpError::invalid_params when pagination or unbounded requests are attempted: if
params.page.is_some() || params.count.is_some() ||
params.with_lines.unwrap_or(false) then return McpError::invalid_params("snippet
listing does not support pagination or with_lines=true yet", None). Keep the
existing get_snippet_locations call only for the safe default case (no explicit
page/count and with_lines=false) so we don't return an unbounded result.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c8cb5e03-77e8-46c4-8eeb-586e92e92a4b

📥 Commits

Reviewing files that changed from the base of the PR and between f9f3db3 and 244c720.

📒 Files selected for processing (17)
  • .nudge.yaml
  • src/bin/fossapi.rs
  • src/cli/mod.rs
  • src/lib.rs
  • src/mcp/params.rs
  • src/mcp/server.rs
  • src/models/mod.rs
  • src/models/snippet.rs
  • src/output.rs
  • tests/fixtures/snippets/SCHEMA.md
  • tests/fixtures/snippets/match-details-partial.json
  • tests/fixtures/snippets/match-details-small.json
  • tests/fixtures/snippets/paths-sources.json
  • tests/fixtures/snippets/paths.json
  • tests/fixtures/snippets/snippet-details.json
  • tests/fixtures/snippets/snippets.json
  • tests/snippet.rs

Comment thread src/models/snippet.rs
Comment on lines +536 to +545
fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> {
let nums = lines
.iter()
.filter(|l| l.is_highlighted && !l.line.trim().is_empty())
.map(|l| l.line_number)
.collect::<Vec<_>>();
match (nums.iter().min(), nums.iter().max()) {
(Some(&lo), Some(&hi)) => Some((lo, hi)),
_ => None,
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

highlighted_range currently removes non-trailing blank highlighted lines, which can misreport start lines.

The filter l.is_highlighted && !l.line.trim().is_empty() drops all blank highlighted lines. That breaks the “ignore trailing blank EOF lines” behavior and can shift the reported start line forward.

Suggested fix
     fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> {
-        let nums = lines
+        let highlighted = lines
             .iter()
-            .filter(|l| l.is_highlighted && !l.line.trim().is_empty())
-            .map(|l| l.line_number)
+            .filter(|l| l.is_highlighted)
             .collect::<Vec<_>>();
-        match (nums.iter().min(), nums.iter().max()) {
-            (Some(&lo), Some(&hi)) => Some((lo, hi)),
-            _ => None,
-        }
+        let first = highlighted.first().map(|l| l.line_number)?;
+        let last = highlighted
+            .iter()
+            .rev()
+            .find(|l| !l.line.trim().is_empty())
+            .map(|l| l.line_number)?;
+        Some((first, last))
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> {
let nums = lines
.iter()
.filter(|l| l.is_highlighted && !l.line.trim().is_empty())
.map(|l| l.line_number)
.collect::<Vec<_>>();
match (nums.iter().min(), nums.iter().max()) {
(Some(&lo), Some(&hi)) => Some((lo, hi)),
_ => None,
}
fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> {
let highlighted = lines
.iter()
.filter(|l| l.is_highlighted)
.collect::<Vec<_>>();
let first = highlighted.first().map(|l| l.line_number)?;
let last = highlighted
.iter()
.rev()
.find(|l| !l.line.trim().is_empty())
.map(|l| l.line_number)?;
Some((first, last))
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/snippet.rs` around lines 536 - 545, The highlighted_range function
currently filters out all blank highlighted lines, shifting the reported start
when only trailing highlighted blanks should be ignored; update
highlighted_range to consider highlighted CodeLine entries (using
CodeLine.is_highlighted, .line, .line_number) without dropping internal or
leading blank highlighted lines, then trim only trailing blank highlighted
lines: find the last highlighted line whose .line.trim() is non-empty (or return
None if none), and compute the min/max line_number across highlighted lines up
to that last non-empty highlighted index to produce the (lo, hi) range.

Comment thread src/models/snippet.rs
Comment on lines +817 to +853
let mut locations = Vec::new();
for snippet in &snippets {
let details = get_snippet_details(client, revision_locator, &snippet.id).await?;
let licenses = details.license_ids();
for m in &details.matches {
let (line_start, line_end) = if with_lines {
match get_snippet_match(client, revision_locator, &snippet.id, &m.path).await {
Ok(md) => match md.detected_line_range() {
Some((lo, hi)) => (Some(lo), Some(hi)),
None => (None, None),
},
Err(e) => {
tracing::warn!(
"Failed to fetch match details for {} at {}: {e}",
snippet.id,
m.path
);
(None, None)
}
}
} else {
(None, None)
};

locations.push(SnippetLocation {
path: m.path.clone(),
snippet_id: snippet.id.clone(),
package: details.package.clone(),
version: details.version.clone(),
purl: details.purl.clone(),
match_percentage: m.match_percentage,
licenses: licenses.clone(),
line_start,
line_end,
});
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 🏗️ Heavy lift

Consider bounded concurrency in get_snippet_locations to avoid high latency on large revisions.

This loop performs details and optional match drill-ins serially. For large N/M, wall-clock time scales with total request count and can become operationally expensive. A bounded concurrent fan-out (with backpressure) would materially improve throughput while controlling rate-limit pressure.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/snippet.rs` around lines 817 - 853, Refactor the serial loop in
get_snippet_locations to use a bounded concurrent fan-out: create async tasks
per snippet (calling get_snippet_details) and for each task, concurrently
process its details.matches (calling get_snippet_match only when with_lines)
using futures::stream::iter(...).map(...).buffer_unordered(CONCURRENCY) (or
FuturesUnordered) with a configurable CONCURRENCY constant to limit parallelism;
collect results into SnippetLocation structs (preserving error handling/logging
for get_snippet_match and detected_line_range) and push them into the final
locations Vec (use a thread-safe collector like Mutex<Vec<_>> or gather into a
Vec of Results then flatten) so behavior of get_snippet_details,
get_snippet_match, details.matches, detected_line_range and SnippetLocation is
preserved while avoiding unbounded concurrency.

Comment on lines +16 to +17
"rejectedBy": "sara@fossa.com"
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact email identifiers in captured fixtures.

This fixture embeds a real-looking email address in rejectedBy. Storing direct identifiers in test artifacts creates unnecessary privacy/compliance exposure. Please replace with anonymized placeholders consistently across all snippet fixtures.

Also applies to: 35-36

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/fixtures/snippets/snippet-details.json` around lines 16 - 17, Replace
the real email value in the JSON fixture for the "rejectedBy" field with an
anonymized placeholder (e.g. "redacted@example.com" or "<REDACTED_EMAIL>") and
apply the same replacement consistently for the other fixture occurrence
referenced (lines 35-36) so no direct identifiers remain in
snippet-details.json; update any tests/assertions that expect the original
literal to use the new placeholder.

Comment thread tests/snippet.rs
Comment on lines +50 to +55
Mock::given(method("GET"))
.and(path(format!("/revisions/{ENC_REV}/snippets")))
.respond_with(ResponseTemplate::new(200).set_body_json(body(SNIPPETS_JSON)))
.expect(1)
.mount(&mock_server)
.await;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Tighten mock expectations to catch snippet-query and fanout regressions.

These mocks currently don’t assert key contract details (path=/ on snippet list, expected number of details calls), so regressions can still pass.

✅ Suggested test hardening
     Mock::given(method("GET"))
         .and(path(format!("/revisions/{ENC_REV}/snippets")))
+        .and(query_param("path", "/"))
         .respond_with(ResponseTemplate::new(200).set_body_json(body(SNIPPETS_JSON)))
         .expect(1)
         .mount(&mock_server)
         .await;
@@
     Mock::given(method("GET"))
         .and(path(format!("/revisions/{ENC_REV}/snippets")))
+        .and(query_param("path", "/"))
         .respond_with(ResponseTemplate::new(200).set_body_json(body(SNIPPETS_JSON)))
+        .expect(1)
         .mount(&mock_server)
         .await;
@@
     Mock::given(method("GET"))
         .and(path_regex(r"/snippets/[0-9]+$"))
         .respond_with(ResponseTemplate::new(200).set_body_json(body(DETAILS_JSON)))
+        .expect(3)
         .mount(&mock_server)
         .await;

Also applies to: 116-127

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/snippet.rs` around lines 50 - 55, The current mock for snippet listing
(Mock::given + method("GET") + path(format!("/revisions/{ENC_REV}/snippets"))
...) is too permissive; change the list mock to assert the exact contract path
including the trailing slash (e.g.
path(format!("/revisions/{ENC_REV}/snippets/"))) and keep expect(1) for the
single list call; additionally add separate mocks for each snippet detail
endpoint (Mock::given + method("GET") +
path(format!("/revisions/{ENC_REV}/snippets/{SNIPPET_ID}")) responding with
ResponseTemplate::new(200).set_body_json(...)) and set expect to the exact
number of detail calls your code should make (based on SNIPPETS_JSON), so both
list and per-snippet fanout counts are validated.

@saramaebee saramaebee merged commit 724844a into main Jun 12, 2026
1 check failed
saramaebee added a commit that referenced this pull request Jun 12, 2026
)

Snippets shipped to main in #30 but the docs, version, and a broken CI
config were never updated. This gets main release-ready for v0.2.0.

Docs:
- README: Snippets CLI section (list snippets/snippet-paths/
  snippet-locations, get snippet/snippet-match), MCP tools table
  (snippet on `list`, the `snippet_match` tool, no get/update), and the
  Snippet locator
- CLAUDE.md: ERD, architecture tree, the four snippet endpoints, the
  Snippet model entry with its API quirks, and Future Work

Version: 0.1.0 -> 0.2.0 (Cargo.toml + Cargo.lock)

CI runner fix: dist 0.27 defaults to ubuntu-20.04 and windows-2019,
both retired by GitHub, so every Release-workflow job queued until the
24h timeout and the PR `plan` check sat "pending" indefinitely. Pin
supported images via [workspace.metadata.dist.github-custom-runners]
(ubuntu-22.04 / macos-13 / macos-14 / windows-2022) and regenerate
release.yml. `dist generate --check` is clean; `dist plan` now assigns
only hosted runners.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant