Add snippet support to fossapi by saramaebee · Pull Request #30 · fossas/fossapi

saramaebee · 2026-06-11T18:43:11Z

Summary

Adds a Snippet entity to fossapi so teams remediating snippet-scan findings can map third-party code matches to exact first-party file:line locations without leaving the terminal. Snippet scanning flags where (often AI-generated) first-party code matches open-source code — a license/compliance risk — and today the remediation loop forces a round-trip through the web app. This makes it scriptable.

Modeled as a child of Revision (locator in the URL path, same shape as Dependency). v1 is read-only.

Surface area

Library — get_snippets, get_snippet_paths, get_snippet_details, get_snippet_match, and the aggregating get_snippet_locations.

CLI

fossapi list snippet-paths <rev> — hierarchical tree of files with snippet hits (--path to drill)
fossapi list snippet-locations <rev> — flat map: first-party file ↔ matched package/license/%, each row carrying snippet+path; --with-lines resolves the line range per match
fossapi get snippet-match <rev> <snippet> <path> — side-by-side of your code (with line numbers + highlight gutter) vs the open-source reference
plus list snippets / get snippet

MCP — snippet entity on list (optional path / with_lines) and a dedicated snippet_match tool.

API quirks handled (captured live; see `tests/fixtures/snippets/SCHEMA.md`)

snippet id is a string, not a number
the list endpoint caps pageSize at 50 → List::list_all overridden so pagination is correct
matchDetails.matchPercentage is 0–100 while every other percentage is 0–1
whole-file matches mark a trailing blank EOF line as highlighted → excluded from the reported range to avoid an off-by-one

Reject/unreject and cross-revision compared* endpoints are intentionally out of scope for v1.

The first commit (.nudge.yaml rule fixes) is independent of the feature and reviewable on its own — a regex bug that flagged /// doc comments, and an anyhow-specific error rule this thiserror crate can't satisfy.

Test plan

cargo test — 103 lib unit tests + 5 snippet integration tests (wiremock against the captured fixtures) + existing suites, all green; cargo clippy clean for the new code.
Verified live against a real revision (hack-house-docs, an Alamofire-based repo): every CLI command and both MCP tools run end-to-end. Cross-checked the results against a local checkout of that repo — matched files exist, line ranges equal the real file lengths, and the drill-in's detected_code is byte-for-byte identical to the local file.

🤖 Generated with Claude Code

- no-inline-comments: its regex `^[ \t]*//` also matched `///` doc comments and `//!` module docs (idiomatic and used throughout, required by rustdoc). Rust's regex crate has no look-around, so rewrite lookaround-free as `^[ \t]*//([^/!]|$)` to flag only true inline `//` comments. - missing-error-context: suggested anyhow's `.context()?`, but this crate uses a custom thiserror `FossaError`/`Result` with no `.context()` method; plain `?` propagation is the established convention. Disabled. Validated with `nudge validate`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds a Snippet entity so teams remediating snippet-scan findings can map third-party code matches to exact first-party file:line locations without leaving the terminal. Modeled as a child of Revision (locator in path, like Dependency). New read-only surface across library, CLI, and MCP: - get_snippets / get_snippet_paths / get_snippet_details / get_snippet_match / get_snippet_locations convenience functions - CLI: `list snippets|snippet-locations|snippet-paths`, `get snippet|snippet-match` (--with-lines resolves first-party line ranges; the drill-in renders a side-by-side of detected vs reference code with line numbers) - MCP: `snippet` entity on `list` (+ optional path/with_lines) and a dedicated `snippet_match` tool Structs were built from responses captured live against a real revision (tests/fixtures/snippets, see SCHEMA.md). Notable API quirks handled: snippet id is a string; the list endpoint caps pageSize at 50 (list_all overridden so pagination is correct); matchDetails.matchPercentage is 0-100 while other percentages are 0-1; and whole-file matches mark a trailing blank EOF line as highlighted (excluded from the reported range to avoid an off-by-one). Reject/unreject and cross-revision compare endpoints are out of scope for v1. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-11T18:50:33Z

Walkthrough

This PR extends the fossapi library with full support for snippet detection and remediation workflows. The core change introduces a complete Snippet domain model with API types (Snippet, SnippetMatch, SnippetMatchDetails, SnippetLocation), implements the List trait for paginated querying, and provides convenience functions to list snippets, retrieve details, drill into matches, and aggregate flattened match locations. The library re-exports these new types and functions for public use. CLI commands are extended with get snippet, get snippet-match, list snippets, list snippet-locations, and list snippet-paths subcommands that construct queries, call the convenience functions, and render results as JSON or formatted tables. An MCP snippet_match tool and expanded list tool support snippet operations via structured parameters. Output formatting adds PrettyPrint implementations for Snippet and SnippetMatchDetails with code block rendering. Linting rules are refined to reduce false positives, and comprehensive fixtures and integration tests validate the domain model and API wiring.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Add snippet support to fossapi' clearly and concisely describes the main change: adding a new Snippet entity to the fossapi library.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, explaining the purpose, API surface (library, CLI, MCP), API quirks handled, scope limitations, and test coverage.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/mcp/server.rs (1)

166-244: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce snippet pagination semantics instead of silently ignoring page/count.

Line 167 and Line 168 derive pagination, but Line 226-244 ignores it and fetches full snippet locations. This makes list(entity: "snippet") unbounded (especially with with_lines=true) and breaks the list tool’s pagination contract.

💡 Minimal fail-fast mitigation (until true snippet pagination is implemented)

             EntityType::Snippet => {
+                if params.page.is_some() || params.count.is_some() {
+                    return Err(McpError::invalid_params(
+                        "Snippet listing does not currently support page/count; use path to narrow scope.",
+                        None,
+                    ));
+                }
                 let parent = params.parent.ok_or_else(|| {
                     McpError::invalid_params(
                         "parent is required for listing snippets (revision locator)",
                         None,
                     )
                 })?;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mcp/server.rs` around lines 166 - 244, In handle_list, the
EntityType::Snippet branch currently ignores pagination; add a fail-fast check
that returns an McpError::invalid_params when pagination or unbounded requests
are attempted: if params.page.is_some() || params.count.is_some() ||
params.with_lines.unwrap_or(false) then return McpError::invalid_params("snippet
listing does not support pagination or with_lines=true yet", None). Keep the
existing get_snippet_locations call only for the safe default case (no explicit
page/count and with_lines=false) so we don't return an unbounded result.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/models/snippet.rs`:
- Around line 817-853: Refactor the serial loop in get_snippet_locations to use
a bounded concurrent fan-out: create async tasks per snippet (calling
get_snippet_details) and for each task, concurrently process its details.matches
(calling get_snippet_match only when with_lines) using
futures::stream::iter(...).map(...).buffer_unordered(CONCURRENCY) (or
FuturesUnordered) with a configurable CONCURRENCY constant to limit parallelism;
collect results into SnippetLocation structs (preserving error handling/logging
for get_snippet_match and detected_line_range) and push them into the final
locations Vec (use a thread-safe collector like Mutex<Vec<_>> or gather into a
Vec of Results then flatten) so behavior of get_snippet_details,
get_snippet_match, details.matches, detected_line_range and SnippetLocation is
preserved while avoiding unbounded concurrency.
- Around line 536-545: The highlighted_range function currently filters out all
blank highlighted lines, shifting the reported start when only trailing
highlighted blanks should be ignored; update highlighted_range to consider
highlighted CodeLine entries (using CodeLine.is_highlighted, .line,
.line_number) without dropping internal or leading blank highlighted lines, then
trim only trailing blank highlighted lines: find the last highlighted line whose
.line.trim() is non-empty (or return None if none), and compute the min/max
line_number across highlighted lines up to that last non-empty highlighted index
to produce the (lo, hi) range.

In `@tests/fixtures/snippets/snippet-details.json`:
- Around line 16-17: Replace the real email value in the JSON fixture for the
"rejectedBy" field with an anonymized placeholder (e.g. "redacted@example.com"
or "<REDACTED_EMAIL>") and apply the same replacement consistently for the other
fixture occurrence referenced (lines 35-36) so no direct identifiers remain in
snippet-details.json; update any tests/assertions that expect the original
literal to use the new placeholder.

In `@tests/snippet.rs`:
- Around line 50-55: The current mock for snippet listing (Mock::given +
method("GET") + path(format!("/revisions/{ENC_REV}/snippets")) ...) is too
permissive; change the list mock to assert the exact contract path including the
trailing slash (e.g. path(format!("/revisions/{ENC_REV}/snippets/"))) and keep
expect(1) for the single list call; additionally add separate mocks for each
snippet detail endpoint (Mock::given + method("GET") +
path(format!("/revisions/{ENC_REV}/snippets/{SNIPPET_ID}")) responding with
ResponseTemplate::new(200).set_body_json(...)) and set expect to the exact
number of detail calls your code should make (based on SNIPPETS_JSON), so both
list and per-snippet fanout counts are validated.

---

Outside diff comments:
In `@src/mcp/server.rs`:
- Around line 166-244: In handle_list, the EntityType::Snippet branch currently
ignores pagination; add a fail-fast check that returns an
McpError::invalid_params when pagination or unbounded requests are attempted: if
params.page.is_some() || params.count.is_some() ||
params.with_lines.unwrap_or(false) then return McpError::invalid_params("snippet
listing does not support pagination or with_lines=true yet", None). Keep the
existing get_snippet_locations call only for the safe default case (no explicit
page/count and with_lines=false) so we don't return an unbounded result.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c8cb5e03-77e8-46c4-8eeb-586e92e92a4b

📥 Commits

Reviewing files that changed from the base of the PR and between f9f3db3 and 244c720.

📒 Files selected for processing (17)

.nudge.yaml
src/bin/fossapi.rs
src/cli/mod.rs
src/lib.rs
src/mcp/params.rs
src/mcp/server.rs
src/models/mod.rs
src/models/snippet.rs
src/output.rs
tests/fixtures/snippets/SCHEMA.md
tests/fixtures/snippets/match-details-partial.json
tests/fixtures/snippets/match-details-small.json
tests/fixtures/snippets/paths-sources.json
tests/fixtures/snippets/paths.json
tests/fixtures/snippets/snippet-details.json
tests/fixtures/snippets/snippets.json
tests/snippet.rs

coderabbitai · 2026-06-11T18:50:36Z

+    fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> {
+        let nums = lines
+            .iter()
+            .filter(|l| l.is_highlighted && !l.line.trim().is_empty())
+            .map(|l| l.line_number)
+            .collect::<Vec<_>>();
+        match (nums.iter().min(), nums.iter().max()) {
+            (Some(&lo), Some(&hi)) => Some((lo, hi)),
+            _ => None,
+        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

highlighted_range currently removes non-trailing blank highlighted lines, which can misreport start lines.

The filter l.is_highlighted && !l.line.trim().is_empty() drops all blank highlighted lines. That breaks the “ignore trailing blank EOF lines” behavior and can shift the reported start line forward.

Suggested fix

fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> { - let nums = lines + let highlighted = lines .iter() - .filter(|l| l.is_highlighted && !l.line.trim().is_empty()) - .map(|l| l.line_number) + .filter(|l| l.is_highlighted) .collect::<Vec<_>>(); - match (nums.iter().min(), nums.iter().max()) { - (Some(&lo), Some(&hi)) => Some((lo, hi)), - _ => None, - } + let first = highlighted.first().map(|l| l.line_number)?; + let last = highlighted + .iter() + .rev() + .find(|l| !l.line.trim().is_empty()) + .map(|l| l.line_number)?; + Some((first, last)) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> {

let nums = lines

.iter()

.filter(|l| l.is_highlighted && !l.line.trim().is_empty())

.map(|l| l.line_number)

.collect::<Vec<_>>();

match (nums.iter().min(), nums.iter().max()) {

(Some(&lo), Some(&hi)) => Some((lo, hi)),

_ => None,

}

fn highlighted_range(lines: &[CodeLine]) -> Option<(u32, u32)> {

let highlighted = lines

.iter()

.filter(|l| l.is_highlighted)

.collect::<Vec<_>>();

let first = highlighted.first().map(|l| l.line_number)?;

let last = highlighted

.iter()

.rev()

.find(|l| !l.line.trim().is_empty())

.map(|l| l.line_number)?;

Some((first, last))

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/snippet.rs` around lines 536 - 545, The highlighted_range function currently filters out all blank highlighted lines, shifting the reported start when only trailing highlighted blanks should be ignored; update highlighted_range to consider highlighted CodeLine entries (using CodeLine.is_highlighted, .line, .line_number) without dropping internal or leading blank highlighted lines, then trim only trailing blank highlighted lines: find the last highlighted line whose .line.trim() is non-empty (or return None if none), and compute the min/max line_number across highlighted lines up to that last non-empty highlighted index to produce the (lo, hi) range.

coderabbitai · 2026-06-11T18:50:37Z

+    let mut locations = Vec::new();
+    for snippet in &snippets {
+        let details = get_snippet_details(client, revision_locator, &snippet.id).await?;
+        let licenses = details.license_ids();
+        for m in &details.matches {
+            let (line_start, line_end) = if with_lines {
+                match get_snippet_match(client, revision_locator, &snippet.id, &m.path).await {
+                    Ok(md) => match md.detected_line_range() {
+                        Some((lo, hi)) => (Some(lo), Some(hi)),
+                        None => (None, None),
+                    },
+                    Err(e) => {
+                        tracing::warn!(
+                            "Failed to fetch match details for {} at {}: {e}",
+                            snippet.id,
+                            m.path
+                        );
+                        (None, None)
+                    }
+                }
+            } else {
+                (None, None)
+            };
+
+            locations.push(SnippetLocation {
+                path: m.path.clone(),
+                snippet_id: snippet.id.clone(),
+                package: details.package.clone(),
+                version: details.version.clone(),
+                purl: details.purl.clone(),
+                match_percentage: m.match_percentage,
+                licenses: licenses.clone(),
+                line_start,
+                line_end,
+            });
+        }
+    }


🧹 Nitpick | 🔵 Trivial | 🏗️ Heavy lift

Consider bounded concurrency in get_snippet_locations to avoid high latency on large revisions.

This loop performs details and optional match drill-ins serially. For large N/M, wall-clock time scales with total request count and can become operationally expensive. A bounded concurrent fan-out (with backpressure) would materially improve throughput while controlling rate-limit pressure.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/snippet.rs` around lines 817 - 853, Refactor the serial loop in get_snippet_locations to use a bounded concurrent fan-out: create async tasks per snippet (calling get_snippet_details) and for each task, concurrently process its details.matches (calling get_snippet_match only when with_lines) using futures::stream::iter(...).map(...).buffer_unordered(CONCURRENCY) (or FuturesUnordered) with a configurable CONCURRENCY constant to limit parallelism; collect results into SnippetLocation structs (preserving error handling/logging for get_snippet_match and detected_line_range) and push them into the final locations Vec (use a thread-safe collector like Mutex<Vec<_>> or gather into a Vec of Results then flatten) so behavior of get_snippet_details, get_snippet_match, details.matches, detected_line_range and SnippetLocation is preserved while avoiding unbounded concurrency.

coderabbitai · 2026-06-11T18:50:37Z

+          "rejectedBy": "sara@fossa.com"
+        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Redact email identifiers in captured fixtures.

This fixture embeds a real-looking email address in rejectedBy. Storing direct identifiers in test artifacts creates unnecessary privacy/compliance exposure. Please replace with anonymized placeholders consistently across all snippet fixtures.

Also applies to: 35-36

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/fixtures/snippets/snippet-details.json` around lines 16 - 17, Replace the real email value in the JSON fixture for the "rejectedBy" field with an anonymized placeholder (e.g. "redacted@example.com" or "<REDACTED_EMAIL>") and apply the same replacement consistently for the other fixture occurrence referenced (lines 35-36) so no direct identifiers remain in snippet-details.json; update any tests/assertions that expect the original literal to use the new placeholder.

coderabbitai · 2026-06-11T18:50:37Z

+    Mock::given(method("GET"))
+        .and(path(format!("/revisions/{ENC_REV}/snippets")))
+        .respond_with(ResponseTemplate::new(200).set_body_json(body(SNIPPETS_JSON)))
+        .expect(1)
+        .mount(&mock_server)
+        .await;


🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Tighten mock expectations to catch snippet-query and fanout regressions.

These mocks currently don’t assert key contract details (path=/ on snippet list, expected number of details calls), so regressions can still pass.

✅ Suggested test hardening

Mock::given(method("GET")) .and(path(format!("/revisions/{ENC_REV}/snippets"))) + .and(query_param("path", "/")) .respond_with(ResponseTemplate::new(200).set_body_json(body(SNIPPETS_JSON))) .expect(1) .mount(&mock_server) .await; @@ Mock::given(method("GET")) .and(path(format!("/revisions/{ENC_REV}/snippets"))) + .and(query_param("path", "/")) .respond_with(ResponseTemplate::new(200).set_body_json(body(SNIPPETS_JSON))) + .expect(1) .mount(&mock_server) .await; @@ Mock::given(method("GET")) .and(path_regex(r"/snippets/[0-9]+$")) .respond_with(ResponseTemplate::new(200).set_body_json(body(DETAILS_JSON))) + .expect(3) .mount(&mock_server) .await;

Also applies to: 116-127

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/snippet.rs` around lines 50 - 55, The current mock for snippet listing (Mock::given + method("GET") + path(format!("/revisions/{ENC_REV}/snippets")) ...) is too permissive; change the list mock to assert the exact contract path including the trailing slash (e.g. path(format!("/revisions/{ENC_REV}/snippets/"))) and keep expect(1) for the single list call; additionally add separate mocks for each snippet detail endpoint (Mock::given + method("GET") + path(format!("/revisions/{ENC_REV}/snippets/{SNIPPET_ID}")) responding with ResponseTemplate::new(200).set_body_json(...)) and set expect to the exact number of detail calls your code should make (based on SNIPPETS_JSON), so both list and per-snippet fanout counts are validated.

) Snippets shipped to main in #30 but the docs, version, and a broken CI config were never updated. This gets main release-ready for v0.2.0. Docs: - README: Snippets CLI section (list snippets/snippet-paths/ snippet-locations, get snippet/snippet-match), MCP tools table (snippet on `list`, the `snippet_match` tool, no get/update), and the Snippet locator - CLAUDE.md: ERD, architecture tree, the four snippet endpoints, the Snippet model entry with its API quirks, and Future Work Version: 0.1.0 -> 0.2.0 (Cargo.toml + Cargo.lock) CI runner fix: dist 0.27 defaults to ubuntu-20.04 and windows-2019, both retired by GitHub, so every Release-workflow job queued until the 24h timeout and the PR `plan` check sat "pending" indefinitely. Pin supported images via [workspace.metadata.dist.github-custom-runners] (ubuntu-22.04 / macos-13 / macos-14 / windows-2022) and regenerate release.yml. `dist generate --check` is clean; `dist plan` now assigns only hosted runners. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

saramaebee and others added 2 commits June 11, 2026 14:42

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

saramaebee merged commit 724844a into main Jun 12, 2026
1 check failed

saramaebee mentioned this pull request Jun 12, 2026

Prep 0.2.0 release: document snippets, bump version, fix CI runners #32

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add snippet support to fossapi#30

Add snippet support to fossapi#30
saramaebee merged 2 commits into
mainfrom
feature/snippets

saramaebee commented Jun 11, 2026

Uh oh!

coderabbitai Bot commented Jun 11, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Uh oh!

coderabbitai Bot Jun 11, 2026

Uh oh!

coderabbitai Bot Jun 11, 2026

Uh oh!

coderabbitai Bot Jun 11, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

saramaebee commented Jun 11, 2026

Summary

Surface area

API quirks handled (captured live; see tests/fixtures/snippets/SCHEMA.md)

Test plan

Uh oh!

coderabbitai Bot commented Jun 11, 2026

Walkthrough

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

API quirks handled (captured live; see `tests/fixtures/snippets/SCHEMA.md`)